T300870 Airflow concurrency limits


Assigned To

Antoine_Quhen

Authored By

Antoine_Quhen
2022-02-03 14:23:46 (UTC+0)

Tags

  • Data-Engineering-Kanban (Done)
  • Data-Engineering (Transform)
  • Data Pipelines (Done)

Referenced Files

None

Subscribers

Aklapper
Antoine_Quhen
JAllemandou
mforns
mpopov
Ottomata

@JAllemandou, @mforns and I are proposing this set of rules as a starting point in Airflow.

At environment level (airflow.cfg):

parallelism: This is the maximum number of tasks that can run concurrently within a single Airflow environment. For example, if this setting is set to 32 then no more than 32 tasks can run at once across all DAGs. Think of this as "maximum active tasks anywhere." If you notice that tasks are stuck queued for extended periods of time, this is a value you may want to increase. By default, this is set to 32.

We may set it explicitly to 32.

max_active_runs_per_dag: This determines the maximum number of active DAG Runs (per DAG) that the Airflow Scheduler can create at any given time. In Airflow, a DAG Run represents an instantiation of a DAG in time, much like a task instance represents an instantiation of a task. This parameter is most relevant if Airflow has to catch up from missed DAG runs, also known as backfilling. Consider how you want to handle these scenarios when setting this parameter. By default, it's set to 16.

We should set it to 2, to prevent one DAG from taking all the resources.

max_active_tasks_per_dag (formerly dag_concurrency): This determines the maximum number of tasks that can be scheduled at once, per DAG. Use this setting to prevent any one DAG from taking up too many of the available slots from parallelism or your pools, which helps DAGs be good neighbors to one another. By default, this is set to 16.

We should set it to 2, to make sure one DAG can't take all the resources.
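
As a rough sketch of how the three proposed values interact, the fragment below writes them as they might appear in the [core] section of airflow.cfg, parses them with Python's standard configparser (not Airflow's own config loader), and computes an upper bound on concurrently running tasks. The helper names load_limits and max_running_tasks and the DAG counts are hypothetical, chosen only for illustration; pools and executor capacity may lower the real number further.

```python
import configparser

# Values proposed in this task, written as an airflow.cfg fragment.
AIRFLOW_CFG = """
[core]
parallelism = 32
max_active_runs_per_dag = 2
max_active_tasks_per_dag = 2
"""


def load_limits(cfg_text: str) -> dict:
    """Read the three concurrency limits from an airflow.cfg-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    core = parser["core"]
    return {
        "parallelism": core.getint("parallelism"),
        "max_active_runs_per_dag": core.getint("max_active_runs_per_dag"),
        "max_active_tasks_per_dag": core.getint("max_active_tasks_per_dag"),
    }


def max_running_tasks(limits: dict, active_dags: int) -> int:
    """Upper bound on tasks running at once for a given number of active DAGs.

    Each DAG contributes at most max_active_tasks_per_dag tasks, and the
    environment as a whole is capped by parallelism.
    """
    per_dag_total = active_dags * limits["max_active_tasks_per_dag"]
    return min(limits["parallelism"], per_dag_total)


limits = load_limits(AIRFLOW_CFG)
for dag_count in (1, 10, 20):
    print(dag_count, "active DAGs ->", max_running_tasks(limits, dag_count), "tasks at most")
```

With these values a single DAG never runs more than 2 tasks at once, and the global cap of 32 only starts to matter once more than 16 DAGs are active at the same time.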

At DAG level (in dag definition file):

max_active_runs_per_dag and max_active_tasks_per_dag can be overridden by:

  • max_active_runs: This is the maximum number of active DAG Runs allowed for the DAG in question. Once this limit is hit, the Scheduler will not create new active DAG Runs. If this setting is not defined, the value of the environment-level setting max_active_runs_per_dag is assumed.

    Could be set in each DAG definition.

  • concurrency: This is the maximum number of task instances allowed to run concurrently across all active DAG runs for a given DAG. (Since Airflow 2.2 this DAG argument is named max_active_tasks, and concurrency is deprecated.) This allows you to set one DAG to be able to run 32 tasks at once, while another DAG might only be able to run 16 tasks at once. If this setting is not defined, the value of the environment-level setting max_active_tasks_per_dag is assumed.

    Could be set in each DAG definition.
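
The fallback rule described in the two bullets above can be sketched in plain Python. This is only a model of the behaviour as described here, not Airflow's implementation; the function name effective_dag_limits and the ENV_DEFAULTS dictionary are hypothetical.

```python
from typing import Optional

# Environment-level defaults proposed in this task (airflow.cfg, [core]).
ENV_DEFAULTS = {"max_active_runs_per_dag": 2, "max_active_tasks_per_dag": 2}


def effective_dag_limits(
    max_active_runs: Optional[int] = None,
    concurrency: Optional[int] = None,
) -> dict:
    """Return the limits that apply to one DAG.

    A value set in the DAG definition wins; otherwise the matching
    environment-level setting is assumed.
    """
    return {
        "max_active_runs": (
            ENV_DEFAULTS["max_active_runs_per_dag"]
            if max_active_runs is None
            else max_active_runs
        ),
        "concurrency": (
            ENV_DEFAULTS["max_active_tasks_per_dag"]
            if concurrency is None
            else concurrency
        ),
    }


# A DAG that sets nothing inherits both environment defaults.
print(effective_dag_limits())
# A backfill-heavy DAG raises only its own run limit.
print(effective_dag_limits(max_active_runs=8))
```

In a real DAG file the override would be passed as a keyword argument when constructing the DAG, for example max_active_runs=8, so a large parallel backfill can opt in without raising the limit for every other DAG.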

A clear explanation of the doc is here: https://www.astronomer.io/guides/airflow-scaling-workers

Subject | Repo | Branch | Lines +/-
airflow: change max_active_runs_per_dag back to 1 | operations/puppet | production | +1 -1
Set default Airflow concurrency limits for an- airflow instances | operations/puppet | production | +43 -0
Set default Airflow concurrency limits | operations/puppet | production | +36 -6

Mentioned In
T351388: Add a spark global config for better file commit strategy
Mentioned Here
T347076: NEW BUG REPORT Some DAG run attempts fail because File *_temporary/0 does not exist.

Event Timeline

Antoine_Quhen created this task. 2022-02-03 14:23:46 (UTC+0)

Restricted Application added a subscriber: Aklapper. 2022-02-03 14:23:47 (UTC+0)

Ottomata subscribed. 2022-02-03 14:57:42 (UTC+0)

I wonder if we should set the global max_active_runs_per_dag higher than 2. I could see cases where we explicitly want to do a big backfill in parallel. Since the work isn't actually done on the Airflow nodes, the resources taken up there will mostly be spent waiting for DAG runs to finish.

We should set the default DAG-level max_active_runs to 2 in our base default_args, to allow folks to override this if they need. Or, wait, are you saying that max_active_runs_per_dag is already overridable to a larger value by max_active_runs? If so, then I guess we can just do as you say! :)

JAllemandou added a comment. 2022-02-03 14:58:37 (UTC+0)

Thanks a lot for the great summary @Antoine_Quhen! I assume that we wish to set the default values you suggested and, as much as possible, not use the per-DAG config overrides.
My only question about the default values is whether the global parallelism could be made bigger, if we assume all tasks are low-resource-consumption for the Airflow machine itself (even more so if we use Skein). Ping @Ottomata on this :)

Ottomata added a comment. 2022-02-03 15:09:42 (UTC+0)

Ya I'd think we could!

odimitrijevic moved this task from Incoming (new tickets) to Transform on the Data-Engineering board. 2022-02-06 23:21:50 (UTC+0)

EChetty moved this task from Backlog to Discussed (Radar) on the Data Pipelines board. 2022-02-07 16:48:44 (UTC+0)

Antoine_Quhen reassigned this task from Antoine_Quhen to Ottomata. 2022-02-07 17:31:35 (UTC+0)

Ottomata added a comment. 2022-02-07 17:35:27 (UTC+0)

Tell me precisely what to change and I will change it!

Antoine_Quhen added a comment. 2022-02-07 17:35:43 (UTC+0)

Let's also set retries to 3 by default.

We may remove this line:
https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/24931081b7133e62849a9f54bad4e0ff555690e9/wmf_airflow_common/default_args.py#L7
and set it globally instead, with default_task_retries.
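
A minimal sketch of why the per-repository line could be dropped, assuming the usual precedence: a retries value set on the task wins, then one in the DAG's default_args, then the global default_task_retries setting. The function effective_retries and the constant below are hypothetical stand-ins, not Airflow code.

```python
from typing import Optional

# Stands in for default_task_retries in airflow.cfg ([core]); 3 is the value proposed here.
GLOBAL_DEFAULT_TASK_RETRIES = 3


def effective_retries(
    task_retries: Optional[int] = None,
    default_args: Optional[dict] = None,
) -> int:
    """Resolve the retry count for one task, from most to least specific."""
    if task_retries is not None:
        return task_retries
    if default_args and "retries" in default_args:
        return default_args["retries"]
    return GLOBAL_DEFAULT_TASK_RETRIES


# With no retries entry in default_args, tasks fall back to the global value.
print(effective_retries(default_args={}))
# An individual task can still opt out of retrying.
print(effective_retries(task_retries=0, default_args={"retries": 5}))
```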

Antoine_Quhen claimed this task. 2022-02-28 16:55:10 (UTC+0)

Antoine_Quhen moved this task from Discussed (Radar) to Estimated on the Data Pipelines board.

gerritbot added a comment. 2022-03-01 17:06:17 (UTC+0)

Change 767220 had a related patch set uploaded (by Aqu; author: Aqu):

[operations/puppet@production] Set default Airflow concurrency limits

https://gerrit.wikimedia.org/r/767220

gerritbot added a project: Patch-For-Review. 2022-03-01 17:06:18 (UTC+0)

Antoine_Quhen moved this task from Next Up to In Progress on the Data-Engineering-Kanban board. 2022-03-02 14:22:35 (UTC+0)

Antoine_Quhen moved this task from In Progress to In Code Review on the Data-Engineering-Kanban board.

Antoine_Quhen moved this task from Estimated to In Review on the Data Pipelines board.

Antoine_Quhen added a comment. 2022-03-02 14:24:50 (UTC+0)

Related to: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/29

gerritbot added a comment. 2022-03-02 14:57:42 (UTC+0)

Change 767220 merged by Ottomata:

[operations/puppet@production] Set default Airflow concurrency limits

https://gerrit.wikimedia.org/r/767220

gerritbot added a comment. 2022-03-02 15:06:59 (UTC+0)

Change 767527 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Set default Airflow concurrency limits for an- airflow instances

https://gerrit.wikimedia.org/r/767527

gerritbot added a comment. 2022-03-02 15:13:26 (UTC+0)

Change 767527 merged by Ottomata:

[operations/puppet@production] Set default Airflow concurrency limits for an- airflow instances

https://gerrit.wikimedia.org/r/767527

Maintenance_bot removed a project: Patch-For-Review. 2022-03-02 16:10:48 (UTC+0)

EChetty moved this task from In Review to Done on the Data Pipelines board. 2022-03-07 16:34:19 (UTC+0)

Antoine_Quhen moved this task from In Code Review to Done on the Data-Engineering-Kanban board. 2022-03-08 16:20:48 (UTC+0)

JArguello-WMF closed this task as Resolved. 2022-05-31 15:31:34 (UTC+0)

mpopov subscribed. 2023-11-13 20:40:42 (UTC+0)

There's a good chance this is responsible for T347076: NEW BUG REPORT Some DAG run attempts fail because File *_temporary/0 does not exist.

mpopov mentioned this in T351388: Add a spark global config for better file commit strategy. 2023-11-17 12:53:22 (UTC+0)

gerritbot added a comment. 2023-11-22 12:58:03 (UTC+0)

Change 976700 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] airflow: change max_active_runs_per_dag back to 1

https://gerrit.wikimedia.org/r/976700

gerritbot added a project: Patch-For-Review. 2023-11-22 12:58:04 (UTC+0)

gerritbot added a comment. 2023-11-22 15:37:24 (UTC+0)

Change 976700 merged by Btullis:

[operations/puppet@production] airflow: change max_active_runs_per_dag back to 1

https://gerrit.wikimedia.org/r/976700

Maintenance_bot removed a project: Patch-For-Review. 2023-11-22 16:10:39 (UTC+0)


Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct. · Wikimedia Foundation · Privacy Policy · Code of Conduct · Terms of Use · Disclaimer · CC-BY-SA · GPL
