Set up Airflow for Multiple Environments
What's the idiomatic way to set up Airflow so that, if you have two environments such as Production-East and Production-West, only the DAGs for each environment show up in that environment, while all of the DAGs live in a single repository?
python airflow
edited Nov 21 '18 at 8:07 by Meghdeep Ray
asked Nov 20 '18 at 20:16 by Rob
2 Answers
The ideal way to achieve this is with named queues.
Set up multiple workers, some running in the Production-East environment and some in the Production-West environment, and assign each task to the queue for its environment. That way the DAGs for both environments show up in the UI, but their tasks execute only on the worker machines that have that specific environment on them.
From the documentation for queues:
When using the CeleryExecutor, the celery queues that tasks are sent to can be specified. queue is an attribute of BaseOperator, so any task can be assigned to any queue. The default queue for the environment is defined in the airflow.cfg’s celery -> default_queue. This defines the queue that tasks get assigned to when not specified, as well as which queue Airflow workers listen to when started.
Workers can listen to one or multiple queues of tasks. When a worker is started (using the command airflow worker), a set of comma delimited queue names can be specified (e.g. airflow worker -q spark). This worker will then only pick up tasks wired to the specified queue(s).
This can be useful if you need specialized workers, either from a resource perspective (for say very lightweight tasks where one worker could take thousands of tasks without a problem), or from an environment perspective (you want a worker running from within the Spark cluster itself because it needs a very specific environment and security rights).
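As a minimal sketch of the queue approach, the snippet below builds per-environment default_args that route every task in a DAG to that environment's queue. The queue names (prod_east, prod_west), the mapping, and the helper default_args_for are hypothetical and only for illustration; the one thing taken from the quoted documentation is that queue is a BaseOperator attribute, which is why it can be passed through default_args.

```python
# Hypothetical environment-to-queue mapping; the queue names are illustrative.
ENV_QUEUES = {
    "production-east": "prod_east",
    "production-west": "prod_west",
}


def default_args_for(env):
    """Build DAG default_args that send every task to the environment's queue."""
    try:
        queue = ENV_QUEUES[env.lower()]
    except KeyError:
        raise ValueError("unknown environment: %r" % env)
    # `queue` is a BaseOperator attribute, so operators pick it up from default_args.
    return {"owner": "airflow", "queue": queue}


east_args = default_args_for("Production-East")
west_args = default_args_for("Production-West")

# In a DAG file (requires Airflow, so it is not executed in this sketch):
#   dag = DAG("east_etl", default_args=east_args)
# On each Production-East worker host, using the command quoted above:
#   airflow worker -q prod_east
```

Workers started with -q prod_east would then pick up only the tasks built from east_args, while both DAGs remain visible in the shared UI.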
answered Nov 21 '18 at 8:07 by Meghdeep Ray
The poster requested a solution wherein the processes for the environment only appear in the UI for said environment. Using a queue to segregate the items won't accomplish this. – joeb, Nov 21 '18 at 19:29
I agree with your assessment, but this is actually very good advice. I will probably implement the first answer as a temporary solution, and then set this as a longer term goal. – Rob, Nov 26 '18 at 16:35
Perhaps we could combine these into a single response before I accept that takes this into account? I would like to acknowledge the effort you put into this answer. – Rob, Nov 26 '18 at 16:36
Put the DAG files for each environment in their own subfolder of the repository, then set the dags_folder path in each server's airflow.cfg to point to the appropriate subfolder for that server.
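A rough sketch of how the per-server setting could be derived is shown below. The repository path, the subfolder naming scheme, and both helper functions are assumptions made for illustration. The AIRFLOW__CORE__DAGS_FOLDER environment variable is Airflow's standard override for dags_folder in the [core] section of airflow.cfg, so either mechanism should work.

```python
import posixpath

# Assumed checkout location and layout (hypothetical):
#   /opt/airflow/repo/dags/production_east
#   /opt/airflow/repo/dags/production_west
REPO_DAGS_ROOT = "/opt/airflow/repo/dags"


def dags_folder_for(env, root=REPO_DAGS_ROOT):
    """Return the subfolder that this environment's Airflow should scan."""
    subfolder = env.strip().lower().replace("-", "_")
    return posixpath.join(root, subfolder)


def airflow_env_for(env):
    """Environment variables to export before starting the scheduler and webserver."""
    # Overrides dags_folder from the [core] section of airflow.cfg.
    return {"AIRFLOW__CORE__DAGS_FOLDER": dags_folder_for(env)}


east_env = airflow_env_for("Production-East")
west_env = airflow_env_for("Production-West")
```

Because each server scans only its own subfolder, its UI lists only that environment's DAGs, which is the behavior the question asks for.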
answered Nov 21 '18 at 1:41 by joeb
I appreciate the simplicity of this answer, but also reference my comment below. – Rob, Nov 26 '18 at 16:36
