How to pass Spark job properties to DataProcSparkOperator in Airflow?
I am trying to execute Spark jar on Dataproc using Airflow's DataProcSparkOperator. The jar is located on GCS, and I am creating Dataproc cluster on the fly and then executing this jar on the newly created Dataproc cluster.
I am able to execute this with Airflow's DataProcSparkOperator using the default settings, but I am not able to configure Spark job properties (e.g. --master, --deploy-mode, --driver-memory, etc.).
I didn't get any help from the Airflow documentation, and the things I tried did not work.
Help is appreciated.
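For reference, this is roughly what I am trying to pass; the property keys below are my assumption of the standard Spark equivalents of the spark-submit flags, and the values are just examples:

    # Spark settings I want the job to run with; keys assumed to be the
    # standard Spark property equivalents of the spark-submit flags.
    desired_spark_properties = {
        "spark.submit.deployMode": "cluster",  # --deploy-mode
        "spark.driver.memory": "4g",           # --driver-memory (example value)
        # --master is normally managed by Dataproc itself (YARN),
        # so I may not need to set it at all.
    }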
apache-spark airflow google-cloud-dataproc airflow-scheduler google-cloud-composer
edited Jan 2 at 9:55 by Igor Dvorzhak
asked Jan 1 at 17:31 by Abhijit Mehetre
1 Answer
To configure a Spark job through DataProcSparkOperator, you need to use the dataproc_spark_properties parameter.
For example, you can set deployMode like this:

    DataProcSparkOperator(
        dataproc_spark_properties={'spark.submit.deployMode': 'cluster'},
    )

In this answer you can find more details.
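Here is a fuller sketch of how this might look inside a DAG, assuming the Airflow 1.10.x contrib operator; the jar path, cluster name, region, and property values below are placeholders, not anything specific to your setup:

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.dataproc_operator import DataProcSparkOperator

    dag = DAG(
        dag_id='dataproc_spark_example',
        start_date=datetime(2019, 1, 1),
        schedule_interval=None,
    )

    submit_spark_job = DataProcSparkOperator(
        task_id='submit_spark_job',
        # Jar stored on GCS (placeholder path); its manifest defines the main class.
        # Alternatively, pass main_class= together with dataproc_spark_jars=[...].
        main_jar='gs://my-bucket/jars/my-spark-job.jar',
        cluster_name='my-ephemeral-cluster',  # the cluster created earlier in the DAG
        region='us-central1',
        # spark-submit style flags expressed as Spark properties:
        dataproc_spark_properties={
            'spark.submit.deployMode': 'cluster',  # --deploy-mode cluster
            'spark.driver.memory': '4g',           # --driver-memory 4g
            'spark.executor.memory': '4g',         # --executor-memory 4g
        },
        dag=dag,
    )

Note that on Dataproc the Spark master is managed by the cluster itself (YARN), so --master normally does not need to be set.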
answered Jan 1 at 18:35 by Igor Dvorzhak
Thanks Igor Dvorzhak for the quick response. It worked and saved my day!
– Abhijit Mehetre, Jan 7 at 12:17