How to read a CSV file and then save it as JSON in Spark Scala?
I am trying to read a CSV file that has around 7 million rows and 22 columns.
How can I save it as a JSON file after reading the CSV into a Spark DataFrame?
scala apache-spark apache-spark-sql
edited Nov 22 '18 at 16:54
James Z
asked Nov 22 '18 at 8:58
Sayan Sahoo
1 Answer
Read the CSV file as a DataFrame:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().master("local[2]").appName("test").getOrCreate()
val df = spark.read.csv("path to csv")
Now you can perform any operations you need on df and save it as JSON:
df.write.json("output path")
Hope this helps!
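For a slightly fuller picture, here is a minimal sketch of the same read-then-write flow wrapped in a small program. The object name `CsvToJson`, the helper names, and the choice of options are illustrative assumptions, not part of the answer above: it assumes the CSV has a header row, asks Spark to infer column types, and overwrites any existing output. Note that `json(...)` writes a directory of part files in JSON Lines form (one object per row), not a single file.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object CsvToJson {
  /** Reads a CSV file into a DataFrame, treating the first row as column names. */
  def readCsv(spark: SparkSession, inputPath: String): DataFrame =
    spark.read
      .option("header", "true")      // assumption: the file has a header row
      .option("inferSchema", "true") // costs an extra pass over the data to guess column types
      .csv(inputPath)

  /** Writes the DataFrame as JSON Lines into a directory, one part file per partition. */
  def writeJson(df: DataFrame, outputPath: String): Unit =
    df.write
      .mode(SaveMode.Overwrite) // replace the output directory if it already exists
      .json(outputPath)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("csv-to-json")
      .getOrCreate()
    try {
      // args(0): input CSV path, args(1): output directory (both supplied by the caller)
      writeJson(readCsv(spark, args(0)), args(1))
    } finally {
      spark.stop()
    }
  }
}
```

With 7 million rows, schema inference adds a full extra scan of the input; dropping the `inferSchema` option leaves every column as a string, which is often acceptable when the only goal is conversion to JSON.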
answered Nov 22 '18 at 9:18
Shankar Koirala
I tried that, but it throws a SparkException and an IOException. The error says "Job is aborted while writing the rows". I don't know why. Can you help? I'm new to Spark, which is why I'm finding it difficult to understand.
– Sayan Sahoo
Nov 22 '18 at 9:40
Why didn't you share what issue you faced and what you tried? Can you share the error log?
– Shankar Koirala
Nov 22 '18 at 10:00
ERROR Utils: Aborting task java.io.IOException: (null) entry in command string: null chmod 0644 D:sample.json_temporary_temporaryattempt_20181122150723_0003_m_000000_0part-00000-448b77ae-c17d-45fe-bba0-a6495fd5c6bd-c000.json at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:762) at org.apache.hadoop.util.Shell.execCommand(Shell.java:859) at org.apache.hadoop.util.Shell.execCommand(Shell.java:842) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:661)
– Sayan Sahoo
Nov 22 '18 at 10:26
Have you already checked stackoverflow.com/questions/48010634/…?
– Shankar Koirala
Nov 22 '18 at 12:52
Thank you, the issue is resolved now. :)
– Sayan Sahoo
Nov 22 '18 at 13:35
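The thread does not say what resolved the problem beyond the linked question. As an assumption: a `chmod 0644` failure inside `RawLocalFileSystem.setPermission` with a `D:` path is commonly reported on Windows when Hadoop cannot locate its helper binary `winutils.exe`. The sketch below checks for that binary and sets the `hadoop.home.dir` system property; the object name `WinutilsCheck`, its helpers, and the `C:\hadoop` fallback are hypothetical and only illustrate the idea.

```scala
import java.io.File

object WinutilsCheck {
  /** True if <hadoopHome>/bin/winutils.exe exists as a regular file. */
  def hasWinutils(hadoopHome: String): Boolean =
    new File(new File(hadoopHome, "bin"), "winutils.exe").isFile

  /** Sets hadoop.home.dir when winutils.exe is present; returns whether it was set. */
  def configureHadoopHome(hadoopHome: String): Boolean = {
    val found = hasWinutils(hadoopHome)
    if (found) System.setProperty("hadoop.home.dir", hadoopHome)
    found
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical fallback location; point this at the directory that contains bin\winutils.exe.
    val hadoopHome = sys.env.getOrElse("HADOOP_HOME", "C:\\hadoop")
    if (!configureHadoopHome(hadoopHome))
      println(s"winutils.exe not found under $hadoopHome\\bin")
  }
}
```

If this is the cause, `configureHadoopHome` would need to run before the `SparkSession` is created, since Hadoop resolves its home directory when its shell utilities are first loaded.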