How to read a CSV file and then save it as JSON in Spark Scala?
I am trying to read a CSV file that has around 7 million rows and 22 columns.

How can I save it as a JSON file after reading the CSV into a Spark DataFrame?
      scala apache-spark apache-spark-sql
      edited Nov 22 '18 at 16:54 by James Z
      asked Nov 22 '18 at 8:58 by Sayan Sahoo
























          1 Answer
          Read the CSV file as a DataFrame:

          import org.apache.spark.sql.SparkSession

          val spark = SparkSession.builder().master("local[2]").appName("test").getOrCreate()
          val df = spark.read.csv("path to csv")

          Now you can perform operations on df and save it as JSON:

          df.write.json("output path")

          Hope this helps!
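The one-liners above are the core of it; as a slightly fuller sketch (everything here is illustrative, not from the original answer: Spark 2.x in local mode, with placeholder object name, app name, and paths), the `header` and `inferSchema` read options make Spark use the first CSV row as column names and guess column types instead of reading every column as a string:

```scala
import org.apache.spark.sql.SparkSession

object CsvToJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("csv-to-json")
      .getOrCreate()

    // Without these options every column is a string named _c0, _c1, ...
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/path/to/input.csv")

    // Writes a directory of part-*.json files, one JSON object per line.
    df.write.mode("overwrite").json("/path/to/output_json")

    spark.stop()
  }
}
```

Note that `df.write.json` produces a directory of part files (line-delimited JSON), not a single `.json` file; `df.coalesce(1).write.json(...)` forces a single part file at the cost of writing through one task.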
          • I tried that, but it throws a SparkException and an IOException, and the error says "Job is aborted while writing the rows". I don't know why. Can you help? I'm new to Spark, which is why I'm finding it difficult to understand.

            – Sayan Sahoo
            Nov 22 '18 at 9:40











          • Why didn't you share what issue you faced and what you tried? Can you share the error log?

            – Shankar Koirala
            Nov 22 '18 at 10:00











          • ERROR Utils: Aborting task java.io.IOException: (null) entry in command string: null chmod 0644 D:sample.json_temporary_temporaryattempt_20181122150723_0003_m_000000_0part-00000-448b77ae-c17d-45fe-bba0-a6495fd5c6bd-c000.json at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:762) at org.apache.hadoop.util.Shell.execCommand(Shell.java:859) at org.apache.hadoop.util.Shell.execCommand(Shell.java:842) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:661)

            – Sayan Sahoo
            Nov 22 '18 at 10:26













          • Did you already check stackoverflow.com/questions/48010634/…?

            – Shankar Koirala
            Nov 22 '18 at 12:52











          • Thank you, the issue is resolved now. :)

            – Sayan Sahoo
            Nov 22 '18 at 13:35
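The chmod error quoted in the comments above ("(null) entry in command string: null chmod 0644 ...") is commonly reported when running Spark locally on Windows without a Hadoop winutils.exe setup. As a hedged workaround (the C:\hadoop path is an assumption; winutils.exe must sit in its bin subdirectory, matching your Hadoop version), the Hadoop home can be pointed at it before creating the SparkSession:

```scala
// Assumed layout: C:\hadoop\bin\winutils.exe.
// Must run before SparkSession.builder()...getOrCreate().
System.setProperty("hadoop.home.dir", "C:\\hadoop")
```

Setting the HADOOP_HOME environment variable to the same directory is an equivalent alternative.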












          answered Nov 22 '18 at 9:18 by Shankar Koirala












