Spark Java Heap Space



























I have a problem with Spark: when I try to generate the model I get a Java heap space exception that I can't solve.
I tried putting -Xmx4g in the VM options, but nothing happens.
I also tried adding the parameter to the Spark config, but again nothing happened.
Java version: 7
Spark version: 2.1.0



    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.mllib.classification.LogisticRegressionModel;
    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.mllib.util.MLUtils;
    import scala.Tuple2;

    SparkConf conf = new SparkConf().setAppName("myAPP").setMaster("local");
    conf = conf.setMaster("local[*]");
    SparkContext sc = new SparkContext(conf);

    JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();

    // Split initial RDD into two... [60% training data, 40% testing data].
    JavaRDD<LabeledPoint>[] splits =
            data.randomSplit(new double[]{0.6, 0.4}, 11L);
    JavaRDD<LabeledPoint> training = splits[0].cache();
    JavaRDD<LabeledPoint> test = splits[1];

    // Run training algorithm to build the model.
    final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
            .setNumClasses(2)
            .run(training.rdd());

    // Clear the prediction threshold so the model will return probabilities.
    model.clearThreshold();

    // Compute raw scores on the test set.
    JavaRDD<Tuple2<Object, Object>> predictionAndLabels = test.map(
            new Function<LabeledPoint, Tuple2<Object, Object>>() {
                @Override
                public Tuple2<Object, Object> call(LabeledPoint p) {
                    Double prediction = model.predict(p.features());
                    return new Tuple2<Object, Object>(prediction, p.label());
                }
            }
    );

    // Get evaluation metrics.
    BinaryClassificationMetrics metrics =
            new BinaryClassificationMetrics(predictionAndLabels.rdd());


Error



18/05/02 13:06:49 INFO DAGScheduler: Job 1 finished: first at GeneralizedLinearAlgorithm.scala:206, took 0,038806 s
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.mllib.linalg.Vectors$.zeros(Vectors.scala:340)
at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:222)
at Principal.main(Principal.java:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)









      java apache-spark heap space






edited Dec 27 '18 at 9:58 by Nikhil
asked May 2 '18 at 13:34 by Hallion

          1 Answer
































I suffered from this issue a lot. We use dynamic resource allocation, and I thought it would utilize the cluster resources to best fit the application.



But the truth is, dynamic resource allocation doesn't set the driver memory; it keeps the default value, which is 1g.



I resolved it by setting spark.driver.memory to a value that suits my driver's memory (for 32 GB of RAM I set it to 18 GB).



You can set it with the spark-submit command as follows:



          spark-submit --conf spark.driver.memory=18gb ....cont


Very important note: this property will not be taken into consideration if you set it from code. According to the Spark documentation:




          Spark properties mainly can be divided into two kinds: one is related
          to deploy, like “spark.driver.memory”, “spark.executor.instances”,
          this kind of properties may not be affected when setting
          programmatically through SparkConf in runtime, or the behavior is
          depending on which cluster manager and deploy mode you choose, so it
          would be suggested to set through configuration file or spark-submit
          command line options; another is mainly related to Spark runtime
          control, like “spark.task.maxFailures”, this kind of properties can be
          set in either way.
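For reference, here is a minimal sketch of setting the driver memory at launch time rather than in code; the class name Principal comes from the stack trace above, while the jar name and the 4g value are only placeholders:

    # Pass the driver heap size to spark-submit before the JVM starts.
    # --driver-memory is the command-line equivalent of spark.driver.memory.
    spark-submit \
      --class Principal \
      --master "local[*]" \
      --driver-memory 4g \
      myApp.jar

    # Alternatively, set it once in conf/spark-defaults.conf:
    # spark.driver.memory    4g

In local mode the whole application runs in the single JVM that is launched, so the heap has to be sized when that JVM starts; changing spark.driver.memory afterwards through SparkConf cannot resize it.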







answered Jan 3 at 13:09 by Abdulhafeth Sartawi