Spark ALS: java.lang.OutOfMemoryError: GC overhead limit exceeded

This question already has an answer here:

  • Spark java.lang.OutOfMemoryError: Java heap space (9 answers)
  • Checkpointing In ALS Spark Scala (1 answer)
  • Spark ML ALS collaborative filtering always fail if the iteration more than 20 [duplicate] (1 answer)
  • Spark gives a StackOverflowError when training using ALS (1 answer)
  • StackOverflow-error when applying pyspark ALS's “recommendProductsForUsers” (although cluster of >300GB Ram available) (1 answer)

I am running Spark 2.3 on a single machine with 4 cores and 16 GB of RAM.



I want to train an implicit-preference ALS model on a 1 GB dataset of users, items, and ratings, using rank 5 and 10 iterations.



I am using 1 executor with 2 executor cores and 12 GB of executor memory, and a single driver with 1 GB of driver memory. I have also set the number of partitions to 30, and I persist the data at the DISK_ONLY storage level to reduce memory pressure.
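
For reference, this is roughly how those settings are passed at submission time; the jar path is a placeholder, and the class name is taken from the stack trace below:

spark-submit \
  --class Saavn.Clustering.Test \
  --num-executors 1 \
  --executor-cores 2 \
  --executor-memory 12g \
  --driver-memory 1g \
  /path/to/app.jar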



I have also set a checkpoint directory: sc.setCheckpointDir('/tmp').



Still, I am hitting the GC overhead limit exceeded error. How should I resolve it?



I am attaching the stack trace of the error:



18/11/22 08:03:36 ERROR executor.Executor: Exception in task 0.0 in stage 6.0 (TID 36)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuilder.append(StringBuilder.java:190)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3496)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3404)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3216)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1896)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1558)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1561)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
18/11/22 08:03:36 ERROR executor.Executor: Exception in task 1.0 in stage 6.0 (TID 37)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuilder.append(StringBuilder.java:190)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3496)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3404)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3216)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1896)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1558)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1561)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
18/11/22 08:03:36 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 37,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuilder.append(StringBuilder.java:190)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3496)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3404)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3216)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1896)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1558)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1561)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
18/11/22 08:03:36 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 36,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuilder.append(StringBuilder.java:190)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3496)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3404)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3216)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1896)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1558)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1561)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
18/11/22 08:03:36 ERROR scheduler.TaskSetManager: Task 0 in stage 6.0 failed 1 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 36, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuilder.append(StringBuilder.java:190)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3496)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3404)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3216)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1896)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1558)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1561)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1358)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.take(RDD.scala:1331)
at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1466)
at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1466)
at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1466)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1465)
at org.apache.spark.ml.recommendation.ALS$.train(ALS.scala:918)
at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:674)
at Saavn.Clustering.Test.main(Test.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuilder.append(StringBuilder.java:190)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3496)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3404)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3216)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1896)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1558)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1561)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
[root@ip-172-31-94-232 saavn]#
[root@ip-172-31-94-232 saavn]# spark-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:123)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more


I have already tried setting a checkpoint, but I am still facing the same issue.



SC.setCheckpointDir("hdfs:///user/ec2-user/als");

ALS als = new ALS()
        .setRank(5)
        .setMaxIter(10)
        .setRegParam(0.01)
        .setImplicitPrefs(true)
        .setUserCol("UserID")
        .setItemCol("SongID")
        .setRatingCol("SongFreq")
        .setCheckpointInterval(2);
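
For completeness, a minimal sketch of the surrounding pipeline (assuming a SparkSession named spark; the input path and file format here are simplified, and the column names match my data):

import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.storage.StorageLevel;

// Ratings with UserID, SongID, and SongFreq columns; the path is a placeholder.
Dataset<Row> ratings = spark.read()
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/path/to/ratings.csv")
        .repartition(30);                  // 30 partitions, as described above

// Spill partitions to disk instead of holding them in memory.
ratings.persist(StorageLevel.DISK_ONLY());

// 'als' is the estimator configured above.
ALSModel model = als.fit(ratings);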









Marked as duplicate by Stephen C on Nov 22 '18 at 9:50.
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
      at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)

      Driver stacktrace:
      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
      at scala.Option.foreach(Option.scala:257)
      at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
      at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
      at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1358)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
      at org.apache.spark.rdd.RDD.take(RDD.scala:1331)
      at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1466)
      at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1466)
      at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1466)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
      at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1465)
      at org.apache.spark.ml.recommendation.ALS$.train(ALS.scala:918)
      at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:674)
      at Saavn.Clustering.Test.main(Test.java:93)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
      at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
      at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
      at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
      at java.util.Arrays.copyOf(Arrays.java:3332)
      at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
      at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
      at java.lang.StringBuilder.append(StringBuilder.java:190)
      at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3496)
      at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3404)
      at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3216)
      at java.io.ObjectInputStream.readString(ObjectInputStream.java:1896)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1558)
      at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1561)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
      at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
After the job aborts, even launching spark-shell on the same machine fails:

      [root@ip-172-31-94-232 saavn]# spark-shell
      Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
      at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
      at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
      at scala.Option.getOrElse(Option.scala:120)
      at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:123)
      at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
      at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      ... 7 more


I have already tried setting the checkpoint directory on HDFS as well, but I am still facing the same issue:



// sc is the JavaSparkContext; checkpointing truncates the ALS lineage
sc.setCheckpointDir("hdfs:///user/ec2-user/als");

ALS als = new ALS()
        .setRank(5)
        .setMaxIter(10)
        .setRegParam(0.01)
        .setImplicitPrefs(true)
        .setUserCol("UserID")
        .setItemCol("SongID")
        .setRatingCol("SongFreq")
        .setCheckpointInterval(2); // checkpoint every 2 iterations
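For completeness, here is a sketch of how I understand the remaining memory-related knobs on the spark.ml ALS builder fit together. This is only a sketch: the DISK_ONLY storage levels and setNumBlocks(30), chosen to mirror my 30 partitions, are guesses from the docs, not settings I have verified to fix the problem.

import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Sketch: keep checkpointing, but also force ALS's intermediate and final
// factor RDDs to disk so old shuffle data becomes eligible for GC sooner.
// DISK_ONLY and numBlocks = 30 are illustrative, unverified values.
ALS als = new ALS()
        .setRank(5)
        .setMaxIter(10)
        .setRegParam(0.01)
        .setImplicitPrefs(true)
        .setUserCol("UserID")
        .setItemCol("SongID")
        .setRatingCol("SongFreq")
        .setCheckpointInterval(2)
        .setNumBlocks(30)                          // match my 30 partitions
        .setIntermediateStorageLevel("DISK_ONLY")  // default is MEMORY_AND_DISK
        .setFinalStorageLevel("DISK_ONLY");

ALSModel model = als.fit(ratings); // ratings: Dataset<Row> with the three columns above

One thing I am unsure about: the failed task reports "localhost, executor driver", which looks like local mode. If so, the 12 GB executor-memory setting would be ignored and the whole job would run inside the 1 GB driver heap, so passing a larger --driver-memory to spark-submit may matter more than any of the settings above.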









      java apache-spark garbage-collection apache-spark-sql apache-spark-mllib






edited Nov 22 '18 at 12:39 by user6910411

asked Nov 22 '18 at 8:39 by Varun Gupta
marked as duplicate by Stephen C, Nov 22 '18 at 9:50

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


































