How can I use Spark to writeStream data from a Kafka topic into HDFS?

I have been trying to get this code to work for hours:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Consumer")
  .getOrCreate()

// Read from the Kafka topic and continuously write the raw value bytes to files.
spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", url)
  .option("subscribe", topic)
  .load()
  .select("value")
  .writeStream
  .format(fileFormat)
  .option("path", filePath)
  .option("checkpointLocation", "/tmp/checkpoint")
  .start()
  .awaitTermination()


It throws this exception:



Logical Plan: 
Project [value#8]
+- StreamingExecutionRelation KafkaV2[Subscribe[MyTopic]], [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13]

at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: java.lang.ClassCastException: org.apache.spark.sql.execution.streaming.SerializedOffset cannot be cast to org.apache.spark.sql.sources.v2.reader.streaming.Offset
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:405)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:390)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:389)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:133)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:121)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:117)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)


I don't understand what's going on. I am simply trying to write data from a Kafka topic into HDFS using Spark Structured Streaming. Why is this so hard, and how can I do it?



I got the batch version to work just fine:



spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", url)
  .option("subscribe", topic)
  .load()
  .selectExpr("CAST(value AS String)")
  .write
  .format(fileFormat)
  .save(filePath)
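
One aside, unrelated to the exception above: the batch version casts value to a string, while the streaming version writes the raw binary value column. If fileFormat happens to be a text-based format (which expects a single string column), the streaming query would presumably need the same cast. A sketch:

// Sketch, assuming fileFormat is a text-based format: cast the Kafka
// value bytes to a string before writing, mirroring the batch job.
spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", url)
  .option("subscribe", topic)
  .load()
  .selectExpr("CAST(value AS String)")
  .writeStream
  .format(fileFormat)
  .option("path", filePath)
  .option("checkpointLocation", "/tmp/checkpoint")
  .start()
  .awaitTermination()
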
Tags: scala apache-spark hadoop apache-kafka hdfs

asked Nov 20 '18 at 20:24 by hey_you
edited Nov 20 '18 at 23:07 by cricket_007

  • Kafka Connect already does this, and is included in Kafka 0.10 and higher... Why write any code to do this?? confluent.io/connector/kafka-connect-hdfs

    – cricket_007
    Nov 20 '18 at 23:08
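
For reference, a minimal configuration sketch for the HDFS sink connector mentioned in the comment might look like the following standalone-mode properties file. The hostnames, topic, and flush size are placeholders; see the linked Confluent documentation for the authoritative option names and defaults.

# hdfs-sink.properties — illustrative sketch only
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=MyTopic
hdfs.url=hdfs://namenode:8020
flush.size=1000
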
2 Answers

You are encountering a known bug in Structured Streaming: https://issues.apache.org/jira/browse/SPARK-25257

This happens because the offset read back from the checkpoint on disk is never deserialized; the fix will be merged in a coming release.

answered Nov 20 '18 at 23:26 by bhavin tandel
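
Until you are on a release with the fix, one possible workaround (an assumption based on the bug description, since the failure only occurs when serialized offsets are read back from disk) is to start the query with a fresh checkpoint directory, at the cost of losing resume-from-offset semantics:

// Sketch of a workaround: point the query at a brand-new checkpoint
// directory so no serialized offsets are read back from disk.
// Note: this loses exactly-once resume semantics; the query restarts
// from the offsets chosen by startingOffsets (default "latest").
val freshCheckpoint = s"/tmp/checkpoint-${System.currentTimeMillis()}"

spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", url)
  .option("subscribe", topic)
  .load()
  .select("value")
  .writeStream
  .format(fileFormat)
  .option("path", filePath)
  .option("checkpointLocation", freshCheckpoint)
  .start()
  .awaitTermination()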

  • But it's fixed in Spark 2.4?

    – cricket_007
    Nov 21 '18 at 5:09

  • What fileFormat are you using?

    – bhavin tandel
    Nov 21 '18 at 7:16

  • @cricket_007 I changed my Spark version to 2.3.2 and everything started working!

    – hey_you
    Nov 21 '18 at 14:29

Everything started working after I changed my Spark version to 2.3.2.

answered Nov 21 '18 at 14:30 by hey_you
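
For anyone wanting to do the same, a minimal sketch of the sbt dependencies this pinning would involve (artifact coordinates assumed for a Scala 2.11 build; adjust versions to match your cluster):

// build.sbt — sketch, assuming Scala 2.11 and Spark 2.3.2.
// spark-sql-kafka-0-10 provides the "kafka" source/sink used above.
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.2" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.2"
)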