How to add List[String] values to a single column in Dataframe





I have a DataFrame and a list of values (possibly a List[String]). I want to create a new column in the DataFrame and use the list values as the values of that new column. I tried:



val x = List("def", "cook", "abc")
val c_df = null
x.foldLeft(c_df)((df, column) => df.withColumn("newcolumnname" , lit(column)))


But it throws a StackOverflow exception. I also tried iterating over the list of string values and adding each one to the DataFrame, but the result is a list of DataFrames, and all I want is a single DataFrame.



Please help!



Here is the sample input and output DataFrame:
[image not included]
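A note on the attempt above: withColumn replaces a column when the given name already exists, so folding over the list with one fixed column name would leave only the last value, repeated on every row. The following is a minimal sketch of that behavior, not the asker's actual setup. It assumes a local SparkSession and a hypothetical three-row DataFrame named base standing in for the existing one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

// Assumption: a local session created only for this illustration.
val spark = SparkSession.builder().master("local[1]").appName("foldLeft-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the existing DataFrame in the question.
val base: DataFrame = Seq(1, 2, 3).toDF("col1")
val x = List("def", "cook", "abc")

// Every step writes to the same column name, so each literal overwrites the previous one.
val folded = x.foldLeft(base)((df, value) => df.withColumn("newcolumnname", lit(value)))

// Collect the new column to the driver to inspect it.
val values = folded.select("newcolumnname").as[String].collect().toList
```

In spark-shell a session named spark already exists, so the builder line could be skipped there.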







scala apache-spark














edited Jan 3 at 10:02 by Jimmy Maguel
asked Jan 3 at 8:36 by Jimmy Maguel













  • Can you provide a sample of how you expect your DataFrame to look? Also, are you attempting to create a new DataFrame or to expand an existing one?

    – Assaf Mendelson
    Jan 3 at 8:44

  • I am trying to add a new column to an existing DataFrame; I just need a way to add a list of strings to that new column.

    – Jimmy Maguel
    Jan 3 at 8:48

  • Hi Jimmy, as I understand it, you would like to add the list of values to an existing DataFrame in a separate column. Is that correct?

    – neeraj bhadani
    Jan 3 at 8:58

  • @neerajbhadani, that's exactly what I want. Can you please help?

    – Jimmy Maguel
    Jan 3 at 9:01

  • Do you want a new column per value, or just a single column with 3 rows?

    – AKSW
    Jan 3 at 9:09































2 Answers
You can try the code below (PySpark).

  1. Create the first DataFrame with an index.

from pyspark.sql.functions import row_number
from pyspark.sql import Window
# Assumes an active SparkSession named spark
w = Window.orderBy("Col2")
df = spark.createDataFrame([("a", 10), ("b", 20), ("c", 30)], ["Col1", "Col2"])
df1 = df.withColumn("index", row_number().over(w))
df1.show()

  2. Create another DataFrame from the list of values.

from pyspark.sql.types import StringType
newdf = spark.createDataFrame(['x', 'y', 'z'], StringType())
newdf.show()

  3. Add an index column to the DataFrame created from the list of values in step 2.

w = Window.orderBy("value")
df2 = newdf.withColumn("index", row_number().over(w))
df2.show()

  4. Join df1 and df2 on the index.

df1.join(df2, "index").show()


















answered Jan 3 at 9:42 by neeraj bhadani








  • It's in Python. Do you have a Scala version, please?

    – Jimmy Maguel
    Jan 3 at 9:44
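The four steps in the answer above could be sketched in Scala roughly as follows. This is an illustration under stated assumptions, not the answerer's code: it assumes a local SparkSession, and it deviates in one place by taking the list's index from the list position (zipWithIndex) instead of sorting by value, so the original list order is kept. The first DataFrame is still indexed by sorting on Col2, as in the answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Assumption: a local session created only for this illustration.
val spark = SparkSession.builder().master("local[1]").appName("index-join-sketch").getOrCreate()
import spark.implicits._

// Step 1: the existing DataFrame plus a 1-based index ordered by Col2.
val df1 = Seq(("a", 10), ("b", 20), ("c", 30)).toDF("Col1", "Col2")
  .withColumn("index", row_number().over(Window.orderBy("Col2")))

// Steps 2 and 3: a DataFrame built from the list, indexed by list position (1-based).
val values = List("def", "cook", "abc")
val df2 = values.zipWithIndex.map { case (v, i) => (v, i + 1) }.toDF("value", "index")

// Step 4: join on the shared index column.
val joined = df1.join(df2, "index").orderBy("index")

// Collect the paired columns to the driver to inspect them.
val pairs = joined.select("Col1", "value").as[(String, String)].collect().toList
```

A window with no partitioning moves all rows into a single partition, so this approach is probably only reasonable for small DataFrames.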














The function array, available since Spark 1.4, takes Columns and returns a new Column. The function lit takes a Scala value and returns a Column.

import org.apache.spark.sql.functions.{array, lit, typedLit}
import spark.implicits._
val df = Seq(1, 2, 3).toDF("col1")
df.withColumn("new_col", array(lit("def"), lit("cook"), lit("abc"))).show

+----+----------------+
|col1|         new_col|
+----+----------------+
|   1|[def, cook, abc]|
|   2|[def, cook, abc]|
|   3|[def, cook, abc]|
+----+----------------+

Since Spark 2.2.0, there is also a function typedLit that takes a Scala value and returns a Column. It can handle parameterized Scala types such as List, Seq, and Map.

val newDF = df.withColumn("new_col", typedLit(List("def", "cook", "abc")))
newDF.show()
newDF.printSchema()

+----+----------------+
|col1|         new_col|
+----+----------------+
|   1|[def, cook, abc]|
|   2|[def, cook, abc]|
|   3|[def, cook, abc]|
+----+----------------+

root
 |-- col1: integer (nullable = false)
 |-- new_col: array (nullable = false)
 |    |-- element: string (containsNull = true)

Is this what you wanted to do? You can use when to conditionally add a different list to each row.
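The closing remark about when could look something like the sketch below. It assumes a local SparkSession and the same three-row df as above. The condition (col1 equal to 1) and the two lists are made up purely for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, typedLit, when}

// Assumption: a local session created only for this illustration.
val spark = SparkSession.builder().master("local[1]").appName("when-sketch").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("col1")

// Hypothetical rule: rows with col1 == 1 get one list, all other rows get another.
val conditional = df.withColumn(
  "new_col",
  when(col("col1") === 1, typedLit(List("def", "cook")))
    .otherwise(typedLit(List("abc")))
)

// Collect the array column to the driver to inspect it.
val lists = conditional.orderBy("col1").select("new_col").as[Seq[String]].collect().toList
```

Further when clauses can be chained before otherwise if more than two cases are needed.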

















answered Jan 3 at 10:18 by ryandam













  • Instead of duplicating the array, I wanted only the value def in col1, cook in col2, and abc in col3.

    – Jimmy Maguel
    Jan 3 at 10:28

  • In new_col, the data is getting duplicated in your answer.

    – Jimmy Maguel
    Jan 3 at 11:17

  • Adding to @ryandam's answer: df.withColumn("new_col", typedLit(List("def", "cook", "abc"))).withColumn("col1", col("new_col")(0)).withColumn("col2", col("new_col")(1)).withColumn("col3", col("new_col")(2)).drop("new_col")

    – Mohd Avais
    Jan 4 at 10:59
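The one-liner in the last comment could be written out as the sketch below. It assumes a local SparkSession and the three-row df from the answer, and it assumes the array column is referenced through col("new_col") with getItem. Note that writing to col1 replaces the existing col1 of df, as the comment's column names imply.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, typedLit}

// Assumption: a local session created only for this illustration.
val spark = SparkSession.builder().master("local[1]").appName("split-array-sketch").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("col1")

// Add the list as an array column, copy each element into its own column, then drop the array.
val splitDf = df
  .withColumn("new_col", typedLit(List("def", "cook", "abc")))
  .withColumn("col1", col("new_col").getItem(0)) // overwrites the original integer col1
  .withColumn("col2", col("new_col").getItem(1))
  .withColumn("col3", col("new_col").getItem(2))
  .drop("new_col")

// Collect the rows to the driver to inspect them.
val rows = splitDf.as[(String, String, String)].collect().toList
```

Every row ends up with the same three values, so this spreads the list across columns rather than across rows.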




















