How to add List[String] values to a single column in Dataframe
I have a dataframe and a list of values (possibly a List[String]). I want to create a new column in my dataframe and use the list values as the values of this new column. I tried:
val x = List("def", "cook", "abc")
val c_df = null
x.foldLeft(c_df)((df, column) => df.withColumn("newcolumnname" , lit(column)))
but it throws a StackOverflow exception. I also tried iterating over the list of string values and adding each one to the dataframe, but the result is a list of dataframes, when all I want is a single dataframe.
Please help!
Here is the sample input and output dataframe:
scala apache-spark
edited Jan 3 at 10:02
Jimmy Maguel
asked Jan 3 at 8:36


Jimmy Maguel
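One thing worth noting about the fold in the question: when it starts from an actual DataFrame rather than null, every step writes to the same column name, so each withColumn call replaces the column created by the previous one. The sketch below shows this on a small stand-in DataFrame. The id column and the local SparkSession are assumptions for illustration, not part of the question, and the sketch does not try to reproduce the reported exception.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("fold-overwrite").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("id") // hypothetical stand-in for the existing DataFrame
val x = List("def", "cook", "abc")

// The column name is the same at every step, so each withColumn call
// replaces the column written by the previous one.
val result = x.foldLeft(df)((acc, value) => acc.withColumn("newcolumnname", lit(value)))

result.show() // one extra column, holding only the last list element in every row
```

So the fold yields a single DataFrame, but with one constant column rather than one value per row or per column.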
Can you provide a sample of how you expect your dataframe to look? Also, are you attempting to create a new dataframe or expanding an existing one?
– Assaf Mendelson
Jan 3 at 8:44
I am trying to add a new column to an existing dataframe; I just need a way to add a list of strings to that new column.
– Jimmy Maguel
Jan 3 at 8:48
Hi Jimmy, what I understood is that you would like to add the list of values to an existing DataFrame in a separate column. Is that correct?
– neeraj bhadani
Jan 3 at 8:58
@neerajbhadani, that's exactly what I want, can you please help?
– Jimmy Maguel
Jan 3 at 9:01
Do you want a new column per value, or just a single column with 3 rows?
– AKSW
Jan 3 at 9:09
2 Answers
You can try the code below.
- Create First DataFrame with Index.
from pyspark.sql.functions import *
from pyspark.sql import Window
w = Window.orderBy("Col2")
df = spark.createDataFrame([("a", 10), ("b", 20), ("c", 30)], ["Col1", "Col2"])
df1 = df.withColumn("index", row_number().over(w))
df1.show()
- Create Another DataFrame from List of Values.
from pyspark.sql.types import *
newdf = spark.createDataFrame(['x','y', 'z'], StringType())
newdf.show()
- Add Index column to DF created from List of values in step 2.
w = Window.orderBy("value")
df2 = newdf.withColumn("index", row_number().over(w))
df2.show()
- Join the DataFrame df1 and df2 based on index.
df1.join(df2, "index").show()
answered Jan 3 at 9:42


neeraj bhadani
It's in Python, do you have a Scala version please?
– Jimmy Maguel
Jan 3 at 9:44
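A rough Scala sketch of the same index-join idea, assuming the list is small enough to be numbered on the driver. Unlike the PySpark steps, it numbers the list values with zipWithIndex rather than with a window ordered by value, so the list keeps its original order instead of being sorted. The sample data, the list contents and the local SparkSession are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("index-join").getOrCreate()
import spark.implicits._

// Step 1: existing DataFrame plus a 1-based row index.
val df = Seq(("a", 10), ("b", 20), ("c", 30)).toDF("Col1", "Col2")
val df1 = df.withColumn("index", row_number().over(Window.orderBy("Col2")))

// Steps 2 and 3: turn the list into a DataFrame with a matching 1-based index.
val values = List("def", "cook", "abc")
val df2 = values.zipWithIndex.map { case (v, i) => (v, i + 1) }.toDF("value", "index")

// Step 4: join on the index so that row n receives list element n.
val joined = df1.join(df2, "index").orderBy("index")
joined.show()
```

Note that a window with orderBy and no partitionBy moves all rows into a single partition, so this approach is best suited to small DataFrames.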
There is a function, array, in Spark 1.4 or later that takes any number of Column arguments and returns a new Column. The function lit takes a Scala value and returns a Column.
import org.apache.spark.sql.functions.{array, lit, typedLit}
import spark.implicits._
val df = Seq(1, 2, 3).toDF("col1")
df.withColumn("new_col", array(lit("def"), lit("cook"), lit("abc"))).show
+----+----------------+
|col1|         new_col|
+----+----------------+
|   1|[def, cook, abc]|
|   2|[def, cook, abc]|
|   3|[def, cook, abc]|
+----+----------------+
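When the values are already in a Scala List, the lit calls do not have to be written out by hand. A minimal sketch, assuming a list named x as in the question and the same small df; the local SparkSession is an assumption for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, lit}

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("array-from-list").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("col1")
val x = List("def", "cook", "abc")

// Wrap each element in lit and pass the resulting columns to array as varargs.
val withArray = df.withColumn("new_col", array(x.map(lit(_)): _*))
withArray.show()
```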
In Spark 2.2.0, there is a function, typedLit, that takes a Scala value and returns a Column. Unlike lit, this function can handle parameterized Scala types such as List, Seq and Map.
val newDF = df.withColumn("new_col", typedLit(List("def", "cook", "abc")))
newDF.show()
newDF.printSchema()
+----+----------------+
|col1|         new_col|
+----+----------------+
|   1|[def, cook, abc]|
|   2|[def, cook, abc]|
|   3|[def, cook, abc]|
+----+----------------+
root
 |-- col1: integer (nullable = false)
 |-- new_col: array (nullable = false)
 |    |-- element: string (containsNull = true)
Is this what you wanted to do? You can add when to conditionally assign a different list to each row.
answered Jan 3 at 10:18
ryandam
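As a sketch of the when suggestion, the following assigns one list to the row where col1 is 1 and a different list to every other row. The condition and the two lists are made up for illustration, and the local SparkSession is likewise an assumption.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{typedLit, when}

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("conditional-list").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("col1")

// Both branches must produce the same column type, here array<string>.
val conditional = df.withColumn(
  "new_col",
  when($"col1" === 1, typedLit(List("def", "cook")))
    .otherwise(typedLit(List("abc")))
)
conditional.show()
```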
Instead of duplicating the array, I wanted the value def in col 1, cook in col2 and abc in col3 only.
– Jimmy Maguel
Jan 3 at 10:28
In your answer, the data in new_col is getting duplicated.
– Jimmy Maguel
Jan 3 at 11:17
Adding to @ryandam's answer: df.withColumn("new_col", typedLit(List("def", "cook", "abc"))).withColumn("col1", col("new_col")(0)).withColumn("col2", col("new_col")(1)).withColumn("col3", col("new_col")(2)).drop("new_col")
– Mohd Avais
Jan 4 at 10:59
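If the goal is one new column per list element, as the comments above suggest, a fold over the list can work as long as each step uses a different column name. A minimal sketch; the new_col1, new_col2, ... naming scheme, the id column and the local SparkSession are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("column-per-value").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("id") // hypothetical stand-in for the existing DataFrame
val x = List("def", "cook", "abc")

// Pair each value with its position and derive a distinct column name from it,
// so no step overwrites a column added by an earlier step.
val widened = x.zipWithIndex.foldLeft(df) { case (acc, (value, i)) =>
  acc.withColumn(s"new_col${i + 1}", lit(value))
}
widened.show()
```

Every row carries the same three constants here; for one list value per row, a join on a row index is the closer fit.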