How to add List[String] values to a single column in a DataFrame


I have a DataFrame and a list of values (possibly a List[String]). I want to create a new column in the DataFrame and use the list values as the values of this new column. I tried:

val x = List("def", "cook", "abc")

val c_df = null

x.foldLeft(c_df)((df, column) => df.withColumn("newcolumnname" , lit(column)))

but it throws a stack overflow exception. I also tried iterating over the list of string values and adding each one to the DataFrame, but the result is a list of DataFrames, whereas all I want is a single DataFrame.

Please help!

Here is the sample input and output DataFrame:
[image: sample input and output DataFrames, not available]

edited Jan 3 at 10:02

asked Jan 3 at 8:36

Jimmy Maguel

117

Can you provide a sample of how you expect your DataFrame to look? Also, are you attempting to create a new DataFrame or expand an existing one?

– Assaf Mendelson
Jan 3 at 8:44

I am trying to add a new column to an existing DataFrame; I just need a way to add a list of strings to the new column.

– Jimmy Maguel
Jan 3 at 8:48

Hi Jimmy, as I understand it, you would like to add the list of values to an existing DataFrame as a separate column. Is that correct?

– neeraj bhadani
Jan 3 at 8:58

@neerajbhadani, that's exactly what I want. Can you please help?

– Jimmy Maguel
Jan 3 at 9:01

Do you want a new column per value, or just a single column with 3 rows?

– AKSW
Jan 3 at 9:09

scala apache-spark

2 Answers

You can try the code below.

1. Create the first DataFrame with an index.

from pyspark.sql.functions import *

from pyspark.sql import Window

w = Window.orderBy("Col2")

df = spark.createDataFrame([("a", 10), ("b", 20), ("c",  30)], ["Col1", "Col2"])

df1 = df.withColumn("index", row_number().over(w))

df1.show()

2. Create another DataFrame from the list of values.

from pyspark.sql.types import *

newdf = spark.createDataFrame(['x', 'y', 'z'], StringType())

newdf.show()

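3. Add an index column to the DataFrame created from the list of values in step 2. Note that ordering by "value" sorts the values, so the original list order is kept here only because 'x', 'y', 'z' are already in alphabetical order.

w = Window.orderBy("value")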

df2 = newdf.withColumn("index", row_number().over(w))

df2.show()

4. Join df1 and df2 on the index column.

df1.join(df2, "index").show()

answered Jan 3 at 9:42

neeraj bhadani

935313

It's in Python. Do you have a Scala version, please?

– Jimmy Maguel
Jan 3 at 9:44
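A Scala version of the index-and-join approach above, offered as an editor's sketch rather than the answerer's code. Steps 1 and 4 follow the answer (row_number over a Window, then a join on the index), but the list side is built with Scala's zipWithIndex instead of ordering by value, so the list order is kept even for unsorted values like "def", "cook", "abc". The object and function names (AddListAsColumn, addListColumn) and the helper column name index are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

object AddListAsColumn {
  // Hypothetical helper: pairs the i-th row of df (ordered by orderCol)
  // with the i-th element of values, in a new column named newCol.
  def addListColumn(df: DataFrame, values: Seq[String], orderCol: String, newCol: String): DataFrame = {
    val spark = df.sparkSession
    import spark.implicits._

    // Step 1: 1-based row index. With no partitionBy, Spark moves all rows
    // into a single partition to compute it.
    val w = Window.orderBy(orderCol)
    val df1 = df.withColumn("index", row_number().over(w))

    // Steps 2-3: index the list on the driver; zipWithIndex keeps list order.
    // Shift to 1-based so it lines up with row_number.
    val df2 = values.zipWithIndex.map { case (v, i) => (i + 1, v) }.toDF("index", newCol)

    // Step 4: inner join on the index, keep row order, drop the helper column.
    df1.join(df2, "index").orderBy("index").drop("index")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("add-list-column").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 10), ("b", 20), ("c", 30)).toDF("Col1", "Col2")
    addListColumn(df, List("def", "cook", "abc"), "Col2", "newcolumnname").show()

    spark.stop()
  }
}
```

Because the join is an inner join, rows beyond the length of the list are dropped; a left join would keep them with null in the new column. The single-partition window is fine for small data but may become a bottleneck on large DataFrames.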


Spark 1.4 and later provide a function array that takes a sequence of Columns and returns a new array Column. The function lit takes a Scala value and returns a Column.

import spark.implicits._
import org.apache.spark.sql.functions.{array, lit}

val df = Seq(1, 2, 3).toDF("col1")

df.withColumn("new_col", array(lit("def"), lit("cook"), lit("abc"))).show

+----+----------------+
|col1|         new_col|
+----+----------------+
|   1|[def, cook, abc]|
|   2|[def, cook, abc]|
|   3|[def, cook, abc]|
+----+----------------+

Since Spark 2.2.0 there is also a function typedLit that takes a Scala value and returns a Column. It can handle parameterized Scala types such as List, Seq and Map.

import org.apache.spark.sql.functions.typedLit

val newDF = df.withColumn("new_col", typedLit(List("def", "cook", "abc")))

newDF.show()

newDF.printSchema()

+----+----------------+
|col1|         new_col|
+----+----------------+
|   1|[def, cook, abc]|
|   2|[def, cook, abc]|
|   3|[def, cook, abc]|
+----+----------------+

root
 |-- col1: integer (nullable = false)
 |-- new_col: array (nullable = false)
 |    |-- element: string (containsNull = true)

Is this what you wanted to do? You can use when to conditionally add a different list to each row.
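A minimal sketch of the when idea, assuming the goal from the comments is one list value per row keyed on col1. The mapping 1 to "def", 2 to "cook", 3 to "abc" and the names ConditionalValues and assignByKey are illustrative assumptions, not part of the answer. The chain is built with foldLeft so it works for any number of keys.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object ConditionalValues {
  // Hypothetical helper: builds when(key === k1, v1).when(key === k2, v2)...
  // Rows whose key is not in the mapping get null in newCol.
  def assignByKey(df: DataFrame, keyCol: String, mapping: Seq[(Int, String)], newCol: String): DataFrame = {
    require(mapping.nonEmpty, "mapping must not be empty")
    val (k0, v0) = mapping.head
    val expr: Column = mapping.tail.foldLeft(when(col(keyCol) === k0, lit(v0))) {
      case (acc, (k, v)) => acc.when(col(keyCol) === k, lit(v))
    }
    df.withColumn(newCol, expr)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("when-chain").getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2, 3).toDF("col1")
    assignByKey(df, "col1", Seq(1 -> "def", 2 -> "cook", 3 -> "abc"), "new_col").show()

    spark.stop()
  }
}
```

Replacing lit(v) with typedLit(List(...)) would attach a whole list per row instead of a single value.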

answered Jan 3 at 10:18

ryandam

1346

Instead of duplicating the array, I wanted the value def in col 1, cook in col2 and abc in col3 only.

– Jimmy Maguel
Jan 3 at 10:28

In your answer, the data in new_col is getting duplicated.

– Jimmy Maguel
Jan 3 at 11:17

Adding to @ryandam's answer: df.withColumn("new_col", typedLit(List("def", "cook", "abc"))).withColumn("col1", col("new_col")(0)).withColumn("col2", col("new_col")(1)).withColumn("col3", col("new_col")(2)).drop("new_col")

– Mohd Avais
Jan 4 at 10:59
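A foldLeft over withColumn, as in the question, also produces this column-per-value layout, provided it starts from an existing DataFrame rather than null and gives each value its own column name; reusing "newcolumnname" on every step just replaces the same column, so only the last value would survive. A minimal sketch, assuming the hypothetical column names value_col1, value_col2, ... (chosen so the existing col1 is not overwritten) and a hypothetical helper addValueColumns:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object FoldValuesIntoColumns {
  // Hypothetical helper: adds one constant column per list value,
  // named value_col1, value_col2, ... in list order.
  def addValueColumns(df: DataFrame, values: List[String]): DataFrame =
    values.zipWithIndex.foldLeft(df) { case (acc, (value, i)) =>
      acc.withColumn(s"value_col${i + 1}", lit(value))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("fold-columns").getOrCreate()
    import spark.implicits._

    // Start the fold from a real DataFrame, not null.
    val df = Seq(1, 2, 3).toDF("col1")
    addValueColumns(df, List("def", "cook", "abc")).show()

    spark.stop()
  }
}
```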
