How do I unnest (explode) a column in a pandas DataFrame?

I have the following DataFrame where one of the columns is an object (list type cell):

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[1,2]]})

df

Out[458]: 

   A       B

0  1  [1, 2]

1  2  [1, 2]

My expected output is:

What should I do to achieve this?

Related question

pandas: When cell contents are lists, create a row for each element in the list

Good question and answer but only handle one column with list(In my answer the self-def function will work for multiple columns, also the accepted answer is use the most time consuming apply , which is not recommended, check more info When should I ever want to use pandas apply() in my code?)

edited Feb 2 at 0:58

asked Nov 9 '18 at 2:19

Wen-Ben

111k83266

3

Related, unnesting strings: stackoverflow.com/q/48197234/4909087

– coldspeed
Nov 12 '18 at 12:00

1

Couple of related posts: here, here, here, here, ...

– Cleb
Jan 4 at 12:43

This question is asked and answered 4 years after the original question and serves no purpose other then to divide possible answers to several questions instead of giving users a single place to find the solution.

– ImportanceOfBeingErnest
Feb 1 at 16:45

add a comment |

I have the following DataFrame where one of the columns is an object (list type cell):

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[1,2]]})

df

Out[458]: 

   A       B

0  1  [1, 2]

1  2  [1, 2]

My expected output is:

What should I do to achieve this?

Related question

pandas: When cell contents are lists, create a row for each element in the list

edited Feb 2 at 0:58

asked Nov 9 '18 at 2:19

Wen-Ben

111k83266

3

Related, unnesting strings: stackoverflow.com/q/48197234/4909087

– coldspeed
Nov 12 '18 at 12:00

1

Couple of related posts: here, here, here, here, ...

– Cleb
Jan 4 at 12:43

This question is asked and answered 4 years after the original question and serves no purpose other then to divide possible answers to several questions instead of giving users a single place to find the solution.

– ImportanceOfBeingErnest
Feb 1 at 16:45

add a comment |

I have the following DataFrame where one of the columns is an object (list type cell):

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[1,2]]})

df

Out[458]: 

   A       B

0  1  [1, 2]

1  2  [1, 2]

My expected output is:

What should I do to achieve this?

Related question

pandas: When cell contents are lists, create a row for each element in the list

edited Feb 2 at 0:58

asked Nov 9 '18 at 2:19

Wen-Ben

111k83266

I have the following DataFrame where one of the columns is an object (list type cell):

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[1,2]]})

df

Out[458]: 

   A       B

0  1  [1, 2]

1  2  [1, 2]

My expected output is:

What should I do to achieve this?

Related question

pandas: When cell contents are lists, create a row for each element in the list

python pandas dataframe

edited Feb 2 at 0:58

asked Nov 9 '18 at 2:19

Wen-Ben

111k83266

edited Feb 2 at 0:58

asked Nov 9 '18 at 2:19

Wen-Ben

111k83266

edited Feb 2 at 0:58

asked Nov 9 '18 at 2:19

Wen-Ben

111k83266

asked Nov 9 '18 at 2:19

Wen-Ben

111k83266

asked Nov 9 '18 at 2:19

Wen-Ben

111k83266

3

Related, unnesting strings: stackoverflow.com/q/48197234/4909087

– coldspeed
Nov 12 '18 at 12:00

1

Couple of related posts: here, here, here, here, ...

– Cleb
Jan 4 at 12:43

This question is asked and answered 4 years after the original question and serves no purpose other then to divide possible answers to several questions instead of giving users a single place to find the solution.

– ImportanceOfBeingErnest
Feb 1 at 16:45

add a comment |

3

Related, unnesting strings: stackoverflow.com/q/48197234/4909087

– coldspeed
Nov 12 '18 at 12:00

1

Couple of related posts: here, here, here, here, ...

– Cleb
Jan 4 at 12:43

This question is asked and answered 4 years after the original question and serves no purpose other then to divide possible answers to several questions instead of giving users a single place to find the solution.

– ImportanceOfBeingErnest
Feb 1 at 16:45

Related, unnesting strings: stackoverflow.com/q/48197234/4909087

– coldspeed
Nov 12 '18 at 12:00

Couple of related posts: here, here, here, here, ...

– Cleb
Jan 4 at 12:43

This question is asked and answered 4 years after the original question and serves no purpose other then to divide possible answers to several questions instead of giving users a single place to find the solution.

– ImportanceOfBeingErnest
Feb 1 at 16:45

add a comment |

5 Answers
5

active

oldest

votes

+100

As an user with both R and python, I have seen this type of question a couple of times.

In R, they have the built-in function from package tidyr called unnest. But in Python(pandas) there is no built-in function for this type of question.

I know object columns type always make the data hard to convert with a pandas' function. When I received the data like this , the first thing that came to mind was to 'flatten' or unnest the columns .

Method 1
apply + pd.Series (easy to understand but in terms of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n. If we still use the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contains unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case (two columns type object)

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

I am using pandas and python functions for this type of question. If you are worried about the speed of the above solutions, check user3483203's answer , since he is using numpy and most of the time numpy is faster . I recommend Cpython and numba if speed matters in your case.

edited Dec 26 '18 at 15:24

Jack Moody

640621

answered Nov 9 '18 at 2:20

Wen-Ben

111k83266

add a comment |

+100

Option 1

If all of the sublists in the other column are the same length, numpy can be an efficient option here:

vals = np.array(df.B.values.tolist())    

a = np.repeat(df.A, vals.shape[1])



pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

Option 2

If the sublists have different length, you need an additional step:

vals = df.B.values.tolist()

rs = [len(r) for r in vals]    

a = np.repeat(df.A, rs)



pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Option 3

I took a shot at generalizing this to work to flatten N columns and tile M columns, I'll work later on making it more efficient:

df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],

                   'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})

   A          B          C  D

0  1     [1, 2]  [1, 2, 3]  A

1  2  [1, 2, 3]     [1, 2]  B

2  3        [1]     [1, 2]  C

def unnest(df, tile, explode):

    vals = df[explode].sum(1)

    rs = [len(r) for r in vals]

    a = np.repeat(df[tile].values, rs, axis=0)

    b = np.concatenate(vals.values)

    d = np.column_stack((a, b))

    return pd.DataFrame(d, columns = tile +  ['_'.join(explode)])



unnest(df, ['A', 'D'], ['B', 'C'])

    A  D B_C

0   1  A   1

1   1  A   2

2   1  A   1

3   1  A   2

4   1  A   3

5   2  B   1

6   2  B   2

7   2  B   3

8   2  B   1

9   2  B   2

10  3  C   1

11  3  C   1

12  3  C   2

Functions

def wen1(df):

    return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})



def wen2(df):

    return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})



def wen3(df):

    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))

    return s.join(df.drop('B', 1), how='left')



def wen4(df):

    return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)



def chris1(df):

    vals = np.array(df.B.values.tolist())

    a = np.repeat(df.A, vals.shape[1])

    return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)



def chris2(df):

    vals = df.B.values.tolist()

    rs = [len(r) for r in vals]

    a = np.repeat(df.A.values, rs)

    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Timings

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

from timeit import timeit



res = pd.DataFrame(

       index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],

       columns=[10, 50, 100, 500, 1000, 5000, 10000],

       dtype=float

)



for f in res.index:

    for c in res.columns:

        df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

        df = pd.concat([df]*c)

        stmt = '{}(df)'.format(f)

        setp = 'from __main__ import df, {}'.format(f)

        res.at[f, c] = timeit(stmt, setp, number=50)



ax = res.div(res.min()).T.plot(loglog=True)

ax.set_xlabel("N")

ax.set_ylabel("time (relative)")

Performance

enter image description here

edited Nov 9 '18 at 4:15

answered Nov 9 '18 at 2:35

user3483203

31.2k82655

add a comment |

One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:

import numpy as np

import pandas as pd





def unnest(frame, explode):

    def mesh(values):

        return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))



    data = np.vstack(mesh(row) for row in frame[explode].values)

    return pd.DataFrame(data=data, columns=explode)





df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

print(unnest(df, ['A', 'B']))  # base

print()



df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})

print(unnest(df, ['A', 'B', 'C']))  # multiple columns

print()



df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],

                   'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})



print(unnest(df, ['A', 'B']))  # uneven length lists

print()

print(unnest(df, ['D', 'B']))  # different types

print()

Output

answered Dec 1 '18 at 1:31

Daniel Mesejo

18.4k21432

add a comment |

My 5 cents:

df[['B', 'B2']] = pd.DataFrame(df['B'].values.tolist())



df[['A', 'B']].append(df[['A', 'B2']].rename(columns={'B2': 'B'}),

                      ignore_index=True)

and another 5

df[['B1', 'B2']] = pd.DataFrame([*df['B']]) # if values.tolist() is too boring



(pd.wide_to_long(df.drop('B', 1), 'B', 'A', '')

 .reset_index(level=1, drop=True)

 .reset_index())

both resulting in the same

edited Dec 11 '18 at 3:50

answered Dec 11 '18 at 2:05

ayorgo

1,130714

add a comment |

-1

Something pretty not recommended (at least work in this case):

df=pd.concat([df]*2).sort_index()

it=iter(df['B'].tolist()[0]+df['B'].tolist()[0])

df['B']=df['B'].apply(lambda x:next(it))

concat + sort_index + iter + apply + next.

Now:

print(df)

Is:

If care about index:

df=df.reset_index(drop=True)

Now:

print(df)

Is:

answered Nov 9 '18 at 2:40

U9-Forward

15.3k41439

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53218931%2fhow-do-i-unnest-explode-a-column-in-a-pandas-dataframe%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

+100

As an user with both R and python, I have seen this type of question a couple of times.

In R, they have the built-in function from package tidyr called unnest. But in Python(pandas) there is no built-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in terms of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n. If we still use the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contains unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case (two columns type object)

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited Dec 26 '18 at 15:24

Jack Moody

640621

answered Nov 9 '18 at 2:20

Wen-Ben

111k83266

add a comment |

+100

As an user with both R and python, I have seen this type of question a couple of times.

In R, they have the built-in function from package tidyr called unnest. But in Python(pandas) there is no built-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in terms of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n. If we still use the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contains unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case (two columns type object)

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited Dec 26 '18 at 15:24

Jack Moody

640621

answered Nov 9 '18 at 2:20

Wen-Ben

111k83266

add a comment |

+100

As an user with both R and python, I have seen this type of question a couple of times.

In R, they have the built-in function from package tidyr called unnest. But in Python(pandas) there is no built-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in terms of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n. If we still use the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contains unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case (two columns type object)

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited Dec 26 '18 at 15:24

Jack Moody

640621

answered Nov 9 '18 at 2:20

Wen-Ben

111k83266

As an user with both R and python, I have seen this type of question a couple of times.

In R, they have the built-in function from package tidyr called unnest. But in Python(pandas) there is no built-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in terms of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n. If we still use the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contains unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case (two columns type object)

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited Dec 26 '18 at 15:24

Jack Moody

640621

answered Nov 9 '18 at 2:20

Wen-Ben

111k83266

edited Dec 26 '18 at 15:24

Jack Moody

640621

edited Dec 26 '18 at 15:24

Jack Moody

640621

edited Dec 26 '18 at 15:24

Jack Moody

640621

answered Nov 9 '18 at 2:20

Wen-Ben

111k83266

answered Nov 9 '18 at 2:20

Wen-Ben

111k83266

answered Nov 9 '18 at 2:20

Wen-Ben

111k83266

add a comment |

+100

Option 1

If all of the sublists in the other column are the same length, numpy can be an efficient option here:

vals = np.array(df.B.values.tolist())    

a = np.repeat(df.A, vals.shape[1])



pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

Option 2

If the sublists have different length, you need an additional step:

vals = df.B.values.tolist()

rs = [len(r) for r in vals]    

a = np.repeat(df.A, rs)



pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Option 3

I took a shot at generalizing this to work to flatten N columns and tile M columns, I'll work later on making it more efficient:

df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],

                   'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})

   A          B          C  D

0  1     [1, 2]  [1, 2, 3]  A

1  2  [1, 2, 3]     [1, 2]  B

2  3        [1]     [1, 2]  C

def unnest(df, tile, explode):

    vals = df[explode].sum(1)

    rs = [len(r) for r in vals]

    a = np.repeat(df[tile].values, rs, axis=0)

    b = np.concatenate(vals.values)

    d = np.column_stack((a, b))

    return pd.DataFrame(d, columns = tile +  ['_'.join(explode)])



unnest(df, ['A', 'D'], ['B', 'C'])

    A  D B_C

0   1  A   1

1   1  A   2

2   1  A   1

3   1  A   2

4   1  A   3

5   2  B   1

6   2  B   2

7   2  B   3

8   2  B   1

9   2  B   2

10  3  C   1

11  3  C   1

12  3  C   2

Functions

def wen1(df):

    return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})



def wen2(df):

    return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})



def wen3(df):

    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))

    return s.join(df.drop('B', 1), how='left')



def wen4(df):

    return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)



def chris1(df):

    vals = np.array(df.B.values.tolist())

    a = np.repeat(df.A, vals.shape[1])

    return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)



def chris2(df):

    vals = df.B.values.tolist()

    rs = [len(r) for r in vals]

    a = np.repeat(df.A.values, rs)

    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Timings

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

from timeit import timeit



res = pd.DataFrame(

       index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],

       columns=[10, 50, 100, 500, 1000, 5000, 10000],

       dtype=float

)



for f in res.index:

    for c in res.columns:

        df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

        df = pd.concat([df]*c)

        stmt = '{}(df)'.format(f)

        setp = 'from __main__ import df, {}'.format(f)

        res.at[f, c] = timeit(stmt, setp, number=50)



ax = res.div(res.min()).T.plot(loglog=True)

ax.set_xlabel("N")

ax.set_ylabel("time (relative)")

Performance

enter image description here

edited Nov 9 '18 at 4:15

answered Nov 9 '18 at 2:35

user3483203

31.2k82655

add a comment |

+100

Option 1

If all of the sublists in the other column are the same length, numpy can be an efficient option here:

vals = np.array(df.B.values.tolist())    

a = np.repeat(df.A, vals.shape[1])



pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

Option 2

If the sublists have different length, you need an additional step:

vals = df.B.values.tolist()

rs = [len(r) for r in vals]    

a = np.repeat(df.A, rs)



pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Option 3

I took a shot at generalizing this to work to flatten N columns and tile M columns, I'll work later on making it more efficient:

df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],

                   'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})

   A          B          C  D

0  1     [1, 2]  [1, 2, 3]  A

1  2  [1, 2, 3]     [1, 2]  B

2  3        [1]     [1, 2]  C

def unnest(df, tile, explode):

    vals = df[explode].sum(1)

    rs = [len(r) for r in vals]

    a = np.repeat(df[tile].values, rs, axis=0)

    b = np.concatenate(vals.values)

    d = np.column_stack((a, b))

    return pd.DataFrame(d, columns = tile +  ['_'.join(explode)])



unnest(df, ['A', 'D'], ['B', 'C'])

    A  D B_C

0   1  A   1

1   1  A   2

2   1  A   1

3   1  A   2

4   1  A   3

5   2  B   1

6   2  B   2

7   2  B   3

8   2  B   1

9   2  B   2

10  3  C   1

11  3  C   1

12  3  C   2

Functions

def wen1(df):

    return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})



def wen2(df):

    return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})



def wen3(df):

    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))

    return s.join(df.drop('B', 1), how='left')



def wen4(df):

    return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)



def chris1(df):

    vals = np.array(df.B.values.tolist())

    a = np.repeat(df.A, vals.shape[1])

    return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)



def chris2(df):

    vals = df.B.values.tolist()

    rs = [len(r) for r in vals]

    a = np.repeat(df.A.values, rs)

    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Timings

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

from timeit import timeit



res = pd.DataFrame(

       index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],

       columns=[10, 50, 100, 500, 1000, 5000, 10000],

       dtype=float

)



for f in res.index:

    for c in res.columns:

        df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

        df = pd.concat([df]*c)

        stmt = '{}(df)'.format(f)

        setp = 'from __main__ import df, {}'.format(f)

        res.at[f, c] = timeit(stmt, setp, number=50)



ax = res.div(res.min()).T.plot(loglog=True)

ax.set_xlabel("N")

ax.set_ylabel("time (relative)")

Performance

enter image description here

edited Nov 9 '18 at 4:15

answered Nov 9 '18 at 2:35

user3483203

31.2k82655

add a comment |

+100

Option 1

If all of the sublists in the other column are the same length, numpy can be an efficient option here:

vals = np.array(df.B.values.tolist())    

a = np.repeat(df.A, vals.shape[1])



pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

Option 2

If the sublists have different length, you need an additional step:

vals = df.B.values.tolist()

rs = [len(r) for r in vals]    

a = np.repeat(df.A, rs)



pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Option 3

I took a shot at generalizing this to work to flatten N columns and tile M columns, I'll work later on making it more efficient:

df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],

                   'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})

   A          B          C  D

0  1     [1, 2]  [1, 2, 3]  A

1  2  [1, 2, 3]     [1, 2]  B

2  3        [1]     [1, 2]  C

def unnest(df, tile, explode):

    vals = df[explode].sum(1)

    rs = [len(r) for r in vals]

    a = np.repeat(df[tile].values, rs, axis=0)

    b = np.concatenate(vals.values)

    d = np.column_stack((a, b))

    return pd.DataFrame(d, columns = tile +  ['_'.join(explode)])



unnest(df, ['A', 'D'], ['B', 'C'])

    A  D B_C

0   1  A   1

1   1  A   2

2   1  A   1

3   1  A   2

4   1  A   3

5   2  B   1

6   2  B   2

7   2  B   3

8   2  B   1

9   2  B   2

10  3  C   1

11  3  C   1

12  3  C   2

Functions

def wen1(df):

    return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})



def wen2(df):

    return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})



def wen3(df):

    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))

    return s.join(df.drop('B', 1), how='left')



def wen4(df):

    return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)



def chris1(df):

    vals = np.array(df.B.values.tolist())

    a = np.repeat(df.A, vals.shape[1])

    return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)



def chris2(df):

    vals = df.B.values.tolist()

    rs = [len(r) for r in vals]

    a = np.repeat(df.A.values, rs)

    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Timings

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

from timeit import timeit



res = pd.DataFrame(

       index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],

       columns=[10, 50, 100, 500, 1000, 5000, 10000],

       dtype=float

)



for f in res.index:

    for c in res.columns:

        df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

        df = pd.concat([df]*c)

        stmt = '{}(df)'.format(f)

        setp = 'from __main__ import df, {}'.format(f)

        res.at[f, c] = timeit(stmt, setp, number=50)



ax = res.div(res.min()).T.plot(loglog=True)

ax.set_xlabel("N")

ax.set_ylabel("time (relative)")

Performance

enter image description here

edited Nov 9 '18 at 4:15

answered Nov 9 '18 at 2:35

user3483203

31.2k82655

Option 1

If all of the sublists in the other column are the same length, numpy can be an efficient option here:

vals = np.array(df.B.values.tolist())    

a = np.repeat(df.A, vals.shape[1])



pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

Option 2

If the sublists have different length, you need an additional step:

vals = df.B.values.tolist()

rs = [len(r) for r in vals]    

a = np.repeat(df.A, rs)



pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Option 3

I took a shot at generalizing this to work to flatten N columns and tile M columns, I'll work later on making it more efficient:

df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],

                   'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})

   A          B          C  D

0  1     [1, 2]  [1, 2, 3]  A

1  2  [1, 2, 3]     [1, 2]  B

2  3        [1]     [1, 2]  C

def unnest(df, tile, explode):

    vals = df[explode].sum(1)

    rs = [len(r) for r in vals]

    a = np.repeat(df[tile].values, rs, axis=0)

    b = np.concatenate(vals.values)

    d = np.column_stack((a, b))

    return pd.DataFrame(d, columns = tile +  ['_'.join(explode)])



unnest(df, ['A', 'D'], ['B', 'C'])

    A  D B_C

0   1  A   1

1   1  A   2

2   1  A   1

3   1  A   2

4   1  A   3

5   2  B   1

6   2  B   2

7   2  B   3

8   2  B   1

9   2  B   2

10  3  C   1

11  3  C   1

12  3  C   2

Functions

def wen1(df):

    return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})



def wen2(df):

    return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})



def wen3(df):

    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))

    return s.join(df.drop('B', 1), how='left')



def wen4(df):

    return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)



def chris1(df):

    vals = np.array(df.B.values.tolist())

    a = np.repeat(df.A, vals.shape[1])

    return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)



def chris2(df):

    vals = df.B.values.tolist()

    rs = [len(r) for r in vals]

    a = np.repeat(df.A.values, rs)

    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Timings

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

from timeit import timeit



res = pd.DataFrame(

       index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],

       columns=[10, 50, 100, 500, 1000, 5000, 10000],

       dtype=float

)



for f in res.index:

    for c in res.columns:

        df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

        df = pd.concat([df]*c)

        stmt = '{}(df)'.format(f)

        setp = 'from __main__ import df, {}'.format(f)

        res.at[f, c] = timeit(stmt, setp, number=50)



ax = res.div(res.min()).T.plot(loglog=True)

ax.set_xlabel("N")

ax.set_ylabel("time (relative)")

Performance

enter image description here

edited Nov 9 '18 at 4:15

answered Nov 9 '18 at 2:35

user3483203

31.2k82655

edited Nov 9 '18 at 4:15

answered Nov 9 '18 at 2:35

user3483203

31.2k82655

answered Nov 9 '18 at 2:35

user3483203

31.2k82655

answered Nov 9 '18 at 2:35

user3483203

31.2k82655

add a comment |

One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:

import numpy as np

import pandas as pd





def unnest(frame, explode):

    def mesh(values):

        return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))



    data = np.vstack(mesh(row) for row in frame[explode].values)

    return pd.DataFrame(data=data, columns=explode)





df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

print(unnest(df, ['A', 'B']))  # base

print()



df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})

print(unnest(df, ['A', 'B', 'C']))  # multiple columns

print()



df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],

                   'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})



print(unnest(df, ['A', 'B']))  # uneven length lists

print()

print(unnest(df, ['D', 'B']))  # different types

print()

Output

answered Dec 1 '18 at 1:31

Daniel Mesejo

18.4k21432

add a comment |

One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:

import numpy as np

import pandas as pd





def unnest(frame, explode):

    def mesh(values):

        return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))



    data = np.vstack(mesh(row) for row in frame[explode].values)

    return pd.DataFrame(data=data, columns=explode)





df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

print(unnest(df, ['A', 'B']))  # base

print()



df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})

print(unnest(df, ['A', 'B', 'C']))  # multiple columns

print()



df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],

                   'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})



print(unnest(df, ['A', 'B']))  # uneven length lists

print()

print(unnest(df, ['D', 'B']))  # different types

print()

Output

answered Dec 1 '18 at 1:31

Daniel Mesejo

18.4k21432

add a comment |

One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:

import numpy as np

import pandas as pd





def unnest(frame, explode):

    def mesh(values):

        return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))



    data = np.vstack(mesh(row) for row in frame[explode].values)

    return pd.DataFrame(data=data, columns=explode)





df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

print(unnest(df, ['A', 'B']))  # base

print()



df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})

print(unnest(df, ['A', 'B', 'C']))  # multiple columns

print()



df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],

                   'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})



print(unnest(df, ['A', 'B']))  # uneven length lists

print()

print(unnest(df, ['D', 'B']))  # different types

print()

Output

answered Dec 1 '18 at 1:31

Daniel Mesejo

18.4k21432

One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:

import numpy as np

import pandas as pd





def unnest(frame, explode):

    def mesh(values):

        return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))



    data = np.vstack(mesh(row) for row in frame[explode].values)

    return pd.DataFrame(data=data, columns=explode)





df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

print(unnest(df, ['A', 'B']))  # base

print()



df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})

print(unnest(df, ['A', 'B', 'C']))  # multiple columns

print()



df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],

                   'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})



print(unnest(df, ['A', 'B']))  # uneven length lists

print()

print(unnest(df, ['D', 'B']))  # different types

print()

Output

answered Dec 1 '18 at 1:31

Daniel Mesejo

18.4k21432

answered Dec 1 '18 at 1:31

Daniel Mesejo

18.4k21432

answered Dec 1 '18 at 1:31

Daniel Mesejo

18.4k21432

answered Dec 1 '18 at 1:31

Daniel Mesejo

18.4k21432

add a comment |

My 5 cents:

df[['B', 'B2']] = pd.DataFrame(df['B'].values.tolist())



df[['A', 'B']].append(df[['A', 'B2']].rename(columns={'B2': 'B'}),

                      ignore_index=True)

and another 5

df[['B1', 'B2']] = pd.DataFrame([*df['B']]) # if values.tolist() is too boring



(pd.wide_to_long(df.drop('B', 1), 'B', 'A', '')

 .reset_index(level=1, drop=True)

 .reset_index())

both resulting in the same

edited Dec 11 '18 at 3:50

answered Dec 11 '18 at 2:05

ayorgo

1,130714

add a comment |

My 5 cents:

df[['B', 'B2']] = pd.DataFrame(df['B'].values.tolist())



df[['A', 'B']].append(df[['A', 'B2']].rename(columns={'B2': 'B'}),

                      ignore_index=True)

and another 5

df[['B1', 'B2']] = pd.DataFrame([*df['B']]) # if values.tolist() is too boring



(pd.wide_to_long(df.drop('B', 1), 'B', 'A', '')

 .reset_index(level=1, drop=True)

 .reset_index())

both resulting in the same

edited Dec 11 '18 at 3:50

answered Dec 11 '18 at 2:05

ayorgo

1,130714

add a comment |

My 5 cents:

df[['B', 'B2']] = pd.DataFrame(df['B'].values.tolist())



df[['A', 'B']].append(df[['A', 'B2']].rename(columns={'B2': 'B'}),

                      ignore_index=True)

and another 5

df[['B1', 'B2']] = pd.DataFrame([*df['B']]) # if values.tolist() is too boring



(pd.wide_to_long(df.drop('B', 1), 'B', 'A', '')

 .reset_index(level=1, drop=True)

 .reset_index())

both resulting in the same

edited Dec 11 '18 at 3:50

answered Dec 11 '18 at 2:05

ayorgo

1,130714

My 5 cents:

df[['B', 'B2']] = pd.DataFrame(df['B'].values.tolist())



df[['A', 'B']].append(df[['A', 'B2']].rename(columns={'B2': 'B'}),

                      ignore_index=True)

and another 5

df[['B1', 'B2']] = pd.DataFrame([*df['B']]) # if values.tolist() is too boring



(pd.wide_to_long(df.drop('B', 1), 'B', 'A', '')

 .reset_index(level=1, drop=True)

 .reset_index())

both resulting in the same

edited Dec 11 '18 at 3:50

answered Dec 11 '18 at 2:05

ayorgo

1,130714

edited Dec 11 '18 at 3:50

answered Dec 11 '18 at 2:05

ayorgo

1,130714

answered Dec 11 '18 at 2:05

ayorgo

1,130714

answered Dec 11 '18 at 2:05

ayorgo

1,130714

add a comment |

-1

Something pretty not recommended (at least work in this case):

df=pd.concat([df]*2).sort_index()

it=iter(df['B'].tolist()[0]+df['B'].tolist()[0])

df['B']=df['B'].apply(lambda x:next(it))

concat + sort_index + iter + apply + next.

Now:

print(df)

Is:

If care about index:

df=df.reset_index(drop=True)

Now:

print(df)

Is:

answered Nov 9 '18 at 2:40

U9-Forward

15.3k41439

add a comment |

-1

Something pretty not recommended (at least work in this case):

df=pd.concat([df]*2).sort_index()

it=iter(df['B'].tolist()[0]+df['B'].tolist()[0])

df['B']=df['B'].apply(lambda x:next(it))

concat + sort_index + iter + apply + next.

Now:

print(df)

Is:

If care about index:

df=df.reset_index(drop=True)

Now:

print(df)

Is:

answered Nov 9 '18 at 2:40

U9-Forward

15.3k41439

add a comment |

-1

Something pretty not recommended (at least work in this case):

df=pd.concat([df]*2).sort_index()

it=iter(df['B'].tolist()[0]+df['B'].tolist()[0])

df['B']=df['B'].apply(lambda x:next(it))

concat + sort_index + iter + apply + next.

Now:

print(df)

Is:

If care about index:

df=df.reset_index(drop=True)

Now:

print(df)

Is:

answered Nov 9 '18 at 2:40

U9-Forward

15.3k41439

Something pretty not recommended (at least work in this case):

df=pd.concat([df]*2).sort_index()

it=iter(df['B'].tolist()[0]+df['B'].tolist()[0])

df['B']=df['B'].apply(lambda x:next(it))

concat + sort_index + iter + apply + next.

Now:

print(df)

Is:

If care about index:

df=df.reset_index(drop=True)

Now:

print(df)

Is:

answered Nov 9 '18 at 2:40

U9-Forward

15.3k41439

answered Nov 9 '18 at 2:40

U9-Forward

15.3k41439

answered Nov 9 '18 at 2:40

U9-Forward

15.3k41439

answered Nov 9 '18 at 2:40

U9-Forward

15.3k41439

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu