Pandas dataframe wide vs long - unstack vs pivot vs outer join for MULTIPLE df
Problem
I have some enormous dataframes pulled from equipment, which track multiple runs on that equipment, each recording multiple sensors (voltage, current, rpm, pressures, etc.).
I need to widen this data set for plotting and further analysis, but unfortunately the clocks on the sensors are not synchronised, so the parameters are each collected with their own timestamps and can vary in length (msec resolution, so sometimes >10 rows).
I've attempted unstacking:
df.set_index(['index', 'start_time', 'param']).value.unstack().rename_axis(None, axis=1).reset_index()
and pivoting:
df.pivot_table(values='value', index=['index', 'start_time'], columns='param')
but the different lengths cause real problems (understandably).
I have code to convert the data, keyed on date (i.e. individual run) or on param, into a dictionary of dfs, and can then analyse each run or param separately -- but there are ~100 sensors and 18 months' worth of runs(!), so I'd like to be sure there is no way to do what I want, which I think is some sort of multiple outer join. Because of the differing lengths, it would need to fill blanks with NaN (which is fine) and find the max length of any param, padding each date out to that length.
Model dataset
Start
df_long = pd.DataFrame({"Date": np.array([1]*5 + [2]*3 + [3]*4 + [4]*2 + [5]*4),
                        "Param": list('aaabbabbabccaaaacc'),
                        "value": [0.1, 0.2, 0.2, 1, 4, 0.6, 0.5, 90, 0.9, 8.8, 4.1, 0.4, 0.5, 0.1, 0.1, 0.3, 3.4, 5.1],
                        "time": [1, 2, 3, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2]})
Ideal output
df_wide = pd.DataFrame({
    "Date": [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "a": [0.1, 0.2, 0.2, 0.6, np.nan, 0.9, np.nan, 0.5, 0.1, 0.1, 0.3],
    "time-a": [1, 2, 3, 1, np.nan, 1, np.nan, 1, 2, 1, 2],
    "b": [1, 4, np.nan, 0.5, 90, 8.8, np.nan, np.nan, np.nan, np.nan, np.nan],
    "time-b": [1, 2, np.nan, 1, 2, 1, np.nan, np.nan, np.nan, np.nan, np.nan],
    "c": [np.nan, np.nan, np.nan, np.nan, np.nan, 4.1, 0.4, np.nan, np.nan, 3.4, 5.1],
    "time-c": [np.nan, np.nan, np.nan, np.nan, np.nan, 1, 2, np.nan, np.nan, 1, 2]})
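For reference, the "multiple outer join" idea described above can be sketched directly against this model dataset: split df_long by param, number the readings within each (Date, Param) group so rows line up, and outer-merge the pieces. The helper column n is a name introduced here for illustration; the merge pads shorter params with NaN, which reproduces df_wide.

```python
import numpy as np
import pandas as pd

df_long = pd.DataFrame({
    "Date": np.array([1]*5 + [2]*3 + [3]*4 + [4]*2 + [5]*4),
    "Param": list("aaabbabbabccaaaacc"),
    "value": [0.1, 0.2, 0.2, 1, 4, 0.6, 0.5, 90, 0.9, 8.8,
              4.1, 0.4, 0.5, 0.1, 0.1, 0.3, 3.4, 5.1],
    "time": [1, 2, 3, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2],
})

# Number each observation within its (Date, Param) group so that rows of
# different params line up; the outer merge pads shorter params with NaN.
df_long["n"] = df_long.groupby(["Date", "Param"]).cumcount()

wide = None
for p, grp in df_long.groupby("Param"):
    sub = grp[["Date", "n", "value", "time"]].rename(
        columns={"value": p, "time": f"time-{p}"})
    wide = sub if wide is None else wide.merge(sub, on=["Date", "n"], how="outer")

# Unmatched rows are appended at the end of each merge, so restore order.
wide = (wide.sort_values(["Date", "n"])
            .drop(columns="n")
            .reset_index(drop=True))
```

This scales to ~100 sensors without hand-writing joins, since the loop runs over whatever params are present.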
Any help greatly appreciated
python pandas dataframe pandas-groupby
In df_long the Date column is equal to 1 five times, but in df_wide it is only equal to 1 three times. Is this part of the expected output, and if so, according to what logic?
– user3471881
Nov 20 '18 at 11:15
As the df is widened, it also shortens - so the five 1s in long are now only three 1s for param a and two 1s for param b. This is standard unstacking behaviour; the issue here is that the measurements per date and per param are all different lengths.
– BAC83
Nov 20 '18 at 11:27
edited Nov 20 '18 at 15:23 by jpp
asked Nov 20 '18 at 10:36 by BAC83
1 Answer
pd.pivot_table
You can pivot your dataframe. The only difference versus your desired output is that you have a single time series rather than one per param; you can, if you wish, construct time-a, time-b, etc. by considering the null values in the other series.
res = pd.pivot_table(df_long, index=['Date', 'time'],
                     columns=['Param'], values='value').reset_index()
print(res)
Param Date time a b c
0 1 1 0.1 1.0 NaN
1 1 2 0.2 4.0 NaN
2 1 3 0.2 NaN NaN
3 2 1 0.6 0.5 NaN
4 2 2 NaN 90.0 NaN
5 3 1 0.9 8.8 4.1
6 3 2 NaN NaN 0.4
7 4 1 0.5 NaN NaN
8 4 2 0.1 NaN NaN
9 5 1 0.1 NaN 3.4
10 5 2 0.3 NaN 5.1
I was obsessed with keeping the times separated - which isn't needed! Thanks, this works. Annoying that I was most of the way there! Cheers
– BAC83
Nov 20 '18 at 15:35
How would it work if I wanted to keep the time series separated (as alluded to in your answer)?
– BAC83
Nov 20 '18 at 15:38
You'd do something like df['time-a'] = df['time'].mask(df['a'].isnull()).
– jpp
Nov 20 '18 at 15:41
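Generalising that mask trick over every param gives the separated time columns in a few lines. This is a sketch reusing the model df_long from the question; it pivots as in the answer, then blanks time wherever a given param has no reading.

```python
import numpy as np
import pandas as pd

df_long = pd.DataFrame({
    "Date": np.array([1]*5 + [2]*3 + [3]*4 + [4]*2 + [5]*4),
    "Param": list("aaabbabbabccaaaacc"),
    "value": [0.1, 0.2, 0.2, 1, 4, 0.6, 0.5, 90, 0.9, 8.8,
              4.1, 0.4, 0.5, 0.1, 0.1, 0.3, 3.4, 5.1],
    "time": [1, 2, 3, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2],
})

# Pivot as in the answer, then derive a per-param time column by blanking
# `time` wherever that param has no reading in the row.
res = pd.pivot_table(df_long, index=["Date", "time"],
                     columns="Param", values="value").reset_index()
for p in df_long["Param"].unique():
    res[f"time-{p}"] = res["time"].mask(res[p].isnull())
```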
answered Nov 20 '18 at 15:22 by jpp