Pandas assign a value to new row based on index on incoming live data
I am having trouble writing efficient code (without many loops) that assigns a value to a cell in a pandas dataframe that is updated every minute or so (live stream). I trained my model with one-hot encoded timestamp variables, and it did better than with continuous variables, so that's what I want to use in production. The dataframe looks like this:
    datetime               DOW_1  DOW_2  ...  DOW_7  Month1  Month2  Month3
    2018-07-01 09:30:00    0      1      ...  0      0       0       1
As you can see, the columns are encoded with 0's and 1's to denote the month and day of week (I have more columns for day of year, is_holiday, etc.). I easily did this on the training, validation, and test data using pd.get_dummies, but now that a live stream of data is coming in, I cannot find an easy way to 'assign' Month2 = 0 based on df.index.month.
I tried something along the lines of this loop, but it's quite tedious and slow:
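    i = 0
    while i < len(df):
        for m in range(1, 13):
            # df.index[i] is the timestamp of row i
            if df.index[i].month == m:
                # .loc avoids chained indexing, which may not write through
                df.loc[df.index[i], 'Month' + str(m)] = 1
        # advance one row per pass, after all 12 months were checked
        i += 1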
Any better suggestions?
python pandas
Is it possible that the line `df['Month'+str(m)][i] = i` should assign `1` instead of `i`?
– Julian Peller Nov 20 '18 at 2:05
Oops, sorry, typo. Yep!
– Matt Elgazar Nov 20 '18 at 2:58
asked Nov 20 '18 at 1:46, edited Nov 20 '18 at 2:59 – Matt Elgazar
1 Answer
I'm still thinking about a solution that removes the for loop as well, but you can at least avoid the outer while loop over len(df) by using .loc with a boolean mask:
    for m in range(1, 13):
        df.loc[df.index.month == m, 'Month'+str(m)] = 1
This is great, pretty much exactly what I was looking for. I don't know if there's a way to do it without any loops, since some of the columns don't exist and have to be created. The only remaining issue is that it leaves the other columns with NaN values (e.g. Month1 has all 1's, but Month2 through Month12 have NaNs). Nice work, I'll mark it as correct!
– Matt Elgazar Nov 20 '18 at 7:57
Thanks. Regarding the NaN, you are right. I can think of 2 options to handle that: casting all NaN to zero, if possible: `df = df.fillna(0)`, or adding a second line to the for loop that fills in the zeros: `df.loc[df.index.month != m, 'Month'+str(m)] = 0` (note the `!=` instead of `==`). I couldn't figure out a completely for-free solution... I don't think it exists.
– Julian Peller Nov 20 '18 at 14:43
Nice! Yeah, the extra line makes more sense since I'm looping over a lot of columns.
– Matt Elgazar Nov 20 '18 at 15:37
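Editor's sketch, not part of the original thread: the two NaN fixes discussed in the comments above can probably be folded into a single assignment per column. Comparing the index month to `m` yields True/False for every row, so casting that to int writes both the 1's and the 0's at once. The sample frame and its `price` column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the live frame.
df = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(
        ["2018-01-15 09:30:00", "2018-02-15 09:30:00", "2018-07-01 09:30:00"]
    ),
)

for m in range(1, 13):
    # The boolean comparison covers every row, so matches become 1 and
    # non-matches become 0 in one step; no NaN is ever introduced.
    df["Month" + str(m)] = (df.index.month == m).astype(int)
```

Because each column is written in full, this also creates columns that do not exist yet, which was the concern raised in the first comment.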
answered Nov 20 '18 at 2:08, edited Nov 20 '18 at 2:17 – Julian Peller
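Editor's sketch, not part of the original thread: on the open question of a loop-free version, one approach that seems to work is to keep using pd.get_dummies, but on a categorical whose categories are declared up front. That way all 12 columns are emitted even when the incoming batch contains a single month. The helper name `one_hot_months` and the `price` column are hypothetical, and the join assumes the timestamps in the index are unique.

```python
import pandas as pd


def one_hot_months(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Return columns Month1..Month12 with exactly one 1 per row."""
    # Declaring all 12 categories forces get_dummies to create every
    # column, including months absent from the incoming data.
    months = pd.Categorical(index.month, categories=list(range(1, 13)))
    dummies = pd.get_dummies(months, prefix="Month", prefix_sep="", dtype=int)
    # get_dummies on a bare Categorical returns a default integer index,
    # so restore the timestamps before joining.
    dummies.index = index
    return dummies


# Hypothetical live batch: two rows, both in July.
live = pd.DataFrame(
    {"price": [10.0, 10.5]},
    index=pd.to_datetime(["2018-07-01 09:30:00", "2018-07-01 09:31:00"]),
)
live = live.join(one_hot_months(live.index))
```

The same pattern should carry over to the day-of-week columns by swapping `index.month` for `index.dayofweek` and adjusting the categories and prefix to match the training columns.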