Pandas assign a value to new row based on index on incoming live data












I am having trouble writing efficient code (without many loops) that assigns a value to a cell in a pandas DataFrame that is updated every minute or so (live stream). I trained my model with one-hot encoded timestamp variables, and it did better than with continuous variables, so that's what I want to use in production. The dataframe looks like this:



datetime               DOW_1  DOW_2  ...  DOW_7  Month1  Month2  Month3
`2018-07-01 09:30:00`  0      1      ...  0      0       0       1


As you can see, the columns are encoded with 0s and 1s to denote the month and the day of week (I have more columns for day of year, is_holiday, etc.). I easily did this on the training, validation, and test data using pd.get_dummies, but now that a live stream of data is coming in, I cannot find an easy way to 'assign' Month2 = 0 based on df.index.month.



I tried something along the lines of this loop, but it's quite tedious and slow:



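    i = 0
    while i < len(df):
        for m in range(1, 13):
            # df.index[i] is the row's Timestamp (an Index has no .iloc)
            if df.index[i].month == m:
                # one .loc call instead of chained df[col][i] assignment
                df.loc[df.index[i], 'Month' + str(m)] = 1
        # advance once per row, not once per month tested
        i += 1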


Any better suggestions?










  • Is it possible that the line df['Month'+str(m)][i] = i should assign 1 instead of i?

    – Julian Peller
    Nov 20 '18 at 2:05











  • Oops, sorry, typo. Yep!

    – Matt Elgazar
    Nov 20 '18 at 2:58
















Tags: python, pandas

asked Nov 20 '18 at 1:46 by Matt Elgazar (edited Nov 20 '18 at 2:59 by Matt Elgazar)













1 Answer
I'm still thinking about a solution that removes even the for loop, but you can at least avoid the outer while loop over len(df) by using .loc:



    for m in range(1, 13):
        # the boolean mask selects every row that falls in month m at once
        df.loc[df.index.month == m, 'Month'+str(m)] = 1
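
For what it's worth, the remaining for loop can probably be dropped as well by comparing the month numbers against 1..12 with NumPy broadcasting. The following is only a sketch under assumptions: the `price` column and the sample timestamps are made up for illustration, and the index is assumed to be a DatetimeIndex with unique timestamps (the join aligns on the index).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the live feed.
idx = pd.to_datetime(["2018-01-15 09:30", "2018-03-01 09:31", "2018-12-24 09:32"])
df = pd.DataFrame({"price": [10.0, 11.0, 12.0]}, index=idx)

months = np.arange(1, 13)
month_cols = ["Month" + str(m) for m in months]

# Compare an (n_rows, 1) column of month numbers with the (12,) vector 1..12.
# Broadcasting yields an (n_rows, 12) matrix with exactly one 1 per row.
one_hot = (np.asarray(df.index.month)[:, None] == months).astype(int)

# Attach all twelve columns at once; rows outside month m hold 0, not NaN.
df = df.join(pd.DataFrame(one_hot, index=df.index, columns=month_cols))
```

Because every cell of the matrix is computed, all twelve columns exist afterwards even when a batch only covers one month.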





  • This is great, pretty much exactly what I was looking for. I don't know if there's a way to do it without any loops, since some of the columns don't exist and have to be created. The only remaining issue is that it leaves the other columns with NaN values (e.g. Month1 has all 1s but Month2 through Month12 have NaNs). Nice work, I'll mark it as correct!

    – Matt Elgazar
    Nov 20 '18 at 7:57













  • Thanks. Regarding the NaN, you are right. I can think of two options to handle that: cast all NaN to zero, if possible, with df = df.fillna(0), or add a second line to the for loop that fills in the zeros: df.loc[df.index.month != m, 'Month'+str(m)] = 0 (note the != instead of ==). I couldn't figure out a solution with no for loop at all... I don't think it exists.

    – Julian Peller
    Nov 20 '18 at 14:43
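
The two options from the comment above can be sketched side by side. This is an illustration, not the commenter's exact code: the `volume` column and its missing value are made up, and option 1 is restricted to the month columns here, because a bare df.fillna(0) would also overwrite genuine NaNs in unrelated columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; "volume" has a genuine missing value on purpose.
idx = pd.to_datetime(["2018-01-15 09:30", "2018-02-01 09:31"])
base = pd.DataFrame({"volume": [100.0, np.nan]}, index=idx)
month_cols = ["Month" + str(m) for m in range(1, 13)]

# Option 1: set the 1s, then fill the NaNs left in the month columns only.
df1 = base.copy()
for m in range(1, 13):
    df1.loc[df1.index.month == m, "Month" + str(m)] = 1
df1[month_cols] = df1[month_cols].fillna(0)

# Option 2: write the 0s explicitly inside the loop (note != instead of ==).
df2 = base.copy()
for m in range(1, 13):
    col = "Month" + str(m)
    df2.loc[df2.index.month == m, col] = 1
    df2.loc[df2.index.month != m, col] = 0
```

Both frames should end up with the same 0/1 pattern in the month columns, while the missing `volume` value stays missing in each.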













  • Nice! Yeah, the extra line makes more sense since I'm looping over a lot of columns.

    – Matt Elgazar
    Nov 20 '18 at 15:37
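
When many such columns are involved, one possible alternative is to keep using pd.get_dummies on the live data, but on a categorical with a fixed category list, so that every expected column is emitted even if a batch never contains that value. A minimal sketch under assumptions: the helper name `one_hot_fixed`, the `price` column, and the sample timestamps are hypothetical, and DOW_1 is assumed to mean Monday, which the question does not state.

```python
import pandas as pd

def one_hot_fixed(values, categories, prefix, index):
    """One-hot encode `values`, emitting one column per listed category."""
    cat = pd.Series(pd.Categorical(values, categories=list(categories)), index=index)
    # Categories that never occur in `values` still get an all-zero column.
    return pd.get_dummies(cat, prefix=prefix, prefix_sep="", dtype=int)

# Hypothetical live batch: a Sunday and a Monday in July 2018.
idx = pd.to_datetime(["2018-07-01 09:30", "2018-07-02 09:31"])
live = pd.DataFrame({"price": [10.0, 11.0]}, index=idx)

month = one_hot_fixed(live.index.month, range(1, 13), "Month", live.index)
# Assumption: DOW_1 means Monday; dayofweek is 0 for Monday, hence the + 1.
dow = one_hot_fixed(live.index.dayofweek + 1, range(1, 8), "DOW_", live.index)

features = pd.concat([live, dow, month], axis=1)
```

The column order and names then depend only on the category lists, not on which months or weekdays happen to appear in the batch, which is what a model trained on a fixed set of dummy columns needs.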











answered Nov 20 '18 at 2:08 by Julian Peller (edited Nov 20 '18 at 2:17)












