IndexError when replacing missing values with mode using groupby in pandas

I have a dataset which requires missing value treatment.



Column                  Missing values
Complaint_ID            0
Date_received           0
Transaction_Type        0
Complaint_reason        0
Company_response        22506
Date_sent_to_company    0
Complaint_Status        0
Consumer_disputes       7698


Now the problem is that when I try to replace the missing values with the mode of the same column within each group of another column, using groupby:



Code:



data11["Company_response"] = data11.groupby("Complaint_reason").transform(lambda x: x.fillna(x.mode()[0]))["Company_response"]

data11["Consumer_disputes"] = data11.groupby("Transaction_Type").transform(lambda x: x.fillna(x.mode()[0]))["Consumer_disputes"]


I get the following error:



Stacktrace



Traceback (most recent call last):

  File "<ipython-input-89-8de6a010a299>", line 1, in <module>
    data11["Company_response"] = data11.groupby("Complaint_reason").transform(lambda x: x.fillna(x.mode()[0]))["Company_response"]

  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 3741, in transform
    return self._transform_general(func, *args, **kwargs)

  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 3699, in _transform_general
    res = path(group)

  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 3783, in <lambda>
    lambda x: func(x, *args, **kwargs), axis=self.axis)

  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4360, in apply
    ignore_failures=ignore_failures)

  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4456, in _apply_standard
    results[i] = func(v)

  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 3783, in <lambda>
    lambda x: func(x, *args, **kwargs), axis=self.axis)

  File "<ipython-input-89-8de6a010a299>", line 1, in <lambda>
    data11["Company_response"] = data11.groupby("Complaint_reason").transform(lambda x: x.fillna(x.mode()[0]))["Company_response"]

  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)

  File "C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2434, in get_value
    return libts.get_value_box(s, key)

  File "pandas\_libs\tslib.pyx", line 923, in pandas._libs.tslib.get_value_box (pandas\_libs\tslib.c:18843)

  File "pandas\_libs\tslib.pyx", line 939, in pandas._libs.tslib.get_value_box (pandas\_libs\tslib.c:18560)

IndexError: ('index out of bounds', 'occurred at index Consumer_disputes')


I have checked the length of the DataFrame and of each of its columns, and they are all the same: 43266.



I have also found a similar question, but it does not have a correct answer: Click here



Please help resolve the error.




IndexError: ('index out of bounds', 'occurred at index Consumer_disputes')




Here is a snapshot of the dataset if it helps in any way: Dataset Snapshot



The code below runs successfully and does fill the missing values, but it does not serve my purpose exactly, because it uses the mode of the whole column instead of the mode within each group:



data11['Company_response'] = data11['Company_response'].fillna(data11['Company_response'].mode()[0])
data11['Consumer_disputes'] = data11['Consumer_disputes'].fillna(data11['Consumer_disputes'].mode()[0])

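For contrast, here is a minimal sketch of what the whole-column version does on the sample data from the replication code further down (only the two columns it needs are rebuilt, so the block runs on its own). Each of the four non-missing responses occurs exactly once, so `Series.mode()` returns all four in sorted order and `[0]` simply picks the alphabetically first one. Tr-1 and Tr-3 therefore get the same value, whereas the expected output in Edit 1 fills each row from its own Complaint_reason group.

```python
import numpy as np
import pandas as pd

# Two columns from the question's sample data
data11 = pd.DataFrame({
    'Complaint_ID': ['Tr-1', 'Tr-2', 'Tr-3', 'Tr-4', 'Tr-5', 'Tr-6'],
    'Company_response': [
        np.nan,
        'Company chooses not to provide a public response',
        np.nan,
        'Company believes it acted appropriately as authorized by contract or law',
        'Company has responded to the consumer and the CFPB and chooses not to provide a public response',
        'Company disputes the facts presented in the complaint',
    ],
})

# All four non-missing responses are tied, so mode() returns all of them, sorted
modes = data11['Company_response'].mode()
print(len(modes))  # 4

# Whole-column fill: every missing row receives the same value, modes[0]
global_fill = data11['Company_response'].fillna(modes[0])
print(global_fill[[0, 2]].tolist())
```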

Edit 1 (attaching a sample):



Input Given:
InputImage



Expected Output:
OutputImage



As you can see, the missing Company_response values for Tr-1 and Tr-3 are filled with the mode of Company_response within their Complaint_reason group. Similarly, the missing Consumer_disputes value for Tr-5 is filled with the mode of Consumer_disputes within its Transaction_Type group.



The snippet below builds the sample DataFrame and runs the failing code, for anyone who wants to reproduce the problem.



Replication Code



import pandas as pd
import numpy as np

data11=pd.DataFrame({'Complaint_ID':['Tr-1','Tr-2','Tr-3','Tr-4','Tr-5','Tr-6'],
'Transaction_Type':['Mortgage','Credit card','Bank account or service','Debt collection','Credit card','Mortgage'],
'Complaint_reason':['Loan servicing, payments, escrow account','Incorrect information on credit report',"Cont'd attempts collect debt not owed","Cont'd attempts collect debt not owed",'Payoff process','Loan servicing, payments, escrow account'],
'Company_response':[np.nan,'Company chooses not to provide a public response',np.nan,'Company believes it acted appropriately as authorized by contract or law','Company has responded to the consumer and the CFPB and chooses not to provide a public response','Company disputes the facts presented in the complaint'],
'Consumer_disputes':['Yes','No','No','No',np.nan,'Yes']})

data11.isnull().sum()

data11["Company_response"] = data11.groupby("Complaint_reason").transform(lambda x: x.fillna(x.mode()[0]))["Company_response"]
data11["Consumer_disputes"] = data11.groupby("Transaction_Type").transform(lambda x: x.fillna(x.mode()[0]))["Consumer_disputes"]
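One way to locate the group that breaks the lambda in this sample is to look for (group, column) pairs whose values are all missing. The helper below is a minimal sketch; `all_nan_groups` is a hypothetical name, not part of pandas. Because the frame-wide `transform` in the question applies the lambda to every non-grouping column (the `frame.apply` calls in the traceback), a group that is all-NaN in any of those columns can fail, not only in the column selected afterwards. On this sample the Complaint_reason check flags the single-row "Payoff process" group in Consumer_disputes, which is consistent with `occurred at index Consumer_disputes` in the traceback; the Transaction_Type check flags "Bank account or service" in Company_response, so the second line would presumably hit the same problem.

```python
import numpy as np
import pandas as pd

# Sample data from the question's replication code
data11 = pd.DataFrame({
    'Complaint_ID': ['Tr-1', 'Tr-2', 'Tr-3', 'Tr-4', 'Tr-5', 'Tr-6'],
    'Transaction_Type': ['Mortgage', 'Credit card', 'Bank account or service',
                         'Debt collection', 'Credit card', 'Mortgage'],
    'Complaint_reason': ['Loan servicing, payments, escrow account',
                         'Incorrect information on credit report',
                         "Cont'd attempts collect debt not owed",
                         "Cont'd attempts collect debt not owed",
                         'Payoff process',
                         'Loan servicing, payments, escrow account'],
    'Company_response': [np.nan,
                         'Company chooses not to provide a public response',
                         np.nan,
                         'Company believes it acted appropriately as authorized by contract or law',
                         'Company has responded to the consumer and the CFPB and chooses not to provide a public response',
                         'Company disputes the facts presented in the complaint'],
    'Consumer_disputes': ['Yes', 'No', 'No', 'No', np.nan, 'Yes'],
})


def all_nan_groups(df, by):
    """Return (group key, column) pairs in which every value is NaN."""
    hits = []
    for key, group in df.groupby(by):
        # The frame-wide transform runs the lambda on each of these columns
        for col in group.columns.drop(by):
            if group[col].isna().all():
                hits.append((key, col))
    return hits


print(all_nan_groups(data11, 'Complaint_reason'))
print(all_nan_groups(data11, 'Transaction_Type'))

# For such a group there is no mode, so x.mode()[0] has nothing to return
print(pd.Series([np.nan]).mode().empty)  # True
```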

python pandas dataframe pandas-groupby missing-data

asked Jan 1 at 10:12 by Ashu Grover (edited Jan 1 at 17:46)

  • The question literally died last time: I edited it and left comments, but no one answered for almost 6 days. Unfortunately, I had to post it again, as I do not have any bounty to offer. So, if you find it interesting and are unable to solve it, please upvote the question so that it might interest others as well...

    – Ashu Grover
    Jan 1 at 11:10











  • Could you add a small input sample and the expected output?

    – Daniel Mesejo
    Jan 1 at 11:36






  • The question did not "literally die"; this is a metaphor. It figuratively died!

    – Josh Friedlander
    Jan 1 at 12:33













  • @JoshFriedlander haha... yes Josh... got a bit carried away, I guess...

    – Ashu Grover
    Jan 1 at 12:40











  • :) As for your question, it would help if you could post about 5 rows of your data, or made-up equivalents. That screenshot is the right idea, but text is much easier to work with than an image.

    – Josh Friedlander
    Jan 1 at 12:45

3 Answers

Try:



data11["Company_response"] = data11.groupby("Complaint_reason")['Company_response'].transform(lambda x: x.fillna(x.mode()[0]))

data11["Consumer_disputes"] = data11.groupby("Transaction_Type")['Consumer_disputes'].transform(lambda x: x.fillna(x.mode()[0]))
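A sketch of why this works on the sample data from the question: selecting the column before `transform` means the lambda only ever sees that column's values, so the all-NaN Consumer_disputes value in the "Payoff process" group never reaches `x.mode()[0]` while Company_response is being filled. Running both lines on the replication data should reproduce the filled values described in Edit 1. This only holds as long as no group is entirely NaN in the column being filled itself, which may not be true on the full dataset.

```python
import numpy as np
import pandas as pd

# Sample data from the question's replication code
data11 = pd.DataFrame({
    'Complaint_ID': ['Tr-1', 'Tr-2', 'Tr-3', 'Tr-4', 'Tr-5', 'Tr-6'],
    'Transaction_Type': ['Mortgage', 'Credit card', 'Bank account or service',
                         'Debt collection', 'Credit card', 'Mortgage'],
    'Complaint_reason': ['Loan servicing, payments, escrow account',
                         'Incorrect information on credit report',
                         "Cont'd attempts collect debt not owed",
                         "Cont'd attempts collect debt not owed",
                         'Payoff process',
                         'Loan servicing, payments, escrow account'],
    'Company_response': [np.nan,
                         'Company chooses not to provide a public response',
                         np.nan,
                         'Company believes it acted appropriately as authorized by contract or law',
                         'Company has responded to the consumer and the CFPB and chooses not to provide a public response',
                         'Company disputes the facts presented in the complaint'],
    'Consumer_disputes': ['Yes', 'No', 'No', 'No', np.nan, 'Yes'],
})

# Select the column first: the lambda now receives only that column per group
data11["Company_response"] = data11.groupby("Complaint_reason")["Company_response"].transform(
    lambda x: x.fillna(x.mode()[0]))
data11["Consumer_disputes"] = data11.groupby("Transaction_Type")["Consumer_disputes"].transform(
    lambda x: x.fillna(x.mode()[0]))

print(data11[["Complaint_ID", "Company_response", "Consumer_disputes"]])
```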





answered Jan 1 at 20:09 by Scott Boston (edited Jan 2 at 16:34 by Ashu Grover)

  • Thanks Scott... :)

    – Ashu Grover
    Jan 2 at 16:36











  • @AshuGrover You're welcome. Happy coding. Thanks for editing my solution to match your needs.

    – Scott Boston
    Jan 2 at 16:37

The error is raised because, for at least one of the groups, the values in the corresponding column are all np.nan. In that case pd.Series([np.nan]).mode() returns an empty Series, which leads to an error when you take its first value.



So, you may use something like transform(lambda x: x.fillna(x.mode()[0] if not x.mode().empty else "Empty") ).
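Combining both suggestions (select the column first, and check for an empty mode) might look like the sketch below. `fill_with_group_mode` is a hypothetical helper, not a pandas function. Instead of writing the placeholder "Empty", it leaves all-NaN groups as NaN so they can be handled separately, here by falling back to the whole-column mode the question already uses. It takes the first mode with `.iloc[0]`, which is positional; with label-based `[0]` on an empty result, recent pandas versions may raise KeyError rather than IndexError. The data is the question's sample; Company_response is grouped by Transaction_Type because the "Bank account or service" group (Tr-3) has no non-missing response there.

```python
import numpy as np
import pandas as pd

# Columns from the question's sample data
data11 = pd.DataFrame({
    'Complaint_ID': ['Tr-1', 'Tr-2', 'Tr-3', 'Tr-4', 'Tr-5', 'Tr-6'],
    'Transaction_Type': ['Mortgage', 'Credit card', 'Bank account or service',
                         'Debt collection', 'Credit card', 'Mortgage'],
    'Company_response': [np.nan,
                         'Company chooses not to provide a public response',
                         np.nan,
                         'Company believes it acted appropriately as authorized by contract or law',
                         'Company has responded to the consumer and the CFPB and chooses not to provide a public response',
                         'Company disputes the facts presented in the complaint'],
})


def fill_with_group_mode(df, target, by):
    """Fill NaNs in df[target] with the mode of df[target] inside each `by` group.

    Groups in which df[target] is entirely NaN have an empty mode() and are
    returned unchanged (still NaN).
    """
    def fill(s):
        m = s.mode()
        return s if m.empty else s.fillna(m.iloc[0])

    # Select the column first so only df[target] is passed to fill()
    return df.groupby(by)[target].transform(fill)


by_type = fill_with_group_mode(data11, 'Company_response', by='Transaction_Type')
print(by_type.isna().sum())  # 1: Tr-3's group has no non-missing response

# One option for the leftovers: the whole-column mode, as in the question
filled = by_type.fillna(data11['Company_response'].mode().iloc[0])
```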







  • When I run this, I get the exact error mentioned in the question. Mikhail, can you please reproduce it locally (I have given the replication code in the question)? I have been stuck on this for a long time now...

    – Ashu Grover
    Jan 1 at 16:58











  • Could you provide a full stacktrace when you run it along with a self-sufficient code to create an input data for that?

    – Mikhail Berlinkov
    Jan 1 at 17:03











  • Mikhail, I have included both the self-sufficient code with the input and the stacktrace in the question itself.

    – Ashu Grover
    Jan 1 at 17:19











  • I meant the stacktrace when you run data11.groupby("Transaction_Type").transform(lambda x: x.fillna(x.mode()[0])). I want to be sure it's indeed the same, which is very unlikely. Also, I didn't find a code snippet that creates the input dataframe; I can't take it from a screenshot of an Excel file.

    – Mikhail Berlinkov
    Jan 1 at 17:22













  • Mikhail, please look carefully: I have included the Replication Code (in bold below the Excel screenshots) with the input dataframe. The stacktrace I get when I run data11.groupby("Transaction_Type").transform(lambda x: x.fillna(x.mode()[0])) is too long to put in the comments section, so I will also highlight it in bold in the question itself. Note: the stacktrace is exactly the same for the code you asked me to run.

    – Ashu Grover
    Jan 1 at 17:36

@Mikhail Berlinkov is almost certainly correct. I was able to reproduce your error, and then avoid it by using dropna():



data11.groupby("Transaction_Type").transform(
    lambda x: x.fillna(x.mode()[0]))["Consumer_disputes"]
# Raises IndexError

data11.dropna().groupby("Transaction_Type").transform(
    lambda x: x.fillna(x.mode()[0]))["Consumer_disputes"]
# Works
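A minimal sketch of the side effect mentioned in the comment below: `DataFrame.dropna()` with its defaults drops every row that has a missing value in any column, so the rows that needed filling are removed before `groupby` runs and nothing is actually filled. On the question's sample only Tr-2, Tr-4 and Tr-6 survive, and assigning the result back aligns on the index, leaving NaN in the three dropped rows. The column is selected before `transform` here; that does not change the point.

```python
import numpy as np
import pandas as pd

# Sample data from the question (Complaint_reason omitted; it has no NaNs)
data11 = pd.DataFrame({
    'Complaint_ID': ['Tr-1', 'Tr-2', 'Tr-3', 'Tr-4', 'Tr-5', 'Tr-6'],
    'Transaction_Type': ['Mortgage', 'Credit card', 'Bank account or service',
                         'Debt collection', 'Credit card', 'Mortgage'],
    'Company_response': [np.nan,
                         'Company chooses not to provide a public response',
                         np.nan,
                         'Company believes it acted appropriately as authorized by contract or law',
                         'Company has responded to the consumer and the CFPB and chooses not to provide a public response',
                         'Company disputes the facts presented in the complaint'],
    'Consumer_disputes': ['Yes', 'No', 'No', 'No', np.nan, 'Yes'],
})

complete = data11.dropna()  # default how='any': drop rows with any NaN
print(complete['Complaint_ID'].tolist())  # ['Tr-2', 'Tr-4', 'Tr-6']

result = complete.groupby('Transaction_Type')['Consumer_disputes'].transform(
    lambda x: x.fillna(x.mode()[0]))

# Assigning back aligns on the index, so the dropped rows become NaN
out = data11.copy()
out['Consumer_disputes'] = result
print(out['Consumer_disputes'].isna().sum())  # 3 (the original had 1)
```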






  • Thanks for the input, Josh, but this will further mess up the dataframe. Try it yourself and see the results...

    – Ashu Grover
    Jan 2 at 16:24














  • Thanks Scott... :)

    – Ashu Grover
    Jan 2 at 16:36











  • @AshuGrover You're welcome. Happy coding. Thanks for editing my solution to match your needs.

    – Scott Boston
    Jan 2 at 16:37


The error is raised because, for at least one of the groups, the values in the corresponding aggregated columns are all np.nan. In that case pd.Series([np.nan]).mode() returns an empty Series, so taking its first value with [0] raises an error.
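Here is a quick check of this behaviour, written as a sketch. The exception raised by [0] on an empty result has differed between pandas versions, so the code catches both IndexError and KeyError.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan])
m = s.mode()  # mode() skips NaN by default (dropna=True), so nothing is left
print(m.empty)  # True

try:
    first = m[0]
except (IndexError, KeyError) as exc:
    # The exact exception type has varied across pandas versions.
    first = None
    print("indexing an empty mode failed:", type(exc).__name__)
```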



So, you may use something like transform(lambda x: x.fillna(x.mode()[0] if not x.mode().empty else "Empty")).
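The sketch below is a slightly tidier variant of the same guard. It computes the mode once per group, selects the column before transform, and uses .iloc[0] for an explicitly positional lookup. The helper name fill_with_mode and the toy data are hypothetical. "Empty" is the placeholder from this answer; you may prefer to leave such groups as NaN or fill them some other way.

```python
import numpy as np
import pandas as pd

def fill_with_mode(x, default="Empty"):
    """Hypothetical helper: fill NaNs in x with its mode, or `default` if x has no values."""
    m = x.mode()  # computed once instead of twice
    return x.fillna(m.iloc[0] if not m.empty else default)

# Hypothetical toy data: group "B" has no observed Consumer_disputes value.
data11 = pd.DataFrame({
    "Transaction_Type": ["A", "A", "A", "B", "B"],
    "Consumer_disputes": ["No", "No", np.nan, np.nan, np.nan],
})

data11["Consumer_disputes"] = (
    data11.groupby("Transaction_Type")["Consumer_disputes"].transform(fill_with_mode)
)
print(data11["Consumer_disputes"].tolist())  # ['No', 'No', 'No', 'Empty', 'Empty']
```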


edited Jan 1 at 19:10

answered Jan 1 at 16:42 by Mikhail Berlinkov













  • On running this I get the exact error mentioned in the question. Mikhail, can you please replicate it locally (I have given the code for replication in the question)? I have been stuck on this for a long time now.

    – Ashu Grover
    Jan 1 at 16:58











  • Could you provide the full stacktrace you get when you run it, along with self-contained code that creates the input data?

    – Mikhail Berlinkov
    Jan 1 at 17:03











  • Mikhail, I have included both the self-contained code with input and the stacktrace in the question itself.

    – Ashu Grover
    Jan 1 at 17:19











  • I meant the stacktrace you get when you run data11.groupby("Transaction_Type").transform(lambda x: x.fillna(x.mode()[0])). I want to be sure it's indeed the same, which seems very unlikely. Also, I didn't find a code snippet that creates the input DataFrame; I can't take it from a screenshot of an Excel file.

    – Mikhail Berlinkov
    Jan 1 at 17:22













  • Mikhail, please look carefully: I have included the Replication Code (in bold below the Excel screenshots) with the input DataFrame. The stacktrace I get when I run data11.groupby("Transaction_Type").transform(lambda x: x.fillna(x.mode()[0])) is too long to put in a comment, so I will also highlight it in bold in the question itself. Note: the stacktrace is exactly the same for the code you asked me to run.

    – Ashu Grover
    Jan 1 at 17:36


@Mikhail Berlinkov is almost certainly correct. I was able to reproduce your error, and then avoid it by using dropna():



data11.groupby("Transaction_Type").transform(
    lambda x: x.fillna(x.mode()[0]))["Consumer_disputes"]
# Raises IndexError

data11.dropna().groupby("Transaction_Type").transform(
    lambda x: x.fillna(x.mode()[0]))["Consumer_disputes"]
# Runs without error, but dropna() first removes every row containing a NaN
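The next block is a small sketch on hypothetical data showing what dropna() changes here. It uses the column-selected form for brevity. The NaN rows are removed before grouping, so nothing is left to fill. Assigning the result back to data11 aligns on the index, so the dropped rows come back as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
data11 = pd.DataFrame({
    "Transaction_Type": ["A", "A", "B"],
    "Consumer_disputes": ["No", np.nan, np.nan],
})

cleaned = data11.dropna()  # drops every row that has a NaN in any column
print(len(data11), len(cleaned))  # 3 1

filled = cleaned.groupby("Transaction_Type")["Consumer_disputes"].transform(
    lambda x: x.fillna(x.mode()[0])
)

# Assigning back aligns on the index, so the dropped rows reappear as NaN.
data11["Consumer_disputes"] = filled
print(data11["Consumer_disputes"].tolist())  # ['No', nan, nan]
```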


answered Jan 2 at 7:53 by Josh Friedlander













  • Thanks for the input, Josh, but this will further mess up the DataFrame. Try it yourself and see the results...

    – Ashu Grover
    Jan 2 at 16:24