Pandas read_csv low_memory and dtype options. TypeError: Cannot cast array from dtype('O') to...

I am trying to read a CSV file:



df = pd.read_csv('Salaries.csv')


and I get this warning:




sys:1: DtypeWarning: Columns (3,4,5,6,12) have mixed types. Specify dtype option on import or set low_memory=False.




So, following the warning, I tried specifying the dtype option on import:



df = pd.read_csv('Salaries.csv', sep=',', dtype={
    'Id': int,
    'EmployeeName': str,
    'JobTitle': str,
    'BasePay': float,
    'OvertimePay': float,
    'OtherPay': float,
    'Benefits': float,
    'TotalPay': np.float64,
    'TotalPayBenefits': np.float64,
    'Year': np.int64,
    'Notes': np.float64,
    'Agency': str,
    'Status': float})


And now I get this error:




Traceback (most recent call last):
  File "pandas/_libs/parsers.pyx", line 1156, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'




I have also read previously asked questions and the official docs, but I still don't understand where the problem is.



Here is an example of the data from Salaries.csv:

Id,EmployeeName,JobTitle,BasePay,OvertimePay,OtherPay,Benefits,TotalPay,TotalPayBenefits,Year,Notes,Agency,Status
1,NATHANIEL FORD,GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY,167411.18,0.0,400184.25,,567595.43,567595.43,2011,,San Francisco,

python python-3.x pandas csv dataframe






asked Nov 20 '18 at 14:51









oliinykmd

  • check this

    – anky_91
    Nov 20 '18 at 15:01











  • Have you tried specifying a dtype of O for those columns with mixed types? Right now you are forcing a dtype of float64 on columns that have mixed types, which of course raises an error. (See the sketch after these comments.)

    – yatu
    Nov 20 '18 at 15:03













  • The type for the last column Status should be object

    – Vaishali
    Nov 20 '18 at 15:05











  • @AlexandreNixon Thanks, that helped. But columns 7 and 8 are cast to 567595 instead of 567595.43. Do you have any suggestions?

    – oliinykmd
    Nov 20 '18 at 15:29













  • Run the first line of code again: df = pd.read_csv('Salaries.csv') and do print(df.dtypes) to compare the inferred dtypes with the ones you're writing explicitly. That may give you some insight, or at least show which columns are the offending ones.

    – Robert
    Nov 20 '18 at 15:59
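
For illustration, here is a minimal sketch of what the comments above suggest. Counting the sample header's columns from zero, the mixed-type columns (3,4,5,6,12) correspond to BasePay, OvertimePay, OtherPay, Benefits and Status, so the idea is either to let pandas infer one dtype per column in a single pass, or to declare those columns as strings instead of floats. The exact dtype choices below are an assumption for the sketch, not the asker's code:

import pandas as pd

# Option 1: read the whole file in one pass so pandas infers a single dtype per column
df = pd.read_csv('Salaries.csv', low_memory=False)
print(df.dtypes)  # compare the inferred dtypes with the ones declared explicitly

# Option 2: declare the mixed-type columns as strings instead of floats
df = pd.read_csv('Salaries.csv', dtype={
    'BasePay': str,
    'OvertimePay': str,
    'OtherPay': str,
    'Benefits': str,
    'Status': str})
print(df.dtypes)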



















1 Answer

There may be NaN values in your dataframe. So when you specify a dtype, make sure you have filled every column with some value, to avoid a mixed dtype for that column.

For example, a column like this:

column_name
np.nan
1
2
3

Fill the NaN value with df.column_name.fillna(0, inplace=True) before you write this df to a CSV. Then, whenever you read this df again with pd.read_csv, there shouldn't be a problem.
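
As a rough illustration of this approach (the data, the 'Status' column name taken from the sample header, and the Salaries_clean.csv file name are all made up for this sketch; see the comment below on why filling NaN with 0 is often not appropriate):

import pandas as pd
import numpy as np

# hypothetical frame with a NaN in the 'Status' column
df = pd.DataFrame({'Status': [np.nan, 1.0, 2.0, 3.0]})
df['Status'] = df['Status'].fillna(0)          # fill NaN before writing, as the answer suggests
df.to_csv('Salaries_clean.csv', index=False)   # hypothetical output file

# reading it back with an explicit dtype no longer hits a mixed-type column
df2 = pd.read_csv('Salaries_clean.csv', dtype={'Status': float})
print(df2.dtypes)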






answered Nov 20 '18 at 15:02









Mahendra Singh

  • This is not a good general solution. 0 and NaN are completely different. Moreover, float64 columns allow for null values, as illustrated here (reformatted below): import pandas as pd import numpy as np df = pd.DataFrame({'a':[1,2,np.nan]}) df.to_csv("test.csv") df2 = pd.read_csv("test.csv", dtype=np.float64)

    – Robert
    Nov 20 '18 at 15:54
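
For readability, the snippet from the comment above, split onto separate lines (test.csv is the file name used in the comment):

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, np.nan]})
df.to_csv("test.csv")
# float64 columns can hold NaN, so reading back with an explicit float64 dtype works
df2 = pd.read_csv("test.csv", dtype=np.float64)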




















