Pandas read_csv low_memory and dtype options. TypeError: Cannot cast array from dtype('O') to...

I am trying to read a CSV file:

df = pd.read_csv('Salaries.csv')

I get this warning:

sys:1: DtypeWarning: Columns (3,4,5,6,12) have mixed types. Specify dtype option on import or set low_memory=False.

So, I tried:

df = pd.read_csv('Salaries.csv', sep=',', dtype={
    'Id': int,
    'EmployeeName': str,
    'JobTitle': str,
    'BasePay': float,
    'OvertimePay': float,
    'OtherPay': float,
    'Benefits': float,
    'TotalPay': np.float64,
    'TotalPayBenefits': np.float64,
    'Year': np.int64,
    'Notes': np.float64,
    'Agency': str,
    'Status': float})

And now I get this error:

Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1156, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'

I have also read previously asked questions and the official docs, but I don't understand where the problem is.

Here is an example of the data from Salaries.csv:

Id,EmployeeName,JobTitle,BasePay,OvertimePay,OtherPay,Benefits,TotalPay,TotalPayBenefits,Year,Notes,Agency,Status
1,NATHANIEL FORD,GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY,167411.18,0.0,400184.25,,567595.43,567595.43,2011,,San Francisco,

asked Nov 20 '18 at 14:51

oliinykmd

112

check this

– anky_91
Nov 20 '18 at 15:01

Have you tried specifying a dtype of O (object) for the columns with mixed types? Right now you are forcing a dtype of float64 on columns that have mixed types, which of course raises an error.

– yatu
Nov 20 '18 at 15:03
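
A minimal sketch of that suggestion, assuming the mixed columns hold text placeholders among the numbers. The sample rows and the value "Not Provided" are hypothetical stand-ins, not taken from Salaries.csv. The idea is to read the suspect columns as strings so parsing cannot fail, then convert them with pd.to_numeric(errors='coerce'), which turns unparseable entries into NaN.

```python
import io

import pandas as pd

# Hypothetical sample; "Not Provided" is an assumed placeholder value.
csv_text = (
    "Id,BasePay,Status\n"
    "1,167411.18,\n"
    "2,Not Provided,PT\n"
    "3,155966.02,FT\n"
)

# Read the suspect columns as text so nothing fails during parsing.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Id": int, "BasePay": str, "Status": str},
)

# Convert afterwards; entries that are not numbers become NaN.
df["BasePay"] = pd.to_numeric(df["BasePay"], errors="coerce")

print(df.dtypes)
```

Whether coercing to NaN is acceptable depends on the data; it silently discards the original text, so it may be worth inspecting those entries first.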

The type for the last column Status should be object

– Vaishali
Nov 20 '18 at 15:05

@AlexandreNixon Thanks, that helped. But columns 7 and 8 are cast to 567595 instead of 567595.43. Do you have any suggestions?

– oliinykmd
Nov 20 '18 at 15:29

Run the first line of code again, df = pd.read_csv('Salaries.csv'), and do print(df.dtypes) to compare the inferred dtypes with the ones you are writing explicitly. It may give you some insight, or at least show which columns are the offending ones.

– Robert
Nov 20 '18 at 15:59
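
Going one step beyond comparing dtypes, here is a rough sketch for locating the values that block a numeric conversion. It assumes the offending entries are non-numeric strings; the sample rows and the "Not Provided" value are hypothetical, as is the choice of columns to check.

```python
import io

import pandas as pd

# Hypothetical sample; "Not Provided" is an assumed placeholder value.
csv_text = (
    "Id,BasePay,OvertimePay\n"
    "1,167411.18,0.0\n"
    "2,Not Provided,245131.88\n"
    "3,155966.02,Not Provided\n"
)

# Read everything as text so the raw values can be inspected.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)

offending = {}
for column in ["BasePay", "OvertimePay"]:
    converted = pd.to_numeric(raw[column], errors="coerce")
    # Values present in the file but not convertible to a number.
    bad = raw.loc[converted.isna() & raw[column].notna(), column]
    if not bad.empty:
        offending[column] = bad.unique().tolist()

print(offending)
```

With the real file, the same loop could be run over the columns named in the DtypeWarning.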


python python-3.x pandas csv dataframe

1 Answer

There may be NaN values in your dataframe. When you specify a dtype, make sure every value in that column is filled in, so the column does not end up with mixed dtypes.

For example:

column_name
     np.nan
          1
          2
          3

Fill the NaN value with df.column_name.fillna(0, inplace=True) before you write this dataframe to a CSV file. Then reading it back with pd.read_csv should not cause a problem.

answered Nov 20 '18 at 15:02

Mahendra Singh

405

This is not a good general solution. 0 and NaN are completely different. Moreover, float64 columns allow null values, as illustrated here:

import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,np.nan]})
df.to_csv("test.csv")
df2 = pd.read_csv("test.csv", dtype=np.float64)

– Robert
Nov 20 '18 at 15:54
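
The same point can be checked without writing a file. The sketch below uses a hypothetical two-row sample (the Benefits figure is made up) and shows that an empty field parses as NaN under dtype float, and that filling it with 0 shifts the column mean.

```python
import io

import pandas as pd

# Hypothetical two-row sample with an empty Benefits field in the first row.
csv_text = "Id,Benefits\n1,\n2,44430.12\n"

df = pd.read_csv(io.StringIO(csv_text), dtype={"Id": int, "Benefits": float})

# The empty field is read as NaN and the column is still float64.
print(df["Benefits"].dtype)
print(df["Benefits"].isna().tolist())

# mean() skips NaN by default, so replacing NaN with 0 changes the result.
mean_with_nan = df["Benefits"].mean()
mean_with_zero = df["Benefits"].fillna(0).mean()
print(mean_with_nan, mean_with_zero)
```

This suggests that empty fields alone would not explain the TypeError; non-numeric text in a column declared as float is a more likely cause, though that is an inference rather than something the sample row confirms.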
