Pandas read_csv low_memory and dtype options. TypeError: Cannot cast array from dtype('O') to...
I am trying to read a CSV file:
df = pd.read_csv('Salaries.csv')
I get this warning:
sys:1: DtypeWarning: Columns (3,4,5,6,12) have mixed types. Specify dtype option on import or set low_memory=False.
So I tried specifying the dtypes explicitly:
df = pd.read_csv('Salaries.csv', sep=',', dtype={
'Id': int,
'EmployeeName': str,
'JobTitle': str,
'BasePay': float,
'OvertimePay': float,
'OtherPay': float,
'Benefits': float,
'TotalPay': np.float64,
'TotalPayBenefits': np.float64,
'Year': np.int64,
'Notes': np.float64,
'Agency': str,
'Status': float})
And now I get this error:
Traceback (most recent call last):
File "pandas\_libs\parsers.pyx", line 1156, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'
I have read previously asked questions and the official docs, but I don't understand where the problem is.
Here is an example of the data from Salaries.csv:
Id,EmployeeName,JobTitle,BasePay,OvertimePay,OtherPay,Benefits,TotalPay,TotalPayBenefits,Year,Notes,Agency,Status
1,NATHANIEL FORD,GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY,167411.18,0.0,400184.25,,567595.43,567595.43,2011,,San Francisco,
python python-3.x pandas csv dataframe
check this
– anky_91
Nov 20 '18 at 15:01
Have you tried specifying a dtype of O for those columns with mixed types? Right now you are forcing a dtype of float64 on columns that have mixed types, which of course raises an error.
– yatu
Nov 20 '18 at 15:03
The type for the last column Status should be object
– Vaishali
Nov 20 '18 at 15:05
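The suggestion in the comments above (declare the mixed-type columns as object/str, then convert) could look roughly like this. It is a minimal sketch, not the asker's actual data: the second row and the "Not Provided" text are hypothetical, since the question shows only one row of Salaries.csv, and the column list is shortened.

```python
import io

import pandas as pd

# Hypothetical sample: the second row holds non-numeric text in the pay
# columns, which is one way a column can end up with mixed types.
csv_text = """Id,EmployeeName,BasePay,Benefits,Status
1,NATHANIEL FORD,167411.18,,
2,JANE DOE,Not Provided,Not Provided,FT
"""

# Read the suspect columns as strings so that no cast can fail while parsing.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={
        "Id": int,
        "EmployeeName": str,
        "BasePay": str,
        "Benefits": str,
        "Status": str,
    },
)

# Convert afterwards; values that are not numbers become NaN instead of raising.
for col in ["BasePay", "Benefits"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
```

Converting after the read keeps the decision about bad values (coerce, drop, or inspect) explicit, rather than having the parser fail on the first value it cannot cast.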
@AlexandreNixon Thanks, it has helped. But columns 7 and 8 are cast to 567595 instead of 567595.43. Do you have any suggestions?
– oliinykmd
Nov 20 '18 at 15:29
Run the first line of code again: df = pd.read_csv('Salaries.csv') and do print(df.dtypes) to compare the inferred dtypes with the ones you're writing explicitly. It may give you some insight, or at least show which columns are the offending ones.
– Robert
Nov 20 '18 at 15:59
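Building on the comment above, one way to see not only which columns are inferred as object but also which values prevent a numeric conversion is sketched below. The sample rows and the "Not Provided" text are hypothetical stand-ins; the real file may contain something else.

```python
import io

import pandas as pd

# Hypothetical sample data; "Not Provided" stands in for whatever
# non-numeric text the real file may contain.
csv_text = """Id,BasePay,Status
1,167411.18,
2,Not Provided,PT
3,155966.02,FT
"""

# low_memory=False asks pandas to infer each column's dtype from the whole
# file rather than chunk by chunk.
df = pd.read_csv(io.StringIO(csv_text), low_memory=False)
print(df.dtypes)

# For a column expected to be numeric, list the values that fail conversion.
as_number = pd.to_numeric(df["BasePay"], errors="coerce")
offending = df.loc[as_number.isna() & df["BasePay"].notna(), "BasePay"]
print(offending.unique())
```

The `notna()` mask excludes fields that were simply empty, so only genuinely non-numeric text is reported.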
asked Nov 20 '18 at 14:51
oliinykmd
1 Answer
There may be NaN values in your DataFrame. When you specify a dtype, make sure every column has been filled with some value, to avoid a mixed dtype in that column.
For example:
column_name
np.nan
1
2
3
Fill this NaN value with df.column_name.fillna(0, inplace=True) before you write this df to a CSV file.
Then, whenever you read this df again with pd.read_csv, there shouldn't be a problem.
This is not a good general solution. 0 and NaN are completely different. Moreover, float64 columns allow for null values, as illustrated here:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 2, np.nan]})
df.to_csv("test.csv")
df2 = pd.read_csv("test.csv", dtype=np.float64)
– Robert
Nov 20 '18 at 15:54
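The point made in the comment above can be checked without writing a file. This is a sketch of the same round trip, with an in-memory buffer (io.StringIO) substituted for "test.csv"; that substitution is mine and not part of the original comment.

```python
import io

import numpy as np
import pandas as pd

# Write a column containing NaN to an in-memory CSV (stand-in for "test.csv").
buffer = io.StringIO()
pd.DataFrame({"a": [1, 2, np.nan]}).to_csv(buffer)
buffer.seek(0)

# The empty field is read back as NaN; float64 accepts it without error.
df2 = pd.read_csv(buffer, dtype=np.float64)
print(df2["a"].dtype)
print(df2["a"].isna().sum())
```

This suggests that empty fields alone should not trigger the TypeError for a float64 column. A plain NumPy integer dtype, by contrast, cannot hold NaN, so a column with empty fields declared as int would be expected to fail.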
answered Nov 20 '18 at 15:02
Mahendra Singh