How do I write code to determine whether the EOL character in a CSV file is `\r` or `\n` without looking at the...
I'm using Python in Jupyter Notebooks to work with a CSV file. I'm writing the same code in two different versions of Jupyter Notebook: one that's running directly on my computer and another that's running in a kind of emulator within an online lesson from Dataquest. When I open the CSV file and read it into a string in my computer's Jupyter Notebook, the EOL character is `\r`, but when I do the same in Dataquest's emulator, the EOL character is `\n`. I have two questions:
Why does this happen?
How can I write Python code that tests for the EOL character without opening the file to find out visually?
This code is in a Jupyter notebook on my own Mac.
f = open('US_births_1994-2003_CDC_NCHS.csv', 'r')
data_MyComp = f.read()
data_MyComp
This code is on Dataquest's Jupyter notebook browser emulator.
f = open('US_births_1994-2003_CDC_NCHS.csv', 'r')
data_dataquest = f.read()
data_dataquest
These are a few lines of output from my computer when I run data_MyComp (note the EOL character is `\r`).
'year,month,date_of_month,day_of_week,births\r1994,1,1,6,8096\r1994,1,2,7,7772\r1994,1,3,1,10142\r1994,1,4,2,11248\r1994,1,5,3,11053\r1994,1,6,4,11406\r1994,1,7,5,11251\r1994,1,8,6,8653\r1994,1,9,7,7910\r1994,1,10,1,10498\r1994,1,11,2,11706\r
These are a few lines of output from the Dataquest emulator when I run data_dataquest (note the EOL character is `\n`).
'year,month,date_of_month,day_of_week,births\n1994,1,1,6,8096\n1994,1,2,7,7772\n1994,1,3,1,10142\n1994,1,4,2,11248\n1994,1,5,3,11053\n1994,1,6,4,11406\n
python eol end-of-line
docs.python.org/3/library/functions.html#open the `newline` flag handles that for you, or am I missing something?
– aws_apprentice
Jan 2 at 0:06
I suppose "opening the file" really means "manual inspection" here. In order to process the contents of a file you have to `open()` it.
– tripleee
Jan 2 at 0:12
Is your own computer by any chance running Windows? How exactly are you making the file available to Jupyter?
– tripleee
Jan 2 at 0:14
If you just want to read the CSV file, use the `csv` module from the standard library. It should properly handle the line endings on its own.
– mkrieger1
Jan 2 at 0:45
@tripleee Yes, I mean "manual inspection". Thanks for clarifying.
– user10200596
Jan 2 at 5:20
edited Jan 2 at 5:30 by tripleee
asked Jan 2 at 0:03 by user10200596
1 Answer
Without any indication of how you downloaded or otherwise made the file available to Python and Jupyter, we can't really tell why this is happening. Line endings are platform-specific but Python 3 should generally neutralize differences between platforms unless you specifically request opening a file as "binary".
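To illustrate the point about Python 3 neutralizing platform differences, here is a minimal sketch. It assumes Python 3 and uses a throwaway temporary file with made-up sample data (not the asker's CSV). In text mode the default `newline=None` translates `\r`, `\n`, and `\r\n` to `\n` on reading, while `newline=''` returns the endings untranslated.

```python
import os
import tempfile

# Write a small sample with bare CR line endings (bytes, so nothing is translated).
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"year,births\r1994,8096\r")
    sample_path = tmp.name  # hypothetical sample file, for illustration only

# Default text mode (newline=None): universal newlines, every ending becomes "\n".
with open(sample_path, "r") as f:
    translated = f.read()

# newline="" keeps the endings exactly as they are stored in the file.
with open(sample_path, "r", newline="") as f:
    untranslated = f.read()

os.remove(sample_path)
print(repr(translated))
print(repr(untranslated))
```

If both notebooks ran Python 3 with a default `open()` call, the same file would be expected to show the same line endings in both, so comparing the Python versions of the two environments may be a useful first check.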
You can discover the line-ending conventions by simply opening the file and reading enough of it. What's "enough" depends on the file type. Perhaps something like this in your case:
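with open('US_births_1994-2003_CDC_NCHS.csv', 'rb') as peek:
    # Read the first kilobyte as raw bytes, so no newline translation occurs.
    buf = peek.read(1024)
    # Check CR/LF first, because it contains both of the other terminators.
    if b'\r\n' in buf:
        print("DOS CR/LF line terminator")
    elif b'\r' in buf:
        print("Plain CR seen (legacy Mac or CP/M file)?")
    elif b'\n' in buf:
        print("Plain LF seen (standard Unix text file)")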
This doesn't attempt to do any statistical analysis, but might work well enough for your limited case. The file will be closed again after the end of the `with` block so you can then just open it a second time with the parameters you actually need.
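If a reusable check is more convenient, the same idea can be wrapped in a function. This is only a sketch: the name `detect_line_ending` and the `sample_size` parameter are made up for illustration, and it goes one small step beyond the snippet above by counting each terminator in the sample and returning the most frequent one, rather than stopping at the first match.

```python
def detect_line_ending(path, sample_size=1024):
    """Return the most common line ending in a sample of the file, or None."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
        # Avoid splitting a CR/LF pair at the very end of the sample.
        if sample.endswith(b"\r"):
            sample += f.read(1)

    crlf = sample.count(b"\r\n")
    counts = {
        "\r\n": crlf,
        # Subtract the CR/LF pairs so only bare CR and bare LF are counted.
        "\r": sample.count(b"\r") - crlf,
        "\n": sample.count(b"\n") - crlf,
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

The return value can be passed straight back to `open()`, for example `open(path, 'r', newline=detect_line_ending(path))`, since `newline` accepts `None`, `'\n'`, `'\r'`, and `'\r\n'`.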
edited Jan 2 at 5:44
answered Jan 2 at 5:35 by tripleee