How to properly convert an XLSX file to a TSV file in Python?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I'm converting large XLSX files (over 60 columns and 3000 rows) to TSV format. Some fields contain texts with 2-3 paragraphs (multi line breaks).
I want support UTF-8 and I want each row to be displayed on a line in the resulting TSV
I did the following:
import xlrd
import csv
# open the tsv file(output) in unicode format
with open('outTSV.tsv', 'w', encoding='utf-8') as TSVfile:
wr = csv.writer(TSVfile, delimiter="t")
# open the xlsx file
xlfile = xlrd.open_workbook('inXLSX.xlsx')
# retrieve sheet
sheet = xlfile.sheet_by_index(0)
# write rows into TSVfile
for row in range(sheet.nrows):
wr.writerow(sheet.row_values(row))
I expected each row in the XLSX file to transform to a row in the TSV file. However, because some cells have paragraphs, it translates them as newlines. Therefore I get deformed TSV files.
XLSX file

Deformed TSV file

python csv unicode xlsx
|
show 1 more comment
I'm converting large XLSX files (over 60 columns and 3000 rows) to TSV format. Some fields contain texts with 2-3 paragraphs (multi line breaks).
I want support UTF-8 and I want each row to be displayed on a line in the resulting TSV
I did the following:
import xlrd
import csv
# open the tsv file(output) in unicode format
with open('outTSV.tsv', 'w', encoding='utf-8') as TSVfile:
wr = csv.writer(TSVfile, delimiter="t")
# open the xlsx file
xlfile = xlrd.open_workbook('inXLSX.xlsx')
# retrieve sheet
sheet = xlfile.sheet_by_index(0)
# write rows into TSVfile
for row in range(sheet.nrows):
wr.writerow(sheet.row_values(row))
I expected each row in the XLSX file to transform to a row in the TSV file. However, because some cells have paragraphs, it translates them as newlines. Therefore I get deformed TSV files.
XLSX file

Deformed TSV file

python csv unicode xlsx
1
You get perfectly valid TSV file, fields with newline-characters are enclosed in". What do you need?
– Daniel
Jan 3 at 11:59
@Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.
– Malyk
Jan 3 at 12:09
How does this other workflow handle newlines?
– Daniel
Jan 3 at 12:12
It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.
– Malyk
Jan 3 at 13:07
1
@Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.
– Mark Tolonen
Jan 3 at 17:02
|
show 1 more comment
I'm converting large XLSX files (over 60 columns and 3000 rows) to TSV format. Some fields contain texts with 2-3 paragraphs (multi line breaks).
I want support UTF-8 and I want each row to be displayed on a line in the resulting TSV
I did the following:
import xlrd
import csv
# open the tsv file(output) in unicode format
with open('outTSV.tsv', 'w', encoding='utf-8') as TSVfile:
wr = csv.writer(TSVfile, delimiter="t")
# open the xlsx file
xlfile = xlrd.open_workbook('inXLSX.xlsx')
# retrieve sheet
sheet = xlfile.sheet_by_index(0)
# write rows into TSVfile
for row in range(sheet.nrows):
wr.writerow(sheet.row_values(row))
I expected each row in the XLSX file to transform to a row in the TSV file. However, because some cells have paragraphs, it translates them as newlines. Therefore I get deformed TSV files.
XLSX file

Deformed TSV file

python csv unicode xlsx
I'm converting large XLSX files (over 60 columns and 3000 rows) to TSV format. Some fields contain texts with 2-3 paragraphs (multi line breaks).
I want support UTF-8 and I want each row to be displayed on a line in the resulting TSV
I did the following:
import xlrd
import csv
# open the tsv file(output) in unicode format
with open('outTSV.tsv', 'w', encoding='utf-8') as TSVfile:
wr = csv.writer(TSVfile, delimiter="t")
# open the xlsx file
xlfile = xlrd.open_workbook('inXLSX.xlsx')
# retrieve sheet
sheet = xlfile.sheet_by_index(0)
# write rows into TSVfile
for row in range(sheet.nrows):
wr.writerow(sheet.row_values(row))
I expected each row in the XLSX file to transform to a row in the TSV file. However, because some cells have paragraphs, it translates them as newlines. Therefore I get deformed TSV files.
XLSX file

Deformed TSV file

python csv unicode xlsx
python csv unicode xlsx
asked Jan 3 at 11:50
MalykMalyk
4419
4419
1
You get perfectly valid TSV file, fields with newline-characters are enclosed in". What do you need?
– Daniel
Jan 3 at 11:59
@Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.
– Malyk
Jan 3 at 12:09
How does this other workflow handle newlines?
– Daniel
Jan 3 at 12:12
It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.
– Malyk
Jan 3 at 13:07
1
@Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.
– Mark Tolonen
Jan 3 at 17:02
|
show 1 more comment
1
You get perfectly valid TSV file, fields with newline-characters are enclosed in". What do you need?
– Daniel
Jan 3 at 11:59
@Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.
– Malyk
Jan 3 at 12:09
How does this other workflow handle newlines?
– Daniel
Jan 3 at 12:12
It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.
– Malyk
Jan 3 at 13:07
1
@Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.
– Mark Tolonen
Jan 3 at 17:02
1
1
You get perfectly valid TSV file, fields with newline-characters are enclosed in
". What do you need?– Daniel
Jan 3 at 11:59
You get perfectly valid TSV file, fields with newline-characters are enclosed in
". What do you need?– Daniel
Jan 3 at 11:59
@Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.
– Malyk
Jan 3 at 12:09
@Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.
– Malyk
Jan 3 at 12:09
How does this other workflow handle newlines?
– Daniel
Jan 3 at 12:12
How does this other workflow handle newlines?
– Daniel
Jan 3 at 12:12
It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.
– Malyk
Jan 3 at 13:07
It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.
– Malyk
Jan 3 at 13:07
1
1
@Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.
– Mark Tolonen
Jan 3 at 17:02
@Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.
– Mark Tolonen
Jan 3 at 17:02
|
show 1 more comment
1 Answer
1
active
oldest
votes
I was able to solve the problem using pandas dataframe.
import pandas as pd
#Read excel file into a dataframe
data_xlsx = pd.read_excel('excel.xlsx', 'Sheet1', index_col=None)
#Replace all columns having spaces with underscores
data_xlsx.columns = [c.replace(' ', '_') for c in data_xlsx.columns]
#Replace all fields having line breaks with space
df = data_xlsx.replace('n', ' ',regex=True)
#Write dataframe into csv
df.to_csv('fileTSV.csv', sep='t', encoding='utf-8', index=False, line_terminator='rn')
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54021712%2fhow-to-properly-convert-an-xlsx-file-to-a-tsv-file-in-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I was able to solve the problem using pandas dataframe.
import pandas as pd
#Read excel file into a dataframe
data_xlsx = pd.read_excel('excel.xlsx', 'Sheet1', index_col=None)
#Replace all columns having spaces with underscores
data_xlsx.columns = [c.replace(' ', '_') for c in data_xlsx.columns]
#Replace all fields having line breaks with space
df = data_xlsx.replace('n', ' ',regex=True)
#Write dataframe into csv
df.to_csv('fileTSV.csv', sep='t', encoding='utf-8', index=False, line_terminator='rn')
add a comment |
I was able to solve the problem using pandas dataframe.
import pandas as pd
#Read excel file into a dataframe
data_xlsx = pd.read_excel('excel.xlsx', 'Sheet1', index_col=None)
#Replace all columns having spaces with underscores
data_xlsx.columns = [c.replace(' ', '_') for c in data_xlsx.columns]
#Replace all fields having line breaks with space
df = data_xlsx.replace('n', ' ',regex=True)
#Write dataframe into csv
df.to_csv('fileTSV.csv', sep='t', encoding='utf-8', index=False, line_terminator='rn')
add a comment |
I was able to solve the problem using pandas dataframe.
import pandas as pd
#Read excel file into a dataframe
data_xlsx = pd.read_excel('excel.xlsx', 'Sheet1', index_col=None)
#Replace all columns having spaces with underscores
data_xlsx.columns = [c.replace(' ', '_') for c in data_xlsx.columns]
#Replace all fields having line breaks with space
df = data_xlsx.replace('n', ' ',regex=True)
#Write dataframe into csv
df.to_csv('fileTSV.csv', sep='t', encoding='utf-8', index=False, line_terminator='rn')
I was able to solve the problem using pandas dataframe.
import pandas as pd
#Read excel file into a dataframe
data_xlsx = pd.read_excel('excel.xlsx', 'Sheet1', index_col=None)
#Replace all columns having spaces with underscores
data_xlsx.columns = [c.replace(' ', '_') for c in data_xlsx.columns]
#Replace all fields having line breaks with space
df = data_xlsx.replace('n', ' ',regex=True)
#Write dataframe into csv
df.to_csv('fileTSV.csv', sep='t', encoding='utf-8', index=False, line_terminator='rn')
answered Jan 4 at 16:27
MalykMalyk
4419
4419
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54021712%2fhow-to-properly-convert-an-xlsx-file-to-a-tsv-file-in-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown

1
You get perfectly valid TSV file, fields with newline-characters are enclosed in
". What do you need?– Daniel
Jan 3 at 11:59
@Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.
– Malyk
Jan 3 at 12:09
How does this other workflow handle newlines?
– Daniel
Jan 3 at 12:12
It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.
– Malyk
Jan 3 at 13:07
1
@Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.
– Mark Tolonen
Jan 3 at 17:02