How to properly convert an XLSX file to a TSV file in Python?





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







0















I'm converting large XLSX files (over 60 columns and 3000 rows) to TSV format. Some fields contain texts with 2-3 paragraphs (multi line breaks).
I want support UTF-8 and I want each row to be displayed on a line in the resulting TSV



I did the following:



import xlrd
import csv

# open the tsv file(output) in unicode format
with open('outTSV.tsv', 'w', encoding='utf-8') as TSVfile:
wr = csv.writer(TSVfile, delimiter="t")

# open the xlsx file
xlfile = xlrd.open_workbook('inXLSX.xlsx')
# retrieve sheet
sheet = xlfile.sheet_by_index(0)

# write rows into TSVfile
for row in range(sheet.nrows):
wr.writerow(sheet.row_values(row))


I expected each row in the XLSX file to transform to a row in the TSV file. However, because some cells have paragraphs, it translates them as newlines. Therefore I get deformed TSV files.



XLSX file



XLSX file



Deformed TSV file
Deformed TSV file










share|improve this question


















  • 1





    You get perfectly valid TSV file, fields with newline-characters are enclosed in ". What do you need?

    – Daniel
    Jan 3 at 11:59











  • @Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.

    – Malyk
    Jan 3 at 12:09











  • How does this other workflow handle newlines?

    – Daniel
    Jan 3 at 12:12











  • It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.

    – Malyk
    Jan 3 at 13:07








  • 1





    @Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.

    – Mark Tolonen
    Jan 3 at 17:02


















0















I'm converting large XLSX files (over 60 columns and 3000 rows) to TSV format. Some fields contain texts with 2-3 paragraphs (multi line breaks).
I want support UTF-8 and I want each row to be displayed on a line in the resulting TSV



I did the following:



import xlrd
import csv

# open the tsv file(output) in unicode format
with open('outTSV.tsv', 'w', encoding='utf-8') as TSVfile:
wr = csv.writer(TSVfile, delimiter="t")

# open the xlsx file
xlfile = xlrd.open_workbook('inXLSX.xlsx')
# retrieve sheet
sheet = xlfile.sheet_by_index(0)

# write rows into TSVfile
for row in range(sheet.nrows):
wr.writerow(sheet.row_values(row))


I expected each row in the XLSX file to transform to a row in the TSV file. However, because some cells have paragraphs, it translates them as newlines. Therefore I get deformed TSV files.



XLSX file



XLSX file



Deformed TSV file
Deformed TSV file










share|improve this question


















  • 1





    You get perfectly valid TSV file, fields with newline-characters are enclosed in ". What do you need?

    – Daniel
    Jan 3 at 11:59











  • @Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.

    – Malyk
    Jan 3 at 12:09











  • How does this other workflow handle newlines?

    – Daniel
    Jan 3 at 12:12











  • It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.

    – Malyk
    Jan 3 at 13:07








  • 1





    @Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.

    – Mark Tolonen
    Jan 3 at 17:02














0












0








0








I'm converting large XLSX files (over 60 columns and 3000 rows) to TSV format. Some fields contain texts with 2-3 paragraphs (multi line breaks).
I want support UTF-8 and I want each row to be displayed on a line in the resulting TSV



I did the following:



import xlrd
import csv

# open the tsv file(output) in unicode format
with open('outTSV.tsv', 'w', encoding='utf-8') as TSVfile:
wr = csv.writer(TSVfile, delimiter="t")

# open the xlsx file
xlfile = xlrd.open_workbook('inXLSX.xlsx')
# retrieve sheet
sheet = xlfile.sheet_by_index(0)

# write rows into TSVfile
for row in range(sheet.nrows):
wr.writerow(sheet.row_values(row))


I expected each row in the XLSX file to transform to a row in the TSV file. However, because some cells have paragraphs, it translates them as newlines. Therefore I get deformed TSV files.



XLSX file



XLSX file



Deformed TSV file
Deformed TSV file










share|improve this question














I'm converting large XLSX files (over 60 columns and 3000 rows) to TSV format. Some fields contain texts with 2-3 paragraphs (multi line breaks).
I want support UTF-8 and I want each row to be displayed on a line in the resulting TSV



I did the following:



import xlrd
import csv

# open the tsv file(output) in unicode format
with open('outTSV.tsv', 'w', encoding='utf-8') as TSVfile:
wr = csv.writer(TSVfile, delimiter="t")

# open the xlsx file
xlfile = xlrd.open_workbook('inXLSX.xlsx')
# retrieve sheet
sheet = xlfile.sheet_by_index(0)

# write rows into TSVfile
for row in range(sheet.nrows):
wr.writerow(sheet.row_values(row))


I expected each row in the XLSX file to transform to a row in the TSV file. However, because some cells have paragraphs, it translates them as newlines. Therefore I get deformed TSV files.



XLSX file



XLSX file



Deformed TSV file
Deformed TSV file







python csv unicode xlsx






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Jan 3 at 11:50









MalykMalyk

4419




4419








  • 1





    You get perfectly valid TSV file, fields with newline-characters are enclosed in ". What do you need?

    – Daniel
    Jan 3 at 11:59











  • @Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.

    – Malyk
    Jan 3 at 12:09











  • How does this other workflow handle newlines?

    – Daniel
    Jan 3 at 12:12











  • It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.

    – Malyk
    Jan 3 at 13:07








  • 1





    @Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.

    – Mark Tolonen
    Jan 3 at 17:02














  • 1





    You get perfectly valid TSV file, fields with newline-characters are enclosed in ". What do you need?

    – Daniel
    Jan 3 at 11:59











  • @Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.

    – Malyk
    Jan 3 at 12:09











  • How does this other workflow handle newlines?

    – Daniel
    Jan 3 at 12:12











  • It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.

    – Malyk
    Jan 3 at 13:07








  • 1





    @Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.

    – Mark Tolonen
    Jan 3 at 17:02








1




1





You get perfectly valid TSV file, fields with newline-characters are enclosed in ". What do you need?

– Daniel
Jan 3 at 11:59





You get perfectly valid TSV file, fields with newline-characters are enclosed in ". What do you need?

– Daniel
Jan 3 at 11:59













@Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.

– Malyk
Jan 3 at 12:09





@Daniel yes they are enclosed in " but because I am going to process the TSV in another workflow, I need it all in a single line.

– Malyk
Jan 3 at 12:09













How does this other workflow handle newlines?

– Daniel
Jan 3 at 12:12





How does this other workflow handle newlines?

– Daniel
Jan 3 at 12:12













It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.

– Malyk
Jan 3 at 13:07







It reads the entire line and maps a column to a value. So in the second line it doesn't read the entire line correctly I believe.

– Malyk
Jan 3 at 13:07






1




1





@Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.

– Mark Tolonen
Jan 3 at 17:02





@Malyk "...it doesn't read the entire line correctly I believe". Did you try it? The entry looks properly quoted and a correct CSV parser should handle it.

– Mark Tolonen
Jan 3 at 17:02












1 Answer
1






active

oldest

votes


















0














I was able to solve the problem using pandas dataframe.



import pandas as pd

#Read excel file into a dataframe
data_xlsx = pd.read_excel('excel.xlsx', 'Sheet1', index_col=None)

#Replace all columns having spaces with underscores
data_xlsx.columns = [c.replace(' ', '_') for c in data_xlsx.columns]

#Replace all fields having line breaks with space
df = data_xlsx.replace('n', ' ',regex=True)

#Write dataframe into csv
df.to_csv('fileTSV.csv', sep='t', encoding='utf-8', index=False, line_terminator='rn')





share|improve this answer
























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54021712%2fhow-to-properly-convert-an-xlsx-file-to-a-tsv-file-in-python%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    I was able to solve the problem using pandas dataframe.



    import pandas as pd

    #Read excel file into a dataframe
    data_xlsx = pd.read_excel('excel.xlsx', 'Sheet1', index_col=None)

    #Replace all columns having spaces with underscores
    data_xlsx.columns = [c.replace(' ', '_') for c in data_xlsx.columns]

    #Replace all fields having line breaks with space
    df = data_xlsx.replace('n', ' ',regex=True)

    #Write dataframe into csv
    df.to_csv('fileTSV.csv', sep='t', encoding='utf-8', index=False, line_terminator='rn')





    share|improve this answer




























      0














      I was able to solve the problem using pandas dataframe.



      import pandas as pd

      #Read excel file into a dataframe
      data_xlsx = pd.read_excel('excel.xlsx', 'Sheet1', index_col=None)

      #Replace all columns having spaces with underscores
      data_xlsx.columns = [c.replace(' ', '_') for c in data_xlsx.columns]

      #Replace all fields having line breaks with space
      df = data_xlsx.replace('n', ' ',regex=True)

      #Write dataframe into csv
      df.to_csv('fileTSV.csv', sep='t', encoding='utf-8', index=False, line_terminator='rn')





      share|improve this answer


























        0












        0








        0







        I was able to solve the problem using pandas dataframe.



        import pandas as pd

        #Read excel file into a dataframe
        data_xlsx = pd.read_excel('excel.xlsx', 'Sheet1', index_col=None)

        #Replace all columns having spaces with underscores
        data_xlsx.columns = [c.replace(' ', '_') for c in data_xlsx.columns]

        #Replace all fields having line breaks with space
        df = data_xlsx.replace('n', ' ',regex=True)

        #Write dataframe into csv
        df.to_csv('fileTSV.csv', sep='t', encoding='utf-8', index=False, line_terminator='rn')





        share|improve this answer













        I was able to solve the problem using pandas dataframe.



        import pandas as pd

        #Read excel file into a dataframe
        data_xlsx = pd.read_excel('excel.xlsx', 'Sheet1', index_col=None)

        #Replace all columns having spaces with underscores
        data_xlsx.columns = [c.replace(' ', '_') for c in data_xlsx.columns]

        #Replace all fields having line breaks with space
        df = data_xlsx.replace('n', ' ',regex=True)

        #Write dataframe into csv
        df.to_csv('fileTSV.csv', sep='t', encoding='utf-8', index=False, line_terminator='rn')






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 4 at 16:27









        MalykMalyk

        4419




        4419
































            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54021712%2fhow-to-properly-convert-an-xlsx-file-to-a-tsv-file-in-python%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            MongoDB - Not Authorized To Execute Command

            How to fix TextFormField cause rebuild widget in Flutter

            in spring boot 2.1 many test slices are not allowed anymore due to multiple @BootstrapWith