How to loop lists of list obtained from python-docx where each list is a table and write the tables into a...












2















I am using python-docx to extract two tables from a document.
I have iterated over the tables and created a list of lists. Each individual list represents a table, and within that I have dictionaries per row. Each dictionary contains a key / value pair. The key is the column heading from the table and value is the cell contents for that row's data for that column.
I am facing difficulty when creating a data frame for each table and writing each table on a seperate excel sheet.



from docx.api import Document
import pandas as pd
import csv
import json
import unicodedata

document = Document('Sampletable1.docx')
tables = document.tables
print (len(tables))
big_data =


for table in document.tables:
data =
Keys = None
for i, row in enumerate(table.rows):
text = (cell.text for cell in row.cells)
if i == 0:
keys = tuple(text)
continue
dic = dict(zip(keys, text))
data.append(dic)
big_data.append(data)
print(big_data)


The output of the above code is:



2



[[{'Asset': 'Growth investments', 'Target investment mix': '66.50%', 'Actual investment mix': '66.30%', 'Variance': '-0.20%'}, {'Asset': 'Defensive investments', 'Target investment mix': '33.50%', 'Actual investment mix': '33.70%', 'Variance': '0.20%'}], [{'Owner': 'REST Super', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER TransPension', 'Product': 'TTR Pension', 'Type': 'New', 'Status': 'New', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing'}]]



How do I access the above lists??



Further I tried to create a pandas data frame



#write the data into a data frame
for thing in big_data:
#print(thing)
df = pd.DataFrame(thing)
print(df)
writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()


I got the first table on the excel but unable to work with second table.
I am expecting both the table to be in the same excel workbook(dftable3.xlsx) but in different worksheets(Sheet1,Sheet2)



I have attached the images of the tables.



Thanks in advance



enter image description hereenter image description here










share|improve this question





























    2















    I am using python-docx to extract two tables from a document.
    I have iterated over the tables and created a list of lists. Each individual list represents a table, and within that I have dictionaries per row. Each dictionary contains a key / value pair. The key is the column heading from the table and value is the cell contents for that row's data for that column.
    I am facing difficulty when creating a data frame for each table and writing each table on a seperate excel sheet.



    from docx.api import Document
    import pandas as pd
    import csv
    import json
    import unicodedata

    document = Document('Sampletable1.docx')
    tables = document.tables
    print (len(tables))
    big_data =


    for table in document.tables:
    data =
    Keys = None
    for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)
    if i == 0:
    keys = tuple(text)
    continue
    dic = dict(zip(keys, text))
    data.append(dic)
    big_data.append(data)
    print(big_data)


    The output of the above code is:



    2



    [[{'Asset': 'Growth investments', 'Target investment mix': '66.50%', 'Actual investment mix': '66.30%', 'Variance': '-0.20%'}, {'Asset': 'Defensive investments', 'Target investment mix': '33.50%', 'Actual investment mix': '33.70%', 'Variance': '0.20%'}], [{'Owner': 'REST Super', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER TransPension', 'Product': 'TTR Pension', 'Type': 'New', 'Status': 'New', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing'}]]



    How do I access the above lists??



    Further I tried to create a pandas data frame



    #write the data into a data frame
    for thing in big_data:
    #print(thing)
    df = pd.DataFrame(thing)
    print(df)
    writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1')
    writer.save()


    I got the first table on the excel but unable to work with second table.
    I am expecting both the table to be in the same excel workbook(dftable3.xlsx) but in different worksheets(Sheet1,Sheet2)



    I have attached the images of the tables.



    Thanks in advance



    enter image description hereenter image description here










    share|improve this question



























      2












      2








      2








      I am using python-docx to extract two tables from a document.
      I have iterated over the tables and created a list of lists. Each individual list represents a table, and within that I have dictionaries per row. Each dictionary contains a key / value pair. The key is the column heading from the table and value is the cell contents for that row's data for that column.
      I am facing difficulty when creating a data frame for each table and writing each table on a seperate excel sheet.



      from docx.api import Document
      import pandas as pd
      import csv
      import json
      import unicodedata

      document = Document('Sampletable1.docx')
      tables = document.tables
      print (len(tables))
      big_data =


      for table in document.tables:
      data =
      Keys = None
      for i, row in enumerate(table.rows):
      text = (cell.text for cell in row.cells)
      if i == 0:
      keys = tuple(text)
      continue
      dic = dict(zip(keys, text))
      data.append(dic)
      big_data.append(data)
      print(big_data)


      The output of the above code is:



      2



      [[{'Asset': 'Growth investments', 'Target investment mix': '66.50%', 'Actual investment mix': '66.30%', 'Variance': '-0.20%'}, {'Asset': 'Defensive investments', 'Target investment mix': '33.50%', 'Actual investment mix': '33.70%', 'Variance': '0.20%'}], [{'Owner': 'REST Super', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER TransPension', 'Product': 'TTR Pension', 'Type': 'New', 'Status': 'New', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing'}]]



      How do I access the above lists??



      Further I tried to create a pandas data frame



      #write the data into a data frame
      for thing in big_data:
      #print(thing)
      df = pd.DataFrame(thing)
      print(df)
      writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
      df.to_excel(writer, sheet_name='Sheet1')
      writer.save()


      I got the first table on the excel but unable to work with second table.
      I am expecting both the table to be in the same excel workbook(dftable3.xlsx) but in different worksheets(Sheet1,Sheet2)



      I have attached the images of the tables.



      Thanks in advance



      enter image description hereenter image description here










      share|improve this question
















      I am using python-docx to extract two tables from a document.
      I have iterated over the tables and created a list of lists. Each individual list represents a table, and within that I have dictionaries per row. Each dictionary contains a key / value pair. The key is the column heading from the table and value is the cell contents for that row's data for that column.
      I am facing difficulty when creating a data frame for each table and writing each table on a seperate excel sheet.



      from docx.api import Document
      import pandas as pd
      import csv
      import json
      import unicodedata

      document = Document('Sampletable1.docx')
      tables = document.tables
      print (len(tables))
      big_data =


      for table in document.tables:
      data =
      Keys = None
      for i, row in enumerate(table.rows):
      text = (cell.text for cell in row.cells)
      if i == 0:
      keys = tuple(text)
      continue
      dic = dict(zip(keys, text))
      data.append(dic)
      big_data.append(data)
      print(big_data)


      The output of the above code is:



      2



      [[{'Asset': 'Growth investments', 'Target investment mix': '66.50%', 'Actual investment mix': '66.30%', 'Variance': '-0.20%'}, {'Asset': 'Defensive investments', 'Target investment mix': '33.50%', 'Actual investment mix': '33.70%', 'Variance': '0.20%'}], [{'Owner': 'REST Super', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER TransPension', 'Product': 'TTR Pension', 'Type': 'New', 'Status': 'New', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing'}]]



      How do I access the above lists??



      Further I tried to create a pandas data frame



      #write the data into a data frame
      for thing in big_data:
      #print(thing)
      df = pd.DataFrame(thing)
      print(df)
      writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
      df.to_excel(writer, sheet_name='Sheet1')
      writer.save()


      I got the first table on the excel but unable to work with second table.
      I am expecting both the table to be in the same excel workbook(dftable3.xlsx) but in different worksheets(Sheet1,Sheet2)



      I have attached the images of the tables.



      Thanks in advance



      enter image description hereenter image description here







      python-3.x pandas python-docx pandas.excelwriter






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Jan 3 at 2:37







      user8720570

















      asked Jan 3 at 1:33









      user8720570user8720570

      266




      266
























          1 Answer
          1






          active

          oldest

          votes


















          2















          How do I access the above lists??




          You already did, by iterating over them, or printing them.
          Consider using the pretty-print library:



          import pprint
          pprint.pprint(big_data)



          I am expecting ... different worksheets(Sheet1,Sheet2)




          Well, that's unlikely, given the constant 'Sheet1' argument you supplied.
          Here is one way to accomplish that:



          writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
          for i, thing in enumerate(big_data):
          df = pd.DataFrame(thing)
          df.to_excel(writer, sheet_name=f'Sheet{i}')
          writer.save()


          Note the scope of writer -- it must be longer lived than each of the constituent dfs.






          share|improve this answer
























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54015290%2fhow-to-loop-lists-of-list-obtained-from-python-docx-where-each-list-is-a-table-a%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            2















            How do I access the above lists??




            You already did, by iterating over them, or printing them.
            Consider using the pretty-print library:



            import pprint
            pprint.pprint(big_data)



            I am expecting ... different worksheets(Sheet1,Sheet2)




            Well, that's unlikely, given the constant 'Sheet1' argument you supplied.
            Here is one way to accomplish that:



            writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
            for i, thing in enumerate(big_data):
            df = pd.DataFrame(thing)
            df.to_excel(writer, sheet_name=f'Sheet{i}')
            writer.save()


            Note the scope of writer -- it must be longer lived than each of the constituent dfs.






            share|improve this answer




























              2















              How do I access the above lists??




              You already did, by iterating over them, or printing them.
              Consider using the pretty-print library:



              import pprint
              pprint.pprint(big_data)



              I am expecting ... different worksheets(Sheet1,Sheet2)




              Well, that's unlikely, given the constant 'Sheet1' argument you supplied.
              Here is one way to accomplish that:



              writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
              for i, thing in enumerate(big_data):
              df = pd.DataFrame(thing)
              df.to_excel(writer, sheet_name=f'Sheet{i}')
              writer.save()


              Note the scope of writer -- it must be longer lived than each of the constituent dfs.






              share|improve this answer


























                2












                2








                2








                How do I access the above lists??




                You already did, by iterating over them, or printing them.
                Consider using the pretty-print library:



                import pprint
                pprint.pprint(big_data)



                I am expecting ... different worksheets(Sheet1,Sheet2)




                Well, that's unlikely, given the constant 'Sheet1' argument you supplied.
                Here is one way to accomplish that:



                writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
                for i, thing in enumerate(big_data):
                df = pd.DataFrame(thing)
                df.to_excel(writer, sheet_name=f'Sheet{i}')
                writer.save()


                Note the scope of writer -- it must be longer lived than each of the constituent dfs.






                share|improve this answer














                How do I access the above lists??




                You already did, by iterating over them, or printing them.
                Consider using the pretty-print library:



                import pprint
                pprint.pprint(big_data)



                I am expecting ... different worksheets(Sheet1,Sheet2)




                Well, that's unlikely, given the constant 'Sheet1' argument you supplied.
                Here is one way to accomplish that:



                writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
                for i, thing in enumerate(big_data):
                df = pd.DataFrame(thing)
                df.to_excel(writer, sheet_name=f'Sheet{i}')
                writer.save()


                Note the scope of writer -- it must be longer lived than each of the constituent dfs.







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Jan 3 at 3:07









                J_HJ_H

                4,4351821




                4,4351821
































                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54015290%2fhow-to-loop-lists-of-list-obtained-from-python-docx-where-each-list-is-a-table-a%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    MongoDB - Not Authorized To Execute Command

                    in spring boot 2.1 many test slices are not allowed anymore due to multiple @BootstrapWith

                    How to fix TextFormField cause rebuild widget in Flutter