Speed Up Loop Time Iterating Over Relatively Large Excel Sheet in Python

I'm attempting to extract data from each row of an Excel sheet whenever the number I test in that row falls within one of my predefined ranges. The code works, but it moves at a crawl. The sheet I'm iterating over contains 200,000 rows of data, and I've noticed that when I break it into smaller segments the program executes faster, though still not as fast as I would like. I can't quite figure out why that is, either.



Any recommendations on how I might get this to speed up?



import xlsxwriter
import openpyxl
import os
from tqdm import tqdm

os.chdir(r'C:\Users\JTNeumay\Desktop\test folder')
xlfile = 'test.xlsx'
wb = openpyxl.load_workbook(xlfile, read_only=True)
sheet = wb['Sheet1']

newbook = xlsxwriter.Workbook('Commercial.xlsx')
newbook2 = xlsxwriter.Workbook('Industrial.xlsx')
newsheet = newbook.add_worksheet()
newsheet2 = newbook2.add_worksheet()
i = 1
j = 1

for row in tqdm(range(1, sheet.max_row + 1)):
    check = sheet.cell(row=row, column=11).value
    if check is not None:
        if (220000 <= check <= 221310 or 221320 <= check <= 221330 or
                237200 <= check <= 237210 or 334611 <= check <= 334612 or
                420000 <= check <= 493190 or 511210 <= check <= 512210 or
                512240 <= check <= 541310 or 541330 <= check <= 541340 or
                541370 <= check <= 541700 or 541720 <= check <= 562900 or
                562920 <= check <= 811430 or 812000 <= check <= 983000):
            for column in range(1, sheet.max_column + 1):
                newsheet.write(i, column, sheet.cell(row=row, column=column).value)
            i += 1
        elif (210000 <= check <= 213115 or 230000 <= check <= 237130 or
                237300 <= check <= 334610 or 334613 <= check <= 339999 or
                510000 <= check <= 511200 or 512220 <= check <= 512230 or
                check == 541320 or check == 541360 or check == 541710 or
                check == 562910 or check == 811490):
            for column in range(1, sheet.max_column + 1):
                newsheet2.write(j, column, sheet.cell(row=row, column=column).value)
            j += 1

newbook.close()
newbook2.close()


Note that I also tried a method in which I created two lists holding the values from the ranges in the if statements, and found that that approach took much longer.
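(Presumably that list-based attempt looked something like the hypothetical sketch below; the list contents and variable names are reconstructions, not the asker's actual code. It illustrates why that approach is slower: `in` against a materialized list scans element by element, so every one of the 200,000 rows pays a linear search instead of a handful of comparisons.)

# Hypothetical reconstruction of the slower list-based check:
# every value of every range is materialized into a list first.
commercial = list(range(220000, 221311)) + list(range(221320, 221331))  # ...and so on
industrial = list(range(210000, 213116)) + list(range(230000, 237131))  # ...and so on

check = 221325  # example value from the test column

# 'check in commercial' scans the list element by element on every row;
# converting each list to a set() would restore O(1) membership tests.
if check in commercial:
    print('Commercial')
elif check in industrial:
    print('Industrial')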

python excel python-3.x performance

asked Nov 19 '18 at 17:41
STRAT0CAST3R

  • Can you provide a sample of the sheet? Thanks.
    – Deepak Saini
    Nov 21 '18 at 4:05
  • Unfortunately I cannot; it's for work, so the information is proprietary. Unless you mean a sample of how the sheet is formatted: there is a 6-digit number in the 11th column, and the rest of the columns hold info that isn't important to the program.
    – STRAT0CAST3R
    Nov 21 '18 at 18:56
1 Answer
Question: Speed Up Loop Time Iterating

Try this openpyxl-only solution:

Come back and report your speed experience.

from openpyxl import Workbook, load_workbook

# Source worksheet: assumed here, since the snippet below reads from 'ws'
ws = load_workbook('test.xlsx')['Sheet1']

# Create new Workbooks
wb1 = Workbook()
ws1 = wb1.active
wb2 = Workbook()
ws2 = wb2.active

# Predefine the ranges and the target worksheet for each.
# You can gain additional speed by sorting the range tuples
# so the most frequently matched ranges come first.
ranges = [(2, 3, ws1), (221320, 221330, ws2), (237200, 237210, ws1), (812000, 983000, ws2)]

# The row to start from
min_row = 2

# Iterate all cells in the test column ('min_col/max_col')
for row, cell in enumerate(next(ws.iter_cols(min_row=min_row, min_col=2, max_col=2)), min_row):
    if cell.value:
        # Condition: if cell.value falls within one of the ranges
        for r in ranges:
            if r[0] <= cell.value <= r[1]:
                # Match: append this row to the given worksheet
                r[2].append([c.value for c in next(ws.iter_rows(min_row=row, max_row=row))])
                break

# Save workbooks
wb1.save('../test/test1.xlsx')
wb2.save('../test/test2.xlsx')


Tested with Python: 3.4.2 - openpyxl: 2.4.1 - LibreOffice: 4.3.3.2
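
For the asker's own data, the ranges list above would presumably be rebuilt from the bounds in the question's if/elif tests (Commercial rows to ws1, Industrial rows to ws2), with single codes expressed as degenerate (code, code, ws) tuples; the test column would also change from min_col=2, max_col=2 to 11. A sketch, not tested against the real sheet:

# Hypothetical adaptation of 'ranges' to the question's two categories.
ranges = [
    (220000, 221310, ws1), (221320, 221330, ws1), (237200, 237210, ws1),
    (334611, 334612, ws1), (420000, 493190, ws1), (511210, 512210, ws1),
    (512240, 541310, ws1), (541330, 541340, ws1), (541370, 541700, ws1),
    (541720, 562900, ws1), (562920, 811430, ws1), (812000, 983000, ws1),
    (210000, 213115, ws2), (230000, 237130, ws2), (237300, 334610, ws2),
    (334613, 339999, ws2), (510000, 511200, ws2), (512220, 512230, ws2),
    (541320, 541320, ws2), (541360, 541360, ws2), (541710, 541710, ws2),
    (562910, 562910, ws2), (811490, 811490, ws2),
]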

answered Nov 19 '18 at 21:18
stovfl

  • Gives me just about the same speed that I was getting previously.
    – STRAT0CAST3R
    Nov 19 '18 at 22:37
  • Since it's possible to split your input xlsx into chunks, multiprocessing is an option. But you have to pay for it, meaning 2 processes does not result in half the execution time.
    – stovfl
    Nov 20 '18 at 16:15
  • I ended up using a concurrent.futures.ProcessPoolExecutor map and broke the source data into multiple sheets to get it to multiprocess, and they did run concurrently. The thing I can't figure out is why analyzing a larger number of cells causes the program to run exponentially longer; it's as if it's loading the whole sheet with each iteration. (A sketch of this chunked approach follows these comments.)
    – STRAT0CAST3R
    Nov 21 '18 at 22:01
  • "why analyzing a larger number of cells ... run exponentially longer": Read openpyxl-read-excel-too-slow and openpyxl-optimizing-cells-search-speed
    – stovfl
    Nov 22 '18 at 8:09
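
A minimal sketch of the chunked multiprocessing approach discussed in these comments. The file name, sheet name, chunk size, and the single example range are placeholders, not details confirmed by the thread; each worker opens its own read-only workbook handle, since openpyxl worksheet objects cannot be pickled across processes.

import concurrent.futures

import openpyxl

def scan_chunk(bounds):
    """Scan rows start..stop of the source sheet and return matching rows."""
    start, stop = bounds
    # Each process opens the workbook itself; handles can't be shared.
    wb = openpyxl.load_workbook('test.xlsx', read_only=True)
    ws = wb['Sheet1']
    matches = []
    for row in ws.iter_rows(min_row=start, max_row=stop):
        check = row[10].value  # the 6-digit code sits in the 11th column
        if check is not None and 220000 <= check <= 221310:  # one example range
            matches.append([c.value for c in row])
    wb.close()
    return matches

if __name__ == '__main__':
    # Split 200,000 rows into four chunks of 50,000 rows each.
    chunks = [(start, start + 49999) for start in range(1, 200001, 50000)]
    with concurrent.futures.ProcessPoolExecutor() as pool:
        for rows in pool.map(scan_chunk, chunks):
            print(len(rows), 'matching rows in this chunk')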