Using multiprocessing to preprocess images
I am trying to get my feet wet with multiprocessing in Python, so I am using it to parallelize an image preprocessing pipeline. All my images live in a directory called image_files, and I have a list of the filenames in that directory. I split that list into two chunks, a and b, and pass each chunk to its own multiprocessing.Process, where a method called preprocess_image does the preprocessing on each image.
Following a tutorial on calculating square roots with multiprocessing, I came up with working code (see below).
The code works, but speed matters, and I am not sure whether it is appropriate to define two methods that do essentially the same thing, or whether it would be faster to use a single method and simply pass a and b to the same target in multiprocessing.Process(target=work....
Hence my question: is this the right way to use multiprocessing, or could I speed it up somehow?
import multiprocessing

import cv2
from tqdm import tqdm

def work1(array):
    for i in tqdm(array):
        image_path = "C:/Users/aaron/Desktop/image_files/" + i
        image = preprocess_image(image_path)
        cv2.imwrite("C:/Users/aaron/Desktop/destination/" + i, image)

def work2(array):
    for i in tqdm(array):
        image_path = "C:/Users/aaron/Desktop/image_files/" + i
        image = preprocess_image(image_path)
        cv2.imwrite("C:/Users/aaron/Desktop/destination/" + i, image)

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=work1, args=(a,))
    p2 = multiprocessing.Process(target=work2, args=(b,))

    p1.start()
    p2.start()
    p1.join()
    p2.join()

    print("Done!")
asked Nov 20 '18 at 18:59 by AaronDT
1 Answer
Since all of your per-image outputs seem to be independent, you should use multiprocessing.Pool:
from multiprocessing import Pool

l = [...]  # list of all your image files
f = ...    # function to modify each of these, taking an element of l as input

p = Pool(10)  # however many processes you want to spawn
p.map(f, l)
That's it: you don't need to define the same function twice or split the list manually; the work is distributed across the pool and managed for you automatically.
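Applied to the question's setup, a minimal sketch could look like the following. The paths and preprocess_image come from the question; the worker count of 4 and the use of imap_unordered with tqdm for progress are assumptions, not requirements:

import os
from multiprocessing import Pool

import cv2
from tqdm import tqdm

SRC = "C:/Users/aaron/Desktop/image_files/"
DST = "C:/Users/aaron/Desktop/destination/"

def work(filename):
    # preprocess_image is the question's own preprocessing routine
    image = preprocess_image(SRC + filename)
    cv2.imwrite(DST + filename, image)

if __name__ == "__main__":
    filenames = os.listdir(SRC)
    with Pool(processes=4) as pool:  # pick a worker count near your core count
        # imap_unordered yields results as workers finish, so tqdm can show progress
        for _ in tqdm(pool.imap_unordered(work, filenames), total=len(filenames)):
            pass

Using imap_unordered instead of map keeps the progress bar moving as each image completes, rather than blocking until the entire batch is done.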
Thank you! This works great! Is the number of processes I can spawn limited by the number of cores I have available? Also I wonder how to stop the entire process while running - using cmd + c doesn't do the trick it seems...
– AaronDT
Nov 20 '18 at 19:52
No, it is not limited; however, in this case more processes than CPU cores are probably not going to help performance, as they'll just wait in a queue. If you have an IO-intensive application, then it makes sense to have more processes than cores. As for stopping it, the Pool spawns daemon processes, which will eventually be killed, but you may have to wait for each daemon to finish its current task, so it is not immediate.
– Rocky Li
Nov 20 '18 at 20:09
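To illustrate both points from the comment, here is a small, hedged sketch that sizes the pool to the core count and terminates the workers on Ctrl+C. The square function and the try/except shutdown are illustrative assumptions, and interrupt handling with Pool can still be slow to take effect on some platforms:

from multiprocessing import Pool, cpu_count

def square(x):
    return x * x  # stand-in for any per-item work

if __name__ == "__main__":
    pool = Pool(processes=cpu_count())  # one worker per core is a sensible default
    try:
        results = pool.map(square, range(1000))
        pool.close()
        pool.join()
    except KeyboardInterrupt:
        # Ctrl+C in the parent: terminate workers immediately instead of
        # waiting for them to finish their current tasks
        pool.terminate()
        pool.join()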