How to share a customized variable when using multiprocessing in Python?
There is a bloom filter object created by pybloom, a Python module. Assume that I have over 10 million strings waiting to be added to this object, and the general way to do so is:
from pybloom import BloomFilter
# initialize a bloomfilter object
bf = BloomFilter(int(2e7))
for i in string_list:
    bf.add(i)
But this costs too much time, especially when string_list is really long. My computer (Windows 7) has a 4-core CPU, and I want to know whether there is a multiprocess way to make full use of the CPU and speed up the add method.
I know a little about multiprocessing, but I cannot solve the problem of exchanging customized objects, such as bf above, between processes.
Forgive my poor English, and show me the code if you can. Thanks.
python multiprocessing
Maybe try to use a Queue in Python, which is designed for multiprocessing.
– Menglong Li
Jan 2 at 8:43
This would be helpful: stackoverflow.com/questions/21968278/…
– Xiwei Wang
Jan 2 at 9:47
asked Jan 2 at 8:30


CoffeeSun
183
1 Answer
I'm not really familiar with pybloom or BloomFilter objects, but a quick look at the code reveals that you can union multiple BloomFilter objects.
Based on the size of your string_list, you may create a Pool of n processes. For simplicity, say n=2. The logic here is: for, say, x strings in string_list, divide it into 2 lists of size x/2 each, then create a separate process to handle each part.
You can have something like this:
from multiprocessing import Pool

with Pool(n) as p:
    bloom_filter_parts = p.map(add_str_to_bloomfilter, divide_list_in_parts(string_list))

# Now you have a list of BloomFilter objects with parts of string_list in them; merge them
res_bloom_filter = concat_bf_list(bloom_filter_parts)
Code for add_str_to_bloomfilter:
def add_str_to_bloomfilter(str_list_slice):
    # NOTE: size the capacity for the slice; 100 is far too small for
    # millions of strings (e.g. use int(2e7) // n per part).
    res_bf = BloomFilter(capacity=100)
    for i in str_list_slice:
        res_bf.add(i)
    return res_bf
You have to add code for divide_list_in_parts and concat_bf_list, but I hope you get the logic.
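The two missing helpers might look like the sketch below. Both names come from the answer above, not from any library; plain sets stand in for BloomFilter here so the sketch runs without pybloom (both expose a union() method that returns a new merged object, though as far as I recall pybloom's union also requires the two filters to share the same capacity and error rate).

```python
from functools import reduce

def divide_list_in_parts(lst, n=2):
    """Split lst into n contiguous parts of (almost) equal size."""
    k, m = divmod(len(lst), n)
    return [lst[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

def concat_bf_list(bf_list):
    """Merge a list of filters pairwise via their union() method."""
    return reduce(lambda a, b: a.union(b), bf_list)
```

For example, divide_list_in_parts(list(range(10)), 3) yields parts of sizes 4, 3 and 3, and concat_bf_list applied to a list of sets returns their overall union.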
Also, read this: https://docs.python.org/3.4/library/multiprocessing.html
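One caveat for the asker's platform: on Windows, multiprocessing starts worker processes by re-importing the main module, so the Pool must be created under an if __name__ == "__main__": guard or the script will try to spawn workers recursively. A minimal end-to-end sketch of the whole pattern (function names are this answer's own, and a set again stands in for BloomFilter so it runs without pybloom):

```python
from functools import reduce
from multiprocessing import Pool

def divide_list_in_parts(lst, n):
    # Split lst into n contiguous, nearly equal parts.
    k, m = divmod(len(lst), n)
    return [lst[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

def add_str_to_filter(str_list_slice):
    # With pybloom this would build a BloomFilter sized for the slice;
    # a set is used here so the sketch is self-contained.
    res = set()
    for s in str_list_slice:
        res.add(s)
    return res

def main():
    string_list = ["str%d" % i for i in range(1000)]
    n = 2
    # The guard below is required on Windows: each worker re-imports
    # this module, and an unguarded Pool would spawn workers recursively.
    with Pool(n) as p:
        parts = p.map(add_str_to_filter, divide_list_in_parts(string_list, n))
    merged = reduce(lambda a, b: a.union(b), parts)
    print(len(merged))

if __name__ == "__main__":
    main()
```

Run as a script, this prints 1000, the number of distinct strings across both workers' partial results.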
answered Jan 2 at 9:51
MoonStruckHorrors
4021618
I solved the problem with your logic. It does not match my original thoughts very much, but it works. Thank you again for the answer.
– CoffeeSun
Jan 3 at 2:24