How to share a customized variable when using multiprocessing in Python?
There is a bloom filter object created by pybloom, a Python module. Assume I have over 10 million strings waiting to be added to this object, and the usual way to do so is:



from pybloom import BloomFilter

# initialize a bloom filter object
bf = BloomFilter(int(2e7))

for i in string_list:
    bf.add(i)


But this costs too much time, especially when string_list is really long. Since my computer (Windows 7) has a 4-core CPU, I want to know if there is a multi-process way to make full use of the CPU and speed up the add calls.



I know a little about multiprocessing, but I cannot solve the problem of exchanging customized objects, such as bf above, between processes.
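For reference, one general way to expose a customized object to several processes is a `multiprocessing.managers.BaseManager` proxy: the object lives in a single server process and workers call its methods through the proxy. The sketch below is illustrative only and uses a hypothetical `SetStandIn` class in place of the real bf (a plain set stands in for the filter):

```python
from multiprocessing import Process
from multiprocessing.managers import BaseManager

class SetStandIn:
    """Stand-in for a customized object such as bf; it lives in the
    manager's server process and other processes talk to it via a proxy."""
    def __init__(self):
        self._items = set()
    def add(self, item):
        self._items.add(item)
    def count(self):
        return len(self._items)

class MyManager(BaseManager):
    pass

MyManager.register('SetStandIn', SetStandIn)

def worker(proxy, prefix):
    # every proxy.add() is an IPC round-trip to the manager's server process
    for i in range(25):
        proxy.add('%s-%d' % (prefix, i))

def run_demo():
    with MyManager() as manager:
        shared = manager.SetStandIn()
        procs = [Process(target=worker, args=(shared, w)) for w in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return shared.count()

if __name__ == '__main__':
    print(run_demo())  # 4 processes x 25 adds = 100 distinct items
```

Note that every add through the proxy pays an IPC round-trip, so for 10 million strings this can easily be slower than the plain single-process loop; that overhead is why build-per-worker-then-merge approaches are usually preferred.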



Forgive my poor English, and please show me the code if you can. Thanks.
  • Maybe try using multiprocessing.Queue in Python, which is designed for inter-process communication.

    – Menglong Li
    Jan 2 at 8:43
  • This would be helpful: stackoverflow.com/questions/21968278/…

    – Xiwei Wang
    Jan 2 at 9:47
python multiprocessing
asked Jan 2 at 8:30 by CoffeeSun
1 Answer
I'm not really familiar with pybloom or BloomFilter objects, but a quick look at the code shows that you can union multiple BloomFilter objects.



Based on the size of your string_list, you can create a Pool of n workers. For simplicity, say n = 2. The logic: for x strings in string_list, divide it into 2 lists of about x/2 strings each, then let a separate process build a filter from each.



You can have something like this:



from multiprocessing import Pool

if __name__ == '__main__':  # the guard is required on Windows, where workers are spawned
    with Pool(n) as p:
        bloom_filter_parts = p.map(add_str_to_bloomfilter, divide_list_in_parts(string_list))
    # Now you have a list of BloomFilter objects, each holding part of string_list; merge them
    res_bloom_filter = concat_bf_list(bloom_filter_parts)


Code for add_str_to_bloomfilter:



def add_str_to_bloomfilter(str_list_slice):
    # each worker builds its own partial filter; size the capacity for its
    # slice (capacity=100 is just a placeholder for a real estimate)
    res_bf = BloomFilter(capacity=100)
    for i in str_list_slice:
        res_bf.add(i)
    return res_bf


You still have to write divide_list_in_parts and concat_bf_list yourself, but I hope you get the logic.
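For completeness, here is a hedged sketch of those two helpers. The split helper is plain Python; the merge helper assumes pybloom's BloomFilter.union, which merges two filters of equal capacity and error rate (any object exposing a compatible union method works the same way):

```python
from functools import reduce

def divide_list_in_parts(items, n=2):
    """Split items into n roughly equal contiguous slices."""
    k, r = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)  # first r slices get one extra item
        parts.append(items[start:end])
        start = end
    return parts

def concat_bf_list(bf_list):
    """Fold the partial filters together pairwise via their union method;
    pybloom requires both sides to have the same capacity and error rate."""
    return reduce(lambda a, b: a.union(b), bf_list)
```

Because the partial filters are unioned afterwards, each worker's filter must be created with identical parameters, or pybloom will refuse the merge.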



Also, read this: https://docs.python.org/3.4/library/multiprocessing.html
  • I solved the problem with your logic. It does not match my original idea exactly, but it works. Thank you again for the answer.

    – CoffeeSun
    Jan 3 at 2:24
answered Jan 2 at 9:51 by MoonStruckHorrors