Good practice for parallel tasks in Python
I have one Python script that generates data and another that trains a neural network with TensorFlow and Keras on this data. Both need an instance of the neural network.

Since I haven't set the "allow growth" flag, each process takes the full GPU memory, so I simply give each process its own GPU. (Maybe not a good solution for people with only one GPU... yet another unsolved problem.)

The actual problem is as follows: both instances need access to the network's weights file. I recently had a bunch of crashes because both processes tried to access the weights at the same time. A flag or something similar should stop each process from accessing the file whilst the other process is accessing it. Hopefully this doesn't create a bottleneck.

I tried to come up with a solution like semaphores in C, but today I found this post on Stack Exchange.

The idea of renaming seems quite simple and effective to me. Is this good practice in my case? I would create the weights file in the learning process with

self.model.save_weights(filepath='weights.h5$$$')

rename it after saving with

os.rename('weights.h5$$$', 'weights.h5')

and load it in my data-generating process with

self.model.load_weights(filepath='weights.h5')

Will this renaming overwrite the old file? And what happens if the other process is currently loading? I would also appreciate other ideas on how I could multithread / multiprocess my script. I just realized that generating data, learning, generating data, ... in a sequential script is not really performant.

EDIT 1: Forgot to mention that the weights are stored in a .h5 file by Keras' save function.
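To make the cycle concrete, here is a sketch of what I have in mind. Note that os.replace is my substitution for os.rename: it performs the same atomic overwrite on POSIX but also works on Windows, where os.rename fails if the target exists. The temporary filename is arbitrary.

import os

TMP = 'weights.h5.tmp'
FINAL = 'weights.h5'

def publish_weights(model):
    # in the training process: write to a temporary file first,
    # then atomically swap it into place
    model.save_weights(filepath=TMP)
    os.replace(TMP, FINAL)

def refresh_weights(model):
    # in the data-generating process: load whatever was last published
    if os.path.exists(FINAL):
        model.load_weights(filepath=FINAL)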
python multithreading multiprocessing
asked Jan 2 at 8:56, edited Jan 2 at 9:16 – Mr.Sh4nnon
1 Answer
The multiprocessing module has an RLock class that you can use to regulate access to a shared resource. This also works for files if you remember to acquire the lock before reading or writing and to release it afterwards. Using a lock implies that some of the time one of the processes cannot read or write the file; how much of a problem that is depends on how often both processes have to access the file.

Note that for this to work, one of the scripts has to start the other script as a Process after creating the lock.
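A minimal sketch of that setup, assuming the parent runs the trainer and a child runs the generator (the function names and sleep interval are illustrative, and the Keras calls are stubbed out as comments):

import multiprocessing as mp
import time

def trainer(lock):
    for _ in range(3):
        # ... train for a while, then publish the weights ...
        with lock:                      # readers are blocked while we write
            pass                        # model.save_weights('weights.h5')
        time.sleep(1)

def generator(lock):
    for _ in range(3):
        with lock:                      # the writer is blocked while we read
            pass                        # model.load_weights('weights.h5')
        # ... generate data with the current weights ...
        time.sleep(1)

if __name__ == '__main__':
    lock = mp.RLock()                   # create the lock first
    child = mp.Process(target=generator, args=(lock,))
    child.start()                       # then start the other task as a Process
    trainer(lock)
    child.join()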
If the weights are a Python data structure, you could put that under the control of a multiprocessing.Manager. That will manage access to the objects under its control for you. Note that a Manager is not meant for use with files, just in-memory objects.
Additionally, on UNIX-like operating systems Python has os.lockf to lock (part of) a file. Note that this is an advisory lock only: if another process calls lockf on a locked file, that call reports the file as already locked (in Python, os.lockf raises OSError), but nothing actually prevents a process from simply reading the file without locking it.
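A sketch of that advisory pattern (UNIX only; the polling loop is my own addition, and note that lockf needs a descriptor opened for writing, hence O_RDWR):

import os
import time

def read_weights_locked(path='weights.h5'):
    fd = os.open(path, os.O_RDWR)
    try:
        while True:
            try:
                os.lockf(fd, os.F_TLOCK, 0)   # try to lock; raises OSError if taken
                break
            except OSError:
                time.sleep(0.1)               # another process holds the lock
        data = os.read(fd, os.path.getsize(path))
        os.lockf(fd, os.F_ULOCK, 0)           # release the lock
        return data
    finally:
        os.close(fd)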
Note: files can be read and written. Only when two processes are reading the same file (read/read) does this go well. Every other combination (read/write, write/read, write/write) can and eventually will result in undefined behavior and data corruption.

Note 2: another possible solution involves inter-process communication. Process 1 writes a new .h5 file (with a random filename), closes it, and then sends a message (using a Pipe or Queue) to Process 2: "I've written a new parameter file pathtofile". Process 2 then reads the file and deletes it. This can work both ways, but it requires that both processes check for and process messages every so often. It prevents file corruption because the writing process only notifies the reading process after it has finished writing the file.
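A sketch of that notification scheme with a Queue (the uuid-based filenames, the dummy payload, and the None sentinel are my additions):

import multiprocessing as mp
import os
import uuid

def trainer(queue):
    for _ in range(3):
        path = 'weights-%s.h5' % uuid.uuid4().hex
        with open(path, 'wb') as f:     # stand-in for model.save_weights(path)
            f.write(b'parameters')
        queue.put(path)                 # notify only after the file is closed
    queue.put(None)                     # sentinel: no more files coming

def generator(queue):
    while True:
        path = queue.get()              # block until a new file is announced
        if path is None:
            break
        print('loading', path)          # stand-in for model.load_weights(path)
        os.remove(path)                 # this snapshot has been consumed

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=generator, args=(q,))
    p.start()
    trainer(q)
    p.join()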
answered Jan 2 at 9:15, edited Jan 2 at 10:04 – Roland Smith

Thank you for your answer! So RLock is similar to semaphores in C, right? Will this create an artificial bottleneck? What do you think about the renaming idea from the other post? It's mentioned that it works instantaneously, which would be nice. The weight file is saved in a .h5 file. I quickly googled the Manager combined with a .h5 file, which gave me this link. The author mentions that using multiple processes "results in undefined behavior". So this won't work in my case, will it? – Mr.Sh4nnon, Jan 2 at 9:27

@Mr.Sh4nnon see updated answer. – Roland Smith, Jan 2 at 9:47

Thanks! So far I have the impression that locking it and waiting whilst saving is the only way. – Mr.Sh4nnon, Jan 2 at 9:50

@Mr.Sh4nnon You could also combine files with inter-process communication. See Note 2. – Roland Smith, Jan 2 at 9:58

@Mr.Sh4nnon Using multiprocessing.Pipe indeed only works if one process starts the other. There are ways for unrelated processes to talk to each other, for example named pipes, but those are operating-system dependent. – Roland Smith, Jan 2 at 10:46