Good practice for parallel tasks in Python

I have one Python script that generates data and another that trains a neural network on this data with TensorFlow and Keras. Both need an instance of the neural network.



Since I haven't set the "allow growth" flag, each process takes the full GPU memory, so I simply give each process its own GPU. (Maybe not a good solution for people with only one GPU... yet another unsolved problem.)



The actual problem is as follows: both instances need access to the network's weights file. I recently had a bunch of crashes because both processes tried to access the weights at the same time. A flag or something similar should stop each process from accessing the file while the other one is using it. Hopefully this doesn't create a bottleneck.
I tried to come up with a solution like semaphores in C, but today I found this post on Stack Exchange.



The idea of renaming seems quite simple and effective to me. Is this good practice in my case? In the learning process, I'll create the weights file with



self.model.save_weights(filepath='weights.h5$$$')


rename it after saving with



os.rename('weights.h5$$$', 'weights.h5')


and load it in my data-generating process with



self.model.load_weights(filepath='weights.h5')


?
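Put together, this is roughly what I have in mind (just a sketch; publish_weights and refresh_weights are names I made up for the two sides):

    import os

    WEIGHTS = 'weights.h5'
    TMP = WEIGHTS + '$$$'

    def publish_weights(model):
        # Learning process: the slow save goes to the temporary name,
        # the rename afterwards is the quick "publish" step.
        model.save_weights(filepath=TMP)
        os.rename(TMP, WEIGHTS)

    def refresh_weights(model):
        # Data-generating process: pick up the latest published weights.
        if os.path.exists(WEIGHTS):
            model.load_weights(filepath=WEIGHTS)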



Will this renaming overwrite the old file? And what happens if the other process is currently loading? I would also appreciate other ideas on how I could multithread/multiprocess my script. I just realized that generating data, learning, generating data, ... in a sequential script is not really performant.



EDIT 1: Forgot to mention that the weights are stored in an .h5 file by Keras's save function.


python multithreading multiprocessing

asked Jan 2 at 8:56 by Mr.Sh4nnon (edited Jan 2 at 9:16)

1 Answer

The multiprocessing module has an RLock class that you can use to regulate access to a shared resource. This also works for files if you remember to acquire the lock before reading or writing and release it afterwards. Using a lock implies that some of the time one of the processes cannot read or write the file. How much of a problem this is depends on how often both processes have to access the file.



          Note that for this to work, one of the scripts has to start the other script as a Process after creating the lock.
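A minimal sketch of that layout, assuming the training script spawns the data generator (the print calls stand in for the real Keras save/load):

    import time
    from multiprocessing import Process, RLock

    WEIGHTS = 'weights.h5'

    def generator(lock):
        # Data-generating process: only read while holding the lock.
        for _ in range(3):
            with lock:                   # blocks while the trainer is writing
                print('reading', WEIGHTS)    # model.load_weights(WEIGHTS)
            time.sleep(1)

    if __name__ == '__main__':
        lock = RLock()
        p = Process(target=generator, args=(lock,))  # child gets the same lock
        p.start()
        for _ in range(3):
            with lock:                   # blocks while the generator is reading
                print('writing', WEIGHTS)    # model.save_weights(WEIGHTS)
            time.sleep(1)
        p.join()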



          If the weights are a Python data structure, you could put that under control of a multiprocessing.Manager. That will manage access to the objects under its control for you. Note that a Manager is not meant for use with files, just in-memory objects.
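For example (a sketch with dummy values; with Keras you would put the list returned by model.get_weights() into the dict instead):

    from multiprocessing import Manager, Process

    def trainer(shared):
        # Publish new weights as an in-memory Python object, no file involved.
        shared['weights'] = [[0.1, 0.2], [0.3, 0.4]]  # dummy values

    if __name__ == '__main__':
        with Manager() as manager:
            shared = manager.dict()
            p = Process(target=trainer, args=(shared,))
            p.start()
            p.join()
            print(shared.get('weights'))  # reader side: model.set_weights(...)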



Additionally, on UNIX-like operating systems Python has os.lockf to lock (part of) a file. Note that this is an advisory lock only. That is, if another process tries to lock an already-locked file, lockf tells it so (in Python by raising OSError); it does not actually prevent anyone from reading the file.
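A sketch of the non-blocking variant (UNIX only):

    import os

    fd = os.open('weights.h5', os.O_RDWR)
    try:
        os.lockf(fd, os.F_TLOCK, 0)   # try to lock the whole file, don't block
        # ... read or write the file here ...
        os.lockf(fd, os.F_ULOCK, 0)   # release the lock
    except OSError:
        print('weights.h5 is locked by another process')
    finally:
        os.close(fd)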



Note:
Files can be read and written, but only when two processes are reading the same file (read/read) does concurrent access work well. Every other combination (read/write, write/read, write/write) can and eventually will result in undefined behavior and data corruption.



Note 2:
Another possible solution involves inter-process communication.
Process 1 writes a new h5 file (with a random filename), closes it, and then sends a message (using a Pipe or Queue) to Process 2: "I've written a new parameter file pathtofile".
Process 2 then reads the file and deletes it. This can work both ways, but it requires that both processes check for and process messages every so often. It prevents file corruption because the writing process only notifies the reading process after it has finished the file.
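A sketch of that scheme (the empty open/close stands in for the real Keras save, and the filenames are made up here):

    import os
    import uuid
    from multiprocessing import Process, Queue

    def trainer(q):
        for _ in range(3):
            path = 'weights-%s.h5' % uuid.uuid4().hex  # random filename
            open(path, 'wb').close()    # model.save_weights(path) in reality
            q.put(path)                 # notify the reader: file is complete
        q.put(None)                     # sentinel: no more files coming

    if __name__ == '__main__':
        q = Queue()
        Process(target=trainer, args=(q,)).start()
        while True:
            path = q.get()
            if path is None:
                break
            print('loading', path)      # model.load_weights(path) in reality
            os.remove(path)             # the reader deletes the file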






answered Jan 2 at 9:15 by Roland Smith (edited Jan 2 at 10:04)

• Thank you for your answer! So RLock is similar to the semaphores in C, right? Will this create an artificial bottleneck? What do you think about the renaming idea from the other post? It's mentioned that it works instantaneously, which would be nice. The weight file is saved in a .h5 file. I quickly googled the Manager combined with an .h5 file, which gave me this link. The author mentions that using multiple processes "results in undefined behavior". So this won't work in my case, will it?
  – Mr.Sh4nnon, Jan 2 at 9:27

• @Mr.Sh4nnon see updated answer.
  – Roland Smith, Jan 2 at 9:47

• Thanks! So far I have the impression that locking it and waiting whilst saving is the only way.
  – Mr.Sh4nnon, Jan 2 at 9:50

• @Mr.Sh4nnon You could also combine files with inter-process communication. See Note 2.
  – Roland Smith, Jan 2 at 9:58

• @Mr.Sh4nnon Using multiprocessing.Pipe indeed only works if one process starts the other. There are ways for unrelated processes to talk to each other, for example named pipes. But those are operating-system dependent.
  – Roland Smith, Jan 2 at 10:46