mpiexec won't run mpi4py script when two hosts are utilized in an MPI cluster established through LAN



























I have a desktop PC, primesystem, that serves as my server, and a laptop, zerosystem, that is connected to it as my client. They act as the SSH server and SSH client respectively, and are connected through an Ethernet (not crossover) cable.



I followed the instructions in both of these tutorials:
Running an MPI Cluster within a LAN and Setting Up an MPICH2 Cluster in Ubuntu. The only difference is that I want to use MPI from Python, so I used mpi4py to test whether both PCs can run MPI jobs.



I set up a directory /cloud on primesystem that is shared over my network and mounted on zerosystem, as instructed by the first tutorial (so I can work on either system without needing to log in through ssh).



On the server, primesystem, the sample helloworld script runs fine:



one@primesystem:/cloud$ mpirun -np 5 -hosts primesystem python -m mpi4py helloworld
Hello, World! I am process 0 of 5 on primesystem.
Hello, World! I am process 1 of 5 on primesystem.
Hello, World! I am process 2 of 5 on primesystem.
Hello, World! I am process 3 of 5 on primesystem.
Hello, World! I am process 4 of 5 on primesystem.


The same goes if I run it on the host zerosystem (though there is a noticeable delay in execution, since it uses the remote CPU of zerosystem):



one@primesystem:/cloud$ mpirun -np 5 -hosts zerosystem python -m mpi4py helloworld
Hello, World! I am process 0 of 5 on zerosystem.
Hello, World! I am process 1 of 5 on zerosystem.
Hello, World! I am process 2 of 5 on zerosystem.
Hello, World! I am process 3 of 5 on zerosystem.
Hello, World! I am process 4 of 5 on zerosystem.


But if I use both hosts, it doesn't seem to respond at all:



one@primesystem:/cloud$ mpirun -np 5 -hosts primesystem,zerosystem python -m mpi4py helloworld
Hello, World! I am process 0 of 5 on primesystem.


(If I swap the order of the hosts, putting zerosystem first, no Hello World output is shown at all.)
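
One thing that may be worth checking for the two-host case, as an assumption rather than a confirmed cause in this setup, is name resolution. Debian and Ubuntu installs commonly map the machine's own hostname to 127.0.1.1 in /etc/hosts, and a hostname that resolves to a loopback address is sometimes reported as a reason for multi-host MPI runs hanging. The sketch below uses only the Python standard library; the helper names is_loopback and check_hosts are hypothetical and not part of MPICH or mpi4py.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the IP address string is in a loopback range (e.g. 127.0.0.0/8)."""
    return ipaddress.ip_address(address).is_loopback


def check_hosts(hostnames):
    """Resolve each hostname and record (address, is_loopback) for it.

    A hostname that cannot be resolved is recorded as (None, None).
    """
    report = {}
    for name in hostnames:
        try:
            address = socket.gethostbyname(name)
        except socket.gaierror:
            report[name] = (None, None)
            continue
        report[name] = (address, is_loopback(address))
    return report
```

Calling check_hosts(["primesystem", "zerosystem"]) on each machine would show whether either name resolves to a loopback address there. A True in the second field only suggests a place to look; it does not by itself explain the hang.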



I also tried listing the hosts, with the number of processes to spawn on each, in a .mpi-config file, and passing it with the -f parameter instead of -hosts:



zerosystem:4
primesystem:2


but I still get the same behavior, and after several seconds to a minute, this error is printed:



one@primesystem:/cloud$ mpirun -np 6 -f .mpi-config python -m mpi4py helloworld
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 23329 RUNNING AT primesystem
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@zerosystem] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:1@zerosystem] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@zerosystem] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@primesystem] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@primesystem] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@primesystem] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@primesystem] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
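
A minimal sketch for reading two numbers in the failing run above, using only the Python standard library. It assumes the host file uses the host:count form shown earlier, and that an entry without a count means one process (that default is an assumption here). The helper names parse_hostfile and describe_exit_code are hypothetical and not part of MPICH or mpi4py. An exit code above 128 conventionally means the process was killed by signal number (code - 128), so 139 is 128 + 11.

```python
import signal


def parse_hostfile(text):
    """Parse lines of the form 'host:count' into a list of (host, count) pairs."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        host, sep, count = line.partition(":")
        # Assumption: an entry with no ':count' part stands for one process.
        entries.append((host.strip(), int(count) if sep else 1))
    return entries


def describe_exit_code(code):
    """Return the signal name behind a shell-style exit code above 128, else None."""
    if code > 128:
        return signal.Signals(code - 128).name
    return None


# Contents of the .mpi-config file shown earlier.
hostfile_text = "zerosystem:4\nprimesystem:2\n"
entries = parse_hostfile(hostfile_text)
total_slots = sum(count for _, count in entries)

print(entries)                  # [('zerosystem', 4), ('primesystem', 2)]
print(total_slots)              # 6
print(describe_exit_code(139))  # SIGSEGV
```

So the six slots in the host file match -np 6, and exit code 139 indicates that the process on primesystem ended with a segmentation fault instead of returning an ordinary Python error.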


Why is this? Any ideas?










      python ssh mpi hosting






      asked Feb 22 '16 at 16:05









      anobilisgorse
