mpiexec won't run an mpi4py script when two hosts are used in an MPI cluster over a LAN
So I have a desktop PC that serves as my server, `primesystem`, and a laptop, `zerosystem`, that serves as my client and is connected to it. They act as my `ssh` server and `ssh` client respectively, and are connected through an Ethernet (not crossover) cable.

I followed the instructions in both of these tutorials: Running an MPI Cluster within a LAN and Setting Up an MPICH2 Cluster in Ubuntu, except that I want to use MPI from Python, so I used `mpi4py` instead to test whether both PCs could take part in an MPI job.

As instructed by the first tutorial, I set up a directory `/cloud` on `primesystem` that is shared over the network and mounted on `zerosystem` (this way I can also work on either system without logging in through `ssh`).
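For reference, the helloworld script is just the usual mpi4py hello-world demo. A minimal sketch of it (my reconstruction, not necessarily the tutorials' exact file) looks like this:

```python
# helloworld - minimal mpi4py test (a sketch; the tutorials' demo may
# differ slightly, but it prints the same kind of line per rank)
from mpi4py import MPI

comm = MPI.COMM_WORLD             # communicator spanning all launched ranks
rank = comm.Get_rank()            # this process's rank, 0..size-1
size = comm.Get_size()            # total number of processes in the job
name = MPI.Get_processor_name()   # hostname this rank is running on

print("Hello, World! I am process %d of %d on %s." % (rank, size, name))
```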
On the server, `primesystem`, the sample helloworld script runs fine:
one@primesystem:/cloud$ mpirun -np 5 -hosts primesystem python -m mpi4py helloworld
Hello, World! I am process 0 of 5 on primesystem.
Hello, World! I am process 1 of 5 on primesystem.
Hello, World! I am process 2 of 5 on primesystem.
Hello, World! I am process 3 of 5 on primesystem.
Hello, World! I am process 4 of 5 on primesystem.
The same goes if I run it on the host `zerosystem` (though with a noticeable delay in execution, since the processes run on `zerosystem`'s CPU across the network):
one@primesystem:/cloud$ mpirun -np 5 -hosts zerosystem python -m mpi4py helloworld
Hello, World! I am process 0 of 5 on zerosystem.
Hello, World! I am process 1 of 5 on zerosystem.
Hello, World! I am process 2 of 5 on zerosystem.
Hello, World! I am process 3 of 5 on zerosystem.
Hello, World! I am process 4 of 5 on zerosystem.
But when I use both hosts, only process 0 on `primesystem` prints its line, and then the run hangs:
one@primesystem:/cloud$ mpirun -np 5 -hosts primesystem,zerosystem python -m mpi4py helloworld
Hello, World! I am process 0 of 5 on primesystem.
(If I swap the order of the hosts so that `zerosystem` comes first, no Hello, World! output appears at all.)
I also tried listing the hosts, each with the number of processes to spawn, in a `.mpi-config` file, passing it with the `-f` parameter in place of `-hosts`:
zerosystem:4
primesystem:2
but I still get the same behavior, and after several seconds to a minute this error output appears:
one@primesystem:/cloud$ mpirun -np 6 -f .mpi-config python -m mpi4py helloworld
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 23329 RUNNING AT primesystem
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@zerosystem] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:1@zerosystem] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@zerosystem] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@primesystem] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@primesystem] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@primesystem] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@primesystem] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
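As far as I understand, exit code 139 is 128 + 11, i.e. the process was killed by SIGSEGV, so the rank on `primesystem` appears to be segfaulting rather than merely failing to connect.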
Why is this? Any ideas?
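In case it's relevant, these are the sanity checks I plan to run next; the commands below assume the stock MPICH `mpiexec` is on the PATH of both machines and that passwordless `ssh` works (adjust names and paths for your setup):

```bash
# 1. Confirm both machines run the same MPICH build; mismatched
#    Hydra process managers are a common source of cross-host crashes.
mpiexec --version
ssh zerosystem mpiexec --version

# 2. Confirm both machines see the same Python and mpi4py versions.
python -c "import mpi4py; print(mpi4py.__version__)"
ssh zerosystem python -c "import mpi4py; print(mpi4py.__version__)"

# 3. Try a plain, non-Python launch across both hosts, to separate
#    MPICH-level problems from mpi4py-level ones.
mpiexec -np 2 -hosts primesystem,zerosystem hostname
```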
Tags: python, ssh, mpi, hosting