Open MPI User's Mailing List Archives

From: Dirk Eddelbuettel (edd_at_[hidden])
Date: 2007-07-19 10:14:43


On 18 July 2007 at 19:14, Dirk Eddelbuettel wrote:
|
| Hi Tim,
|
| Thanks for the follow-up
|
| On 18 July 2007 at 17:22, Tim Prins wrote:
| | <snip>
| | > Yes, this helps tremendously. I installed rsh, and now it pretty much
| | > works.
| | Glad this worked out for you.
| |
| | >
| | > The one missing detail is that I can't seem to get the stdout/stderr
| | > output. For example:
| | >
| | > $ orterun -np 1 uptime
| | > $ uptime
| | > 18:24:27 up 13 days, 3:03, 0 users, load average: 0.00, 0.03, 0.00
| | >
| | > The man page indicates that stdout/stderr is supposed to come back to
| | > the stdout/stderr of the orterun process. Any ideas on why this isn't
| | > working?
| | It should work. However, we currently have some I/O forwarding problems which
| | show up in some environments that will (hopefully) be fixed in the next
| | release. As far as I know, the problem seems to happen mostly with non-mpi
| | applications.
| |
| | Try running a simple mpi application, such as:
| |
| | #include <stdio.h>
| | #include "mpi.h"
| |
| | int main(int argc, char* argv[])
| | {
| |     int rank, size;
| |
| |     MPI_Init(&argc, &argv);
| |     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
| |     MPI_Comm_size(MPI_COMM_WORLD, &size);
| |     printf("Hello, world, I am %d of %d\n", rank, size);
| |     MPI_Finalize();
| |
| |     return 0;
| | }
| |
| | If that works fine, then it is probably our problem, and not a problem with
| | your setup.
| |
| | Sorry I don't have a better answer :(
|
| That works (and I use the same Debian openmpi 1.2.3-1 set of packages Adam
| has):
|
| edd_at_basebud:~> opalcc -o /tmp/openmpitest /tmp/openmpitest.c -lmpi
| edd_at_basebud:~> orterun -np 4 /tmp/openmpitest
| Hello, world, I am 2 of 4
| Hello, world, I am 1 of 4
| Hello, world, I am 0 of 4
| Hello, world, I am 3 of 4
| edd_at_basebud:~>
|
| I was toying with this at work earlier, and it was hanging there (using
| hostname or uptime as the token binaries) as soon as I increased the np
| parameter beyond 1.
|
| It works here:
|
| edd_at_basebud:~> orterun -np 4 hostname
| basebud
| basebud
| basebud
| basebud
| edd_at_basebud:~>
|
| I have slurm-llnl test packages installed at work but not here. Maybe I need
| to dig a bit more into slurm. (Adam: slurm package should be forthcoming.
| I can point you to the snapshots from the fellow whom I mentor on this.)

Indeed, at work it hangs once I up the np parameter:

foo:~> orterun -np 4 ./openmpitest
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4
orterun: killing job...

Killed
foo:~> orterun -np 4 -H localhost ./openmpitest
Hello, world, I am 1 of 4
Hello, world, I am 0 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4
foo:~>

Restricting it to localhost helps. Any ideas?

x86 multicore/multicpu, Open MPI 1.2.3, Slurm 1.2.11, Ubuntu 7.04 plus a
handful of hand-compiled packages from Debian unstable. More details are
available; just tell me what is needed and how best to compile it.

Dirk

-- 
Hell, there are no rules here - we're trying to accomplish something. 
                                                  -- Thomas A. Edison