Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Test works with 3 computers, but not 4?
From: Nifty Tom Mitchell (niftyompi_at_[hidden])
Date: 2009-07-29 16:22:45


On Wed, Jul 29, 2009 at 01:42:39PM -0600, Ralph Castain wrote:
>
> It sounds like perhaps IOF messages aren't getting relayed along the
> daemons. Note that the daemon on each node does have to be able to send
> TCP messages to all other nodes, not just mpirun.
>
> Couple of things you can do to check:
>
> 1. -mca routed direct - this will send all messages direct instead of
> across the daemons
>
> 2. --leave-session-attached - will allow you to see any errors reported
> by the daemons, including those from attempting to relay messages
>
> Ralph
>
> On Jul 29, 2009, at 1:19 PM, David Doria wrote:
>
>> I wrote a simple program to display "hello world" from each process.
>>
>> When I run this (126 - my machine, 122, and 123), everything works
.....
>> However, when I run this (126 - my machine, 122, 123, AND 125), I get
>> no output at all.
>>
>> Is there any way to check what is going on / does anyone know what

All of the above is good advice. In addition:

Since the hosts work in most of the possible three-host permutations
but not with four, it is possible that your simple program has a
problem in the way it exits.

Please post your simple program. I am looking for a missing
MPI_Finalize() or an odd return/exit status.

    http://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node32.htm

Also, try adding a sleep(1) after the printf() of "hello world" and/or
after MPI_Finalize(), on the chance that there is a race on exit.

Try the "hello world" example in the source package for Open MPI or at:

        http://www.dartmouth.edu/~rc/classes/intro_mpi/hello_world_ex.html

You can also add gethostbyname() or environment variable checks, etc.,
to make sure that each host is involved as you expect, rather than
relying on the nearly anonymous rank number. Also double check which
mpirun you are using, i.e., the alternatives on your system may be
"interesting": various versions of MPI ship with some distros, so
$PATH/$path may be important. For example:
    $ file /usr/bin/mpirun
    /usr/bin/mpirun: symbolic link to `/etc/alternatives/mpi-run'
    $ locate bin/mpirun
    /usr/bin/mpirun
    /usr/bin/mpirun.py
    $ rpm -qf /usr/bin/mpirun.py
    mpich2-1.1-1.fc10.x86_64

-- 
	T o m  M i t c h e l l 
	Found me a new hat, now what?