Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] tcp connectivity OS X and 1.3.3
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-08-11 18:38:17


I can't speak to the TCP problem, but the following message:

> [xserve02.local:43625] [[28627,0],2] orte:daemon:send_relay - recipient list is empty!

is not an error message. It is perfectly normal operation.

Ralph

On Aug 11, 2009, at 1:54 PM, Jody Klymak wrote:

> Hello,
>
>
> On Aug 11, 2009, at 8:15 AM, Ralph Castain wrote:
>
>> You can turn off those MCA params I gave you, as you are now past
>> that point. I know there are others who can help you debug that
>> TCP BTL error.
>
> Just to eliminate the mitgcm from the debugging, I compiled
> example/hello_c.c and ran it as:
>
> /usr/local/openmpi/bin/mpirun --debug-daemons -n 8 -host xserve01 hello_c >& hello_c4_1host.txt
>
> There is no apparent problem. If I run it as:
>
> /usr/local/openmpi/bin/mpirun --debug-daemons -n 8 -host xserve01,xserve02 hello_c >& hello_c4_2host.txt
>
> The process says Hello, but hangs at the end, and needs to be killed
> with ^C.
>
> I then modified connectivity_c to include a printf once MPI is
> initialized, and hardwired verbose=1. On one host this completes
> and appears to work fine:
>
> /usr/local/openmpi/bin/mpirun --debug-daemons -n 8 -host xserve01 connectivity_c >& connectivity_c8_1host.txt
>
> However, once again, adding the second host breaks things:
>
> /usr/local/openmpi/bin/mpirun --debug-daemons -n 8 -host xserve01,xserve02 connectivity_c >& connectivity_c8_2host.txt
>
> This hangs, and after a minute or so the output shows that ranks 0-4
> on xserve01 cannot contact rank 5 (presumably on xserve02).
>
> It seems that I have something wrong in my TCP setup, but
> communication between these servers worked yesterday using 1.1.5,
> and ping, etc. all work fine, so something else is going on. Some
> sort of port permissions?
>
> The most glaring error I see in these logs is:
>
> [xserve02.local:43625] [[28627,0],2] orte:daemon:send_relay - recipient list is empty!
>
> I see a reference in the archives to a similar error where
> "contacts.txt" could not be found. I've had trouble with temporary
> directories on OS X 10.5.7, so maybe that is the issue?
>
> Thanks,
> Jody
>
> <hello_c8_1host.txt>
> <hello_c8_2host.txt>
> <connectivity_c8_1host.txt>
> <connectivity_c8_2host.txt>
>
>
> --
> Jody Klymak
> http://web.uvic.ca/~jklymak/
>
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users