Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] torque pbs behaviour...
From: Klymak Jody (jklymak_at_[hidden])
Date: 2009-08-11 09:43:02


On 11-Aug-09, at 6:28 AM, Ralph Castain wrote:

> The reason your job is hanging is sitting in the orte-ps output. You
> have multiple processes declaring themselves to be the same MPI
> rank. That definitely won't work.

It's the "local rank", if that makes any difference...

Any thoughts on this output?

[xserve03.local][[61029,1],4][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[61029,1],3]
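
To spell out which two process names that message involves, here is a minimal sketch that pulls them out of the line (with the mail wrapping removed). It assumes the bracketed names read as [[job family, local job id], rank], which is my reading of the format and not something confirmed in this thread. The helper name parse_mismatch is made up for illustration and is not part of Open MPI.

```python
import re

# One process name of the shape [[61029,1],4]. Reading the three numbers as
# (job family, local job id, rank) is an assumption, not confirmed here.
NAME = r"\[\[(\d+),(\d+)\],(\d+)\]"

# [host][logging process][file:line:function] ... identifier [received name]
LINE = re.compile(
    r"\[(?P<host>[^\]]+)\]" + NAME
    + r"\[.*?\] received unexpected process identifier " + NAME
)


def parse_mismatch(line):
    """Return (host, logging_name, received_name), or None if no match."""
    # Collapse whitespace so a line wrapped by a mail client still matches.
    m = LINE.search(" ".join(line.split()))
    if m is None:
        return None
    logging_name = tuple(int(x) for x in m.group(2, 3, 4))
    received_name = tuple(int(x) for x in m.group(5, 6, 7))
    return m.group("host"), logging_name, received_name


line = (
    "[xserve03.local][[61029,1],4][btl_tcp_endpoint.c:486:"
    "mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process "
    "identifier [[61029,1],3]"
)
print(parse_mismatch(line))
```

Under that reading, the process logging the line on xserve03.local is number 4 of job 61029,1, and the identifier it received in the connect ack belongs to number 3 of the same job.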

> The question is why is that happening? We use Torque all the time,
> so we know that the basic support is correct. It -could- be related
> to lib confusion, but I can't tell for sure.

Just to be clear, this is not going through torque at this point. It's
just vanilla ssh, for which this code worked with 1.1.5.

> Can you rebuild OMPI with --enable-debug, and rerun the job with the
> following added to your cmd line?
>
> -mca plm_base_verbose 5 --debug-daemons -mca odls_base_verbose 5

Working on this...

Thanks, Jody