
Open MPI Development Mailing List Archives


From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-06-14 06:32:48


On Wed, Jun 13, 2007 at 01:54:28PM -0400, Jeff Squyres wrote:
> On Jun 13, 2007, at 1:37 PM, Gleb Natapov wrote:
>
> >> I have 2 hosts: one with 3 active ports and one with 2 active ports.
> >> If I run an MPI job between them, the openib BTL wireup goes badly and
> >> it aborts. So a heterogeneous number of ports is not currently
> >> handled properly in the code.
> > Are they all in the same subnet? If not, I fixed a bug yesterday that
> > may help.
>
> No, they are not all on the same subnet:
>
> host svbu-mpi002:
> port 1: DDR, subnet A
> ports 2 and 3: SDR, subnet B
>
> host svbu-mpi003:
> port 1: DDR, subnet A
> port 2: SDR, subnet B
>
> With today's trunk, I still see the problem:
>
> [10:52] svbu-mpi:~/mpi % mpirun --mca btl openib,self -np 2 --host
> svbu-mpi002,svbu-mpi003 ring
> Process 1 waiting to receive from 0: tag 201
> Process 0 sending 10 to 1, tag 201
> [svbu-mpi002][0,1,0][btl_openib_endpoint.c:
> 794:mca_btl_openib_endpoint_recv] can't find suitable endpoint for
> this peer
>
Now I see that my fix was in the right place, but still slightly wrong.
I committed a fix to my fix in r15073. Can you check it?

--
			Gleb.