Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-06-13 13:54:28

On Jun 13, 2007, at 1:37 PM, Gleb Natapov wrote:

>> I have 2 hosts: one with 3 active ports and one with 2 active ports.
>> If I run an MPI job between them, the openib BTL wireup goes badly and
>> it aborts. So a heterogeneous number of ports is not currently
>> handled properly in the code.
> Are they all on the same subnet? If not, I fixed a bug yesterday that
> may help.

No, they are not all on the same subnet:

host svbu-mpi002:
port 1: DDR, subnet A
ports 2 and 3: SDR, subnet B

host svbu-mpi003:
port 1: DDR, subnet A
port 2: SDR, subnet B
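
To make the asymmetry concrete, here is a minimal Python sketch that pairs up ports by subnet for the two hosts listed above. It is only an illustration of the layout, not the openib BTL's actual wireup logic; the helper names (`ports_by_subnet`, `reachable_pairs`) and the data structure are assumptions made up for this example.

```python
from collections import defaultdict

# Port layout copied from the two host listings above: (port number, subnet).
HOSTS = {
    "svbu-mpi002": [(1, "A"), (2, "B"), (3, "B")],
    "svbu-mpi003": [(1, "A"), (2, "B")],
}


def ports_by_subnet(ports):
    """Group port numbers by subnet label."""
    grouped = defaultdict(list)
    for port, subnet in ports:
        grouped[subnet].append(port)
    return dict(grouped)


def reachable_pairs(local, remote):
    """Hypothetical helper: all (subnet, local port, remote port) pairs
    where both ports sit on the same subnet."""
    local_map = ports_by_subnet(local)
    remote_map = ports_by_subnet(remote)
    pairs = []
    for subnet in sorted(local_map.keys() & remote_map.keys()):
        for lp in local_map[subnet]:
            for rp in remote_map[subnet]:
                pairs.append((subnet, lp, rp))
    return pairs


if __name__ == "__main__":
    pairs = reachable_pairs(HOSTS["svbu-mpi002"], HOSTS["svbu-mpi003"])
    for subnet, lp, rp in pairs:
        print(f"subnet {subnet}: mpi002 port {lp} <-> mpi003 port {rp}")
```

On subnet B, two ports on svbu-mpi002 pair with a single port on svbu-mpi003, so the port counts do not line up one-to-one between the hosts.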

With today's trunk, I still see the problem:

[10:52] svbu-mpi:~/mpi % mpirun --mca btl openib,self -np 2 --host svbu-mpi002,svbu-mpi003 ring
Process 1 waiting to receive from 0: tag 201
Process 0 sending 10 to 1, tag 201
794:mca_btl_openib_endpoint_recv] can't find suitable endpoint for this peer

I'll try to look into this today or tomorrow...

Jeff Squyres
Cisco Systems