Open MPI User's Mailing List Archives

Subject: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
From: Philippe (philmpi_at_[hidden])
Date: 2010-06-25 09:23:41


Hi,

I'm trying to run a test program which consists of a server creating a
port using MPI_Open_port and N clients using MPI_Comm_connect to
connect to the server.

I'm able to do so with 1 server and 2 clients, but with 1 server + 3
clients, I get the following error message:

   [node003:32274] [[37084,0],0]:route_callback tried routing message from [[37084,1],0] to [[40912,1],0]:102, can't find route

This is only happening with the openib BTL. With tcp BTL it works
perfectly fine (ofud also works as a matter of fact...). This has been
tested on two completely different clusters, with identical results.
In either case, the IB fabric works normally.

Any help would be greatly appreciated! Several people in my team
looked at the problem. Google and the mailing list archive did not
provide any clue. I believe that from an MPI standpoint, my test
program is valid (and it works with TCP, which makes me feel better
about the sequence of MPI calls).

Regards,
Philippe.

Background:

I intend to use Open MPI to transport data inside a much larger
application. Because of that, I cannot use mpiexec. Each process is
started by our own "job management" and uses a name server to find
out about the others. Once all the clients are connected, I would like
the server to do MPI_Recv to get the data from all the clients. I don't
care about the order or which client is sending data, as long as I
can receive it with one call. To do that, the clients and the server
go through a series of MPI_Comm_accept/MPI_Comm_connect/
MPI_Intercomm_merge calls so that at the end, all the clients and the
server are inside the same intracomm.

Steps:

I have a sample program that shows the issue. I tried to make it as
short as possible. It needs to be executed on a shared file system
like NFS because the server writes the port info to a file that the
clients will read. To reproduce the issue, the following steps should
be performed:

 0. compile the test with "mpicc -o ben12 ben12.c"
 1. ssh to the machine that will be the server
 2. run ./ben12 3 1
 3. ssh to the machine that will be the client #1
 4. run ./ben12 3 0
 5. repeat steps 3-4 for clients #2 and #3

The server accepts the connection from client #1 and merges it into a
new intracomm. It then accepts the connection from client #2 and merges
it. When client #3 arrives, the server accepts the connection, but that
causes clients #1 and #2 to die with the error above (see the complete
trace in the tarball).
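
The port hand-off through the shared file system mentioned above could be
sketched roughly as follows. This is not the code from ben12.c; the function
names write_port_file and read_port_file, the ".tmp" suffix and the
one-line file layout are all made up for illustration. The server would call
the first function with the string returned by MPI_Open_port, and each client
would poll the second one until the file shows up.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (server side): write the port string to a temporary
 * file, then rename it, so that a client is less likely to read a
 * half-written file. Returns 0 on success, -1 on failure. */
int write_port_file(const char *path, const char *port)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || n >= (int)sizeof tmp)
        return -1;                      /* path too long for the buffer */

    FILE *f = fopen(tmp, "w");
    if (!f)
        return -1;

    int ok = (fputs(port, f) >= 0);
    ok = (fclose(f) == 0) && ok;        /* always close, even on error */

    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Hypothetical helper (client side): read the port string back.
 * Returns 0 on success, -1 if the file is missing or empty. */
int read_port_file(const char *path, char *port, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;                      /* server has not published yet */

    char *r = fgets(port, (int)len, f);
    fclose(f);
    if (!r)
        return -1;

    port[strcspn(port, "\n")] = '\0';   /* drop a trailing newline, if any */
    return 0;
}
```

In a real program the client buffer would be sized with MPI_MAX_PORT_NAME;
the sketch takes the length as a parameter so that it does not depend on
mpi.h.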

The exact steps are:

     - server opens port
     - server does accept
     - client #1 does connect
     - server and client #1 do merge
     - server does accept
     - client #2 does connect
     - server, client #1 and client #2 do merge
     - server does accept
     - client #3 does connect
     - server, client #1, client #2 and client #3 do merge

My InfiniBand network works normally with other test programs or
applications (MPI or others like Verbs).

Info about my setup:

    Open MPI version = 1.4.1 (I also tried 1.4.2, a nightly snapshot of
1.4.3, and a nightly snapshot of 1.5; all show the same error)
    config.log in the tarball
    "ompi_info --all" in the tarball
    OFED version = 1.3 installed from RHEL 5.3
    Distro = Red Hat Enterprise Linux 5.3
    Kernel = 2.6.18-128.4.1.el5 x86_64
    subnet manager = built-in SM from the Cisco/Topspin switch
    output of ibv_devinfo included in the tarball (there are no "bad" nodes)
    "ulimit -l" says "unlimited"

The tarball contains:

   - ben12.c: my test program showing the behavior
   - config.log / config.out / make.out / make-install.out /
ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
   - trace-tcp.txt: output of the server and each client when it works
with TCP (I added "btl = tcp,self" in ~/.openmpi/mca-params.conf)
   - trace-ib.txt: output of the server and each client when it fails
with IB (I added "btl = openib,self" in ~/.openmpi/mca-params.conf)
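
For reference, the BTL selection used for the two traces is a one-line
setting in the per-user MCA parameter file. The fragment below only restates
the two values quoted above; the comment lines are added for illustration.

```
# ~/.openmpi/mca-params.conf
# Failing case (InfiniBand):
btl = openib,self
# Working case (TCP): replace the line above with
# btl = tcp,self
```

Since the processes are not started through mpiexec, the same parameter
could presumably also be set through the environment of each process, for
example OMPI_MCA_btl=openib,self. That variant was not part of the runs
described here.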

I hope I provided enough info for somebody to reproduce the problem...