Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-07-12 12:22:21


Sorry for the delayed response - Brad asked if I could comment on this.

I'm afraid your application, as written, isn't going to work because the rendezvous protocol isn't correct. You cannot just write a port to a file and have the other side of a connect/accept read it. OMPI needs to route its out-of-band (OOB) communications, and it needs a handshake to get that routing set up. If we didn't route those communications, we would consume far too many ports on the nodes of large machines and could not run large jobs.

If you want to do this, you need three things:

1. you have to run our "ompi-server" program on a node where all MPI processes can reach it. This program serves as the central rendezvous point. See "man ompi-server" for info.

2. you'll need a patch I provided to some other users that allows singletons to connect to ompi-server without first spawning their own daemon. Otherwise, you get an OMPI daemon ("orted") started for every one of your clients.

3. you'll need the patch I'm just completing that allows you to have more than 64 singletons connecting together; without it, you'll just segfault. Each of your clients looks like a singleton to us because it wasn't started with mpiexec.

I suspect your test works because (a) TCP interconnects differently than IB and doesn't talk via OOB to do it, and thus you made it further (but would still fail at some point when OOB was required), and (b) you were running fewer than 64 clients.
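
For what it's worth, the route error in the original report can be picked apart mechanically. The sketch below is only an illustration, not anything from the OMPI tree: `proc_name`, `parse_proc_name`, `parse_route_error` and `same_job_family` are made-up names, and it assumes that the process names in the log print as `[[job family,local job],vpid]` and that the number after the colon is a message tag. Under those assumptions, the sender `[[37084,1],0]` and the target `[[40912,1],0]` sit in different job families, which would be consistent with two independently started clients.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: one process name as it appears in the log line,
 * e.g. "[[37084,1],0]". The field names are labels chosen for this
 * sketch and rest on the assumed "[[job family,local job],vpid]" layout. */
typedef struct {
    unsigned job_family; /* first number in the name  */
    unsigned local_job;  /* second number in the name */
    unsigned vpid;       /* third number in the name  */
} proc_name;

/* Parse a name of the form "[[a,b],c]" at the start of s.
 * Returns 1 on success, 0 if the text does not match. */
int parse_proc_name(const char *s, proc_name *out)
{
    return sscanf(s, "[[%u,%u],%u]",
                  &out->job_family, &out->local_job, &out->vpid) == 3;
}

/* Extract sender, target and the trailing number (assumed to be a tag)
 * from a "route_callback tried routing message from X to Y:N" line.
 * Returns 1 on success, 0 if the line does not have that shape. */
int parse_route_error(const char *line, proc_name *from,
                      proc_name *to, int *tag)
{
    assert(line && from && to && tag);

    const char *f = strstr(line, " from ");
    if (!f) return 0;
    const char *t = strstr(f, " to ");
    if (!t) return 0;

    if (!parse_proc_name(f + 6, from)) return 0;
    if (sscanf(t + 4, "[[%u,%u],%u]:%d",
               &to->job_family, &to->local_job, &to->vpid, tag) != 4)
        return 0;
    return 1;
}

/* 1 if both names share the first number of the name, else 0. */
int same_job_family(const proc_name *a, const proc_name *b)
{
    return a->job_family == b->job_family;
}
```

Feeding it the line from the report (joined back into a single line) gives a sender in family 37084 and a target in family 40912, so `same_job_family` returns 0 for that pair.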

HTH
Ralph

On Jun 25, 2010, at 7:23 AM, Philippe wrote:

> Hi,
>
> I'm trying to run a test program which consists of a server creating a
> port using MPI_Open_port and N clients using MPI_Comm_connect to
> connect to the server.
>
> I'm able to do so with 1 server and 2 clients, but with 1 server + 3
> clients, I get the following error message:
>
> [node003:32274] [[37084,0],0]:route_callback tried routing message
> from [[37084,1],0] to [[40912,1],0]:102, can't find route
>
> This is only happening with the openib BTL. With tcp BTL it works
> perfectly fine (ofud also works as a matter of fact...). This has been
> tested on two completely different clusters, with identical results.
> In either case, the IB fabric works normally.
>
> Any help would be greatly appreciated! Several people in my team
> looked at the problem. Google and the mailing list archive did not
> provide any clue. I believe that from an MPI standpoint, my test
> program is valid (and it works with TCP, which makes me feel better
> about the sequence of MPI calls).
>
> Regards,
> Philippe.
>
>
>
> Background:
>
> I intend to use openMPI to transport data inside a much larger
> application. Because of that, I cannot use mpiexec. Each process is
> started by our own "job management" and uses a name server to find
> out about the others. Once all the clients are connected, I would like
> the server to do MPI_Recv to get the data from all the clients. I don't
> care about the order or which clients are sending data, as long as I
> can receive it with one call. To do that, the clients and the server
> go through a series of Comm_accept/Comm_connect/Intercomm_merge
> so that at the end, all the clients and the server are inside the same
> intracomm.
>
> Steps:
>
> I have a sample program that shows the issue. I tried to make it as
> short as possible. It needs to be executed on a shared file system
> like NFS because the server writes the port info to a file that the
> clients will read. To reproduce the issue, the following steps should
> be performed:
>
> 0. compile the test with "mpicc -o ben12 ben12.c"
> 1. ssh to the machine that will be the server
> 2. run ./ben12 3 1
> 3. ssh to the machine that will be the client #1
> 4. run ./ben12 3 0
> 5. repeat steps 3-4 for clients #2 and #3
>
> The server accepts the connection from client #1 and merges it into a new
> intracomm. It then accepts the connection from client #2 and merges it. When
> client #3 arrives, the server accepts the connection, but that
> causes clients #1 and #2 to die with the error above (see the complete
> trace in the tarball).
>
> The exact steps are:
>
> - server opens port
> - server does accept
> - client #1 does connect
> - server and client #1 do merge
> - server does accept
> - client #2 does connect
> - server, client #1 and client #2 do merge
> - server does accept
> - client #3 does connect
> - server, client #1, client #2 and client #3 do merge
>
>
> My infiniband network works normally with other test programs or
> applications (MPI or others like Verbs).
>
> Info about my setup:
>
> openMPI version = 1.4.1 (I also tried 1.4.2, nightly snapshot of
> 1.4.3, nightly snapshot of 1.5 --- all show the same error)
> config.log in the tarball
> "ompi_info --all" in the tarball
> OFED version = 1.3 installed from RHEL 5.3
> Distro = Red Hat Enterprise Linux 5.3
> Kernel = 2.6.18-128.4.1.el5 x86_64
> subnet manager = built-in SM from the cisco/topspin switch
> output of ibv_devinfo included in the tarball (there are no "bad" nodes)
> "ulimit -l" says "unlimited"
>
> The tarball contains:
>
> - ben12.c: my test program showing the behavior
> - config.log / config.out / make.out / make-install.out /
> ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
> - trace-tcp.txt: output of the server and each client when it works
> with TCP (I added "btl = tcp,self" in ~/.openmpi/mca-params.conf)
> - trace-ib.txt: output of the server and each client when it fails
> with IB (I added "btl = openib,self" in ~/.openmpi/mca-params.conf)
>
> I hope I provided enough info for somebody to reproduce the problem...
> <ompi-output.tar.bz2>