
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-07-20 00:51:15


Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series.

On the notion of integrating OMPI into your launch environment: remember that we don't necessarily require that you use mpiexec for that purpose. If your launch environment provides just a little info in the environment of the launched procs, we can usually devise a method that allows the procs to perform an MPI_Init as a single job without all this work you are doing.

The only difference is that your procs will all block in MPI_Init until they -all- have executed that function. If that isn't a problem, this would be a much more scalable and reliable method than doing it through massive numbers of calls to MPI_Comm_connect.

On Jul 18, 2010, at 4:09 PM, Philippe wrote:

> Ralph,
>
> thanks for investigating.
>
> I've applied the two patches you mentioned earlier and ran with the
> ompi-server. Although I was able to run our standalone test, when I
> integrated the changes into our code, the processes went into a runaway loop
> and allocated all the available memory when calling MPI_Comm_connect.
> I was not able to identify why it works standalone but not integrated
> with our code. If I find out why, I'll let you know.
>
> Looking forward to your findings. We'll be happy to test any patches
> if you have some!
>
> p.
>
> On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>> Okay, I can reproduce this problem. Frankly, I don't think this ever worked with OMPI, and I'm not sure how the choice of BTL makes a difference.
>>
>> The program is crashing in the communicator definition, which involves a communication over our internal out-of-band messaging system. That system has zero connection to any BTL, so it should crash either way.
>>
>> Regardless, I will play with this a little as time allows. Thanks for the reproducer!
>>
>>
>> On Jun 25, 2010, at 7:23 AM, Philippe wrote:
>>
>>> Hi,
>>>
>>> I'm trying to run a test program which consists of a server creating a
>>> port using MPI_Open_port and N clients using MPI_Comm_connect to
>>> connect to the server.
>>>
>>> I'm able to do so with 1 server and 2 clients, but with 1 server + 3
>>> clients, I get the following error message:
>>>
>>> [node003:32274] [[37084,0],0]:route_callback tried routing message
>>> from [[37084,1],0] to [[40912,1],0]:102, can't find route
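The bracketed identifiers in this error are Open MPI runtime (ORTE) process names, printed as [[job family,local job id],vpid]; the prefix [[37084,0],0] is presumably the runtime daemon (local job 0) that printed the message. A minimal C sketch that parses names in that format makes the mismatch visible: the sender is in job family 37084 and the target in 40912, which fits each process having been started on its own rather than under a single mpiexec. The type and function names below are hypothetical and not part of Open MPI's API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for an ORTE process name as it appears in log lines:
 * [[job family,local job id],vpid]. Not an Open MPI type. */
typedef struct {
    unsigned long family;    /* job family: procs launched together share it */
    unsigned long local_job; /* job within the family (0 is normally the daemons) */
    unsigned long vpid;      /* process index within that job */
} log_proc_name;

/* Parse a name such as "[[37084,1],0]". Trailing text (e.g. ":102") is
 * ignored. Returns 1 on success, 0 if the text does not match the format. */
static int parse_proc_name(const char *s, log_proc_name *out)
{
    char close = '\0';

    assert(s != NULL && out != NULL);
    if (sscanf(s, "[[%lu,%lu],%lu%c",
               &out->family, &out->local_job, &out->vpid, &close) != 4)
        return 0;
    return close == ']';
}

/* Compare job families; the failing message in the log crosses from one
 * family to another. */
static int same_job_family(const log_proc_name *a, const log_proc_name *b)
{
    return a->family == b->family;
}
```

Feeding it the sender and target from the log line above reports two different job families.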
>>>
>>> This is only happening with the openib BTL. With tcp BTL it works
>>> perfectly fine (ofud also works as a matter of fact...). This has been
>>> tested on two completely different clusters, with identical results.
>>> In both cases, the IB fabric works normally.
>>>
>>> Any help would be greatly appreciated! Several people in my team
>>> looked at the problem. Google and the mailing list archive did not
>>> provide any clue. I believe that from an MPI standpoint, my test
>>> program is valid (and it works with TCP, which makes me more confident
>>> about the sequence of MPI calls).
>>>
>>> Regards,
>>> Philippe.
>>>
>>>
>>>
>>> Background:
>>>
>>> I intend to use Open MPI to transport data inside a much larger
>>> application. Because of that, I cannot use mpiexec. Each process is
>>> started by our own "job management" system and uses a name server to
>>> find the others. Once all the clients are connected, I would like
>>> the server to call MPI_Recv to get the data from all the clients. I don't
>>> care about the order or which client is sending data, as long as I
>>> can receive it with one call. To do that, the clients and the server
>>> go through a series of MPI_Comm_accept/MPI_Comm_connect/MPI_Intercomm_merge
>>> calls so that at the end, all the clients and the server are inside the
>>> same intracomm.
>>>
>>> Steps:
>>>
>>> I have a sample program that shows the issue. I tried to make it as
>>> short as possible. It needs to be executed on a shared file system
>>> like NFS because the server writes the port info to a file that the
>>> clients read. To reproduce the issue, perform the following
>>> steps:
>>>
>>> 0. compile the test with "mpicc -o ben12 ben12.c"
>>> 1. ssh to the machine that will be the server
>>> 2. run ./ben12 3 1
>>> 3. ssh to the machine that will be the client #1
>>> 4. run ./ben12 3 0
>>> 5. repeat steps 3-4 for clients #2 and #3
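The port-file hand-off these steps rely on can be sketched in plain C, assuming a POSIX file system where rename() replaces the target atomically, so a client never reads a half-written name. In a program like ben12.c the string would come from MPI_Open_port on the server and be handed to MPI_Comm_connect by each client (read into a buffer of MPI_MAX_PORT_NAME characters); plain strings and the hypothetical helpers publish_port_name and read_port_name keep the sketch free of MPI. On NFS, client-side attribute caching can delay when the file becomes visible, so clients still need to poll.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Server side: write the port name to "<path>.tmp", then rename() it into
 * place so a client polling for <path> never reads a half-written name. */
static int publish_port_name(const char *path, const char *port_name)
{
    char tmp[4096];
    FILE *f;
    int n;

    assert(path != NULL && port_name != NULL);
    n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%s\n", port_name) < 0) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Client side: read the port name back without its trailing newline.
 * Returns 0 on success, -1 if the file does not exist yet or is empty;
 * a client would retry (with a short sleep) until it gets 0. */
static int read_port_name(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL && len > 1);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```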
>>>
>>> The server accepts the connection from client #1 and merges it into a
>>> new intracomm. It then accepts the connection from client #2 and merges
>>> it. When client #3 arrives, the server accepts the connection, but that
>>> causes clients #1 and #2 to die with the error above (see the complete
>>> trace in the tarball).
>>>
>>> The exact steps are:
>>>
>>> - server open port
>>> - server does accept
>>> - client #1 does connect
>>> - server and client #1 do merge
>>> - server does accept
>>> - client #2 does connect
>>> - server, client #1 and client #2 do merge
>>> - server does accept
>>> - client #3 does connect
>>> - server, client #1, client #2 and client #3 do merge
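The sequence above can be modelled without MPI to show what each merge does to ranks. Per the MPI standard, MPI_Intercomm_merge orders the group that passes high = false before the group that passes high = true (the order is unspecified when both pass the same value). The sketch assumes the existing intracomm passes high = 0 and each newcomer passes high = 1; that convention and the helper names merged_rank and simulate_merges are illustrative, not taken from ben12.c. One detail the list leaves implicit: MPI_Comm_accept is collective over the communicator it is given, so when the server accepts on the merged intracomm, client #1 (and later client #2) must make the same accept call too.

```c
#include <assert.h>

/* Rank a process gets from MPI_Intercomm_merge when the members of the
 * existing intracomm pass high = 0 and the newly connected client passes
 * high = 1: the "low" group keeps ranks 0..old_size-1 in their old order and
 * the newcomer is placed after it. */
static int merged_rank(int old_size, int old_rank, int is_newcomer)
{
    assert(old_size >= 1);
    if (is_newcomer)
        return old_size;
    assert(old_rank >= 0 && old_rank < old_size);
    return old_rank;
}

/* Replay the sequence: the server starts alone (rank 0) and clients
 * 1..nclients are accepted and merged one at a time. ranks[i] receives the
 * final rank of client i (ranks[0] is the server); ranks must hold
 * nclients + 1 entries. Returns the final size of the intracomm. */
static int simulate_merges(int nclients, int ranks[])
{
    int size = 1;

    ranks[0] = 0;
    for (int k = 1; k <= nclients; k++) {
        /* Existing members: MPI_Comm_accept on the current intracomm, then
         * MPI_Intercomm_merge(high = 0). Client k: MPI_Comm_connect, then
         * MPI_Intercomm_merge(high = 1). */
        for (int i = 0; i < k; i++)
            ranks[i] = merged_rank(size, ranks[i], 0);
        ranks[k] = merged_rank(size, -1, 1);
        size++;
    }
    return size;
}
```

With three clients the result is a four-process intracomm with the server at rank 0, on which the server can post MPI_Recv with MPI_ANY_SOURCE to take data from whichever client sends first, as described in the Background section.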
>>>
>>>
>>> My infiniband network works normally with other test programs or
>>> applications (MPI or others like Verbs).
>>>
>>> Info about my setup:
>>>
>>> Open MPI version = 1.4.1 (I also tried 1.4.2 and nightly snapshots of
>>> 1.4.3 and 1.5; all show the same error)
>>> config.log in the tarball
>>> "ompi_info --all" in the tarball
>>> OFED version = 1.3 installed from RHEL 5.3
>>> Distro = Red Hat Enterprise Linux 5.3
>>> Kernel = 2.6.18-128.4.1.el5 x86_64
>>> subnet manager = built-in SM from the cisco/topspin switch
>>> output of ibv_devinfo included in the tarball (there are no "bad" nodes)
>>> "ulimit -l" says "unlimited"
>>>
>>> The tarball contains:
>>>
>>> - ben12.c: my test program showing the behavior
>>> - config.log / config.out / make.out / make-install.out /
>>> ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
>>> - trace-tcp.txt: output of the server and each client when it works
>>> with TCP (I added "btl = tcp,self" in ~/.openmpi/mca-params.conf)
>>> - trace-ib.txt: output of the server and each client when it fails
>>> with IB (I added "btl = openib,self" in ~/.openmpi/mca-params.conf)
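Besides ~/.openmpi/mca-params.conf, Open MPI reads MCA parameters from environment variables named OMPI_MCA_<param>, and these take precedence over the file, so a job manager that starts each process itself could pick the BTL per process instead of editing a shared file. A minimal POSIX sketch, assuming the variable is set before the process calls MPI_Init (set_mca_param is a hypothetical helper, not an Open MPI function):

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Export OMPI_MCA_<param>=<value> so Open MPI can pick it up when the
 * process later calls MPI_Init. Returns 0 on success, -1 on error. */
static int set_mca_param(const char *param, const char *value)
{
    char name[256];
    int n;

    assert(param != NULL && value != NULL);
    n = snprintf(name, sizeof name, "OMPI_MCA_%s", param);
    if (n < 0 || (size_t)n >= sizeof name)
        return -1;
    return setenv(name, value, 1) == 0 ? 0 : -1;
}
```

For example, set_mca_param("btl", "tcp,self") selects the same BTLs as the TCP run above; a job manager could equally place the variable in each child's environment before exec, keeping the choice out of the application code.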
>>>
>>> I hope I provided enough info for somebody to reproduce the problem...
>>> <ompi-output.tar.bz2>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>