Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
From: Philippe (philmpi_at_[hidden])
Date: 2010-07-21 09:44:39


Sorry for the late reply -- I was away on vacation.

Regarding your earlier question about how many processes were
involved when the memory was entirely allocated: it was only two, a
sender and a receiver. I'm still trying to pinpoint what can be
different between the standalone case and the "integrated" case. I
will try to find out what part of the code is allocating memory in a
loop.

On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series.

Great -- I'll give it a try.

> On the notion of integrating OMPI to your launch environment: remember that we don't necessarily require that you use mpiexec for that purpose. If your launch environment provides just a little info in the environment of the launched procs, we can usually devise a method that allows the procs to perform an MPI_Init as a single job without all this work you are doing.

I'm working on creating operators using MPI for the IBM product
"InfoSphere Streams". It has its own launching mechanism to start the
processes. However, I can pass some information to the processes that
belong to the same job (a Streams job, which should map neatly to an
MPI job).

> Only difference is that your procs will all block in MPI_Init until they -all- have executed that function. If that isn't a problem, this would be a much more scalable and reliable method than doing it thru massive calls to MPI_Port_connect.

In the general case that would be a problem, but for my prototype
this is acceptable.

In general, each process is composed of operators; some may be MPI
related and some may not. But in my case, I know ahead of time which
processes will be part of the MPI job, so I can easily deal with the
fact that they would block in MPI_Init (actually MPI_Init_thread,
since the code uses a lot of threads).
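
For what it's worth, here is roughly what our processes would do at
startup (just a sketch, and the thread level is my assumption -- several
threads make MPI calls, so I'd request MPI_THREAD_MULTIPLE):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Blocks until every process of the job has reached this call.
       MPI_THREAD_MULTIPLE because several operator threads use MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "thread level %d is not enough\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... operator work: port/connect/accept, sends and receives ... */

    MPI_Finalize();
    return 0;
}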

Is there documentation or an example I can use to see what information
I can pass to the processes to enable that? Is it just environment
variables?

Many thanks!

> On Jul 18, 2010, at 4:09 PM, Philippe wrote:
>> Ralph,
>> thanks for investigating.
>> I've applied the two patches you mentioned earlier and ran with the
>> ompi server. Although I was able to run our standalone test, when I
>> integrated the changes to our code, the processes entered a crazy loop
>> and allocated all the memory available when calling MPI_Port_Connect.
>> I was not able to identify why it works standalone but not integrated
>> with our code. If I find out why, I'll let you know.
>> Looking forward to your findings. We'll be happy to test any patches
>> if you have some!
>> p.
>> On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> Okay, I can reproduce this problem. Frankly, I don't think this ever worked with OMPI, and I'm not sure how the choice of BTL makes a difference.
>>> The program is crashing in the communicator definition, which involves a communication over our internal out-of-band messaging system. That system has zero connection to any BTL, so it should crash either way.
>>> Regardless, I will play with this a little as time allows. Thanks for the reproducer!
>>> On Jun 25, 2010, at 7:23 AM, Philippe wrote:
>>>> Hi,
>>>> I'm trying to run a test program which consists of a server creating a
>>>> port using MPI_Open_port and N clients using MPI_Comm_connect to
>>>> connect to the server.
>>>> I'm able to do so with 1 server and 2 clients, but with 1 server + 3
>>>> clients, I get the following error message:
>>>>   [node003:32274] [[37084,0],0]:route_callback tried routing message
>>>> from [[37084,1],0] to [[40912,1],0]:102, can't find route
>>>> This is only happening with the openib BTL. With tcp BTL it works
>>>> perfectly fine (ofud also works as a matter of fact...). This has been
>>>> tested on two completely different clusters, with identical results.
>>>> In both cases, the IB fabric works normally.
>>>> Any help would be greatly appreciated! Several people in my team
>>>> looked at the problem. Google and the mailing list archive did not
>>>> provide any clue. I believe that from an MPI standpoint, my test
>>>> program is valid (and it works with TCP, which makes me feel better
>>>> about the sequence of MPI calls).
>>>> Regards,
>>>> Philippe.
>>>> Background:
>>>> I intend to use Open MPI to transport data inside a much larger
>>>> application. Because of that, I cannot use mpiexec. Each process is
>>>> started by our own "job management" and uses a name server to find
>>>> out about the others. Once all the clients are connected, I would like
>>>> the server to do MPI_Recv to get the data from all the clients. I don't
>>>> care about the order or which clients are sending data, as long as I
>>>> can receive it with one call. To do that, the clients and the server
>>>> go through a series of Comm_accept/Comm_connect/Intercomm_merge calls
>>>> so that at the end, all the clients and the server are inside the same
>>>> intracomm.
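>>>>
>>>> For clarity, receiving "with one call" would look roughly like the
>>>> sketch below (the datatype, count and helper name are placeholders,
>>>> not what ben12.c actually does):
>>>>
>>>> #include <mpi.h>
>>>>
>>>> /* "intra" is the intracomm obtained after the last merge; the server
>>>>    does not care which client sends first, so it matches any source. */
>>>> static void recv_from_any_client(MPI_Comm intra, double *buf, int count)
>>>> {
>>>>     MPI_Status status;
>>>>     MPI_Recv(buf, count, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
>>>>              intra, &status);
>>>>     /* status.MPI_SOURCE is the rank of the client that sent the data */
>>>> }
>>>>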
>>>> Steps:
>>>> I have a sample program that shows the issue. I tried to make it as
>>>> short as possible. It needs to be executed on a shared file system
>>>> like NFS because the server writes the port info to a file that the
>>>> clients will read. To reproduce the issue, the following steps should
>>>> be performed:
>>>> 0. compile the test with "mpicc -o ben12 ben12.c"
>>>> 1. ssh to the machine that will be the server
>>>> 2. run ./ben12 3 1
>>>> 3. ssh to the machine that will be the client #1
>>>> 4. run ./ben12 3 0
>>>> 5. repeat steps 3-4 for clients #2 and #3
>>>> The server accepts the connection from client #1 and merges it into a
>>>> new intracomm. It then accepts the connection from client #2 and merges
>>>> it. When client #3 arrives, the server accepts the connection, but that
>>>> causes clients #1 and #2 to die with the error above (see the complete
>>>> trace in the tarball).
>>>> The exact steps are:
>>>>     - server open port
>>>>     - server does accept
>>>>     - client #1 does connect
>>>>     - server and client #1 do merge
>>>>     - server does accept
>>>>     - client #2 does connect
>>>>     - server, client #1 and client #2 do merge
>>>>     - server does accept
>>>>     - client #3 does connect
>>>>     - server, client #1, client #2 and client #3 do merge
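>>>>
>>>> In code, that sequence looks roughly like the sketch below (error
>>>> checking, MPI_Init/MPI_Finalize and the exchange of the port string
>>>> over NFS are omitted; "is_server" and "nclients" stand for what I
>>>> presume ben12 reads from its two command-line arguments):
>>>>
>>>> #include <mpi.h>
>>>>
>>>> /* Build one intracomm containing the server and all the clients. */
>>>> static MPI_Comm join_all(int is_server, int nclients)
>>>> {
>>>>     char port[MPI_MAX_PORT_NAME];
>>>>     MPI_Comm intra, inter, merged;
>>>>     int size;
>>>>
>>>>     if (is_server) {
>>>>         MPI_Open_port(MPI_INFO_NULL, port);
>>>>         /* ... write "port" to the shared file ... */
>>>>         intra = MPI_COMM_SELF;
>>>>     } else {
>>>>         /* ... read "port" from the shared file ... */
>>>>         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
>>>>         MPI_Intercomm_merge(inter, 1, &intra);  /* new client merges "high" */
>>>>         MPI_Comm_free(&inter);
>>>>     }
>>>>
>>>>     /* MPI_Comm_accept is collective over the current intracomm, so every
>>>>        process that already joined takes part in each later accept/merge */
>>>>     MPI_Comm_size(intra, &size);
>>>>     while (size < nclients + 1) {
>>>>         MPI_Comm_accept(port, MPI_INFO_NULL, 0, intra, &inter);
>>>>         MPI_Intercomm_merge(inter, 0, &merged); /* existing group is "low" */
>>>>         MPI_Comm_free(&inter);
>>>>         if (intra != MPI_COMM_SELF)
>>>>             MPI_Comm_free(&intra);
>>>>         intra = merged;
>>>>         MPI_Comm_size(intra, &size);
>>>>     }
>>>>     return intra;  /* the server and all three clients now share this */
>>>> }
>>>>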
>>>> My InfiniBand network works normally with other test programs or
>>>> applications (MPI or others like Verbs).
>>>> Info about my setup:
>>>>    openMPI version = 1.4.1 (I also tried 1.4.2, nightly snapshot of
>>>> 1.4.3, nightly snapshot of 1.5 --- all show the same error)
>>>>    config.log in the tarball
>>>>    "ompi_info --all" in the tarball
>>>>    OFED version = 1.3 installed from RHEL 5.3
>>>>    Distro = Red Hat Enterprise Linux 5.3
>>>>    Kernel = 2.6.18-128.4.1.el5 x86_64
>>>>    subnet manager = built-in SM from the Cisco/Topspin switch
>>>>    output of ibv_devinfo included in the tarball (there are no "bad" nodes)
>>>>    "ulimit -l" says "unlimited"
>>>> The tarball contains:
>>>>   - ben12.c: my test program showing the behavior
>>>>   - config.log / config.out / make.out / make-install.out /
>>>> ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
>>>>   - trace-tcp.txt: output of the server and each client when it works
>>>> with TCP (I added "btl = tcp,self" in ~/.openmpi/mca-params.conf)
>>>>   - trace-ib.txt: output of the server and each client when it fails
>>>> with IB (I added "btl = openib,self" in ~/.openmpi/mca-params.conf)
>>>> I hope I provided enough info for somebody to reproduce the problem...
>>>> <ompi-output.tar.bz2>