Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
From: Philippe (philmpi_at_[hidden])
Date: 2010-07-21 09:44:39


Ralph,

Sorry for the late reply -- I was away on vacation.

Regarding your earlier question about how many processes were
involved when the memory was entirely allocated: it was only two, a
sender and a receiver. I'm still trying to pinpoint what differs
between the standalone case and the "integrated" case. I
will try to find out which part of the code is allocating memory in a
loop.

On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series.
>

Great -- I'll give it a try.

> On the notion of integrating OMPI to your launch environment: remember that we don't necessarily require that you use mpiexec for that purpose. If your launch environment provides just a little info in the environment of the launched procs, we can usually devise a method that allows the procs to perform an MPI_Init as a single job without all this work you are doing.
>

I'm working on creating operators using MPI for the IBM product
"InfoSphere Streams". It has its own launching mechanism to start the
processes. However, I can pass some information to the processes that
belong to the same job (Streams job -- which should neatly map to MPI
job).

> Only difference is that your procs will all block in MPI_Init until they -all- have executed that function. If that isn't a problem, this would be a much more scalable and reliable method than doing it thru massive calls to MPI_Comm_connect.
>

In the general case, that would be a problem, but for my prototype
it is acceptable.

In general, each process is composed of operators, some of which are
MPI-related and some not. But in my case, I know ahead of time which
processes will be part of the MPI job, so I can easily deal with the
fact that they would block in MPI_Init (actually MPI_Init_thread,
since it's using a lot of threads).

Is there any documentation or an example I can use to see what
information I can pass to the processes to enable that? Is it just
environment variables?

Many thanks!
p.

>
> On Jul 18, 2010, at 4:09 PM, Philippe wrote:
>
>> Ralph,
>>
>> Thanks for investigating.
>>
>> I've applied the two patches you mentioned earlier and ran with the
>> ompi-server. Although I was able to run our standalone test, when I
>> integrated the changes into our code, the processes entered a crazy loop
>> and allocated all the available memory when calling MPI_Comm_connect.
>> I was not able to identify why it works standalone but not integrated
>> with our code. If I find out why, I'll let you know.
>>
>> Looking forward to your findings. We'll be happy to test any patches
>> if you have some!
>>
>> p.
>>
>> On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> Okay, I can reproduce this problem. Frankly, I don't think this ever worked with OMPI, and I'm not sure how the choice of BTL makes a difference.
>>>
>>> The program is crashing in the communicator definition, which involves a communication over our internal out-of-band messaging system. That system has zero connection to any BTL, so it should crash either way.
>>>
>>> Regardless, I will play with this a little as time allows. Thanks for the reproducer!
>>>
>>>
>>> On Jun 25, 2010, at 7:23 AM, Philippe wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to run a test program which consists of a server creating a
>>>> port using MPI_Open_port and N clients using MPI_Comm_connect to
>>>> connect to the server.
>>>>
>>>> I'm able to do so with 1 server and 2 clients, but with 1 server + 3
>>>> clients, I get the following error message:
>>>>
>>>>   [node003:32274] [[37084,0],0]:route_callback tried routing message
>>>> from [[37084,1],0] to [[40912,1],0]:102, can't find route
>>>>
>>>> This is only happening with the openib BTL. With the tcp BTL it works
>>>> perfectly fine (ofud also works, as a matter of fact...). This has been
>>>> tested on two completely different clusters, with identical results.
>>>> In either case, the IB fabric works normally.
>>>>
>>>> Any help would be greatly appreciated! Several people in my team
>>>> looked at the problem. Google and the mailing list archive did not
>>>> provide any clue. I believe that from an MPI standpoint, my test
>>>> program is valid (and it works with TCP, which makes me feel better
>>>> about the sequence of MPI calls).
>>>>
>>>> Regards,
>>>> Philippe.
>>>>
>>>>
>>>>
>>>> Background:
>>>>
>>>> I intend to use Open MPI to transport data inside a much larger
>>>> application. Because of that, I cannot use mpiexec. Each process is
>>>> started by our own "job management" and uses a name server to find
>>>> out about the others. Once all the clients are connected, I would like
>>>> the server to do MPI_Recv to get the data from all the clients. I don't
>>>> care about the order or which clients are sending data, as long as I
>>>> can receive it with one call. To do that, the clients and the server
>>>> go through a series of Comm_accept/Comm_connect/Intercomm_merge
>>>> calls so that at the end, all the clients and the server are inside the
>>>> same intracomm.
>>>>
>>>> Steps:
>>>>
>>>> I have a sample program that shows the issue. I tried to make it as
>>>> short as possible. It needs to be executed on a shared file system
>>>> like NFS because the server writes the port info to a file that the
>>>> clients will read. To reproduce the issue, the following steps should
>>>> be performed:
>>>>
>>>> 0. compile the test with "mpicc -o ben12 ben12.c"
>>>> 1. ssh to the machine that will be the server
>>>> 2. run ./ben12 3 1
>>>> 3. ssh to the machine that will be client #1
>>>> 4. run ./ben12 3 0
>>>> 5. repeat steps 3-4 for clients #2 and #3
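The port hand-off over the shared file system can be sketched as follows. This is a plain-Python illustration of the file exchange only, not taken from ben12.c; the function names `publish_port` and `wait_for_port`, the timeout values, and the write-then-rename step are all assumptions made for the example. Renaming a finished temporary file into place is one way to keep a client from reading a half-written port string, and the client polls because the file may not be visible yet when it starts.

```python
import os
import tempfile
import time


def publish_port(path, port_name):
    """Server side (hypothetical helper): publish the port string in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it into
    # place, so a reader sees either no file or the complete port string.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(port_name + "\n")
    os.replace(tmp_path, path)


def wait_for_port(path, timeout=30.0, poll_interval=0.1):
    """Client side (hypothetical helper): poll until the port file exists."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as f:
                return f.readline().rstrip("\n")
        except FileNotFoundError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no port file at {path}")
            time.sleep(poll_interval)
```

In the real program the string passed to `publish_port` would be whatever MPI_Open_port returned on the server, and the value returned by `wait_for_port` would be handed to MPI_Comm_connect on each client.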
>>>>
>>>> The server accepts the connection from client #1 and merges it into a
>>>> new intracomm. It then accepts the connection from client #2 and
>>>> merges it. When client #3 arrives, the server accepts the connection,
>>>> but that causes clients #1 and #2 to die with the error above (see the
>>>> complete trace in the tarball).
>>>>
>>>> The exact steps are:
>>>>
>>>>     - server opens port
>>>>     - server does accept
>>>>     - client #1 does connect
>>>>     - server and client #1 do merge
>>>>     - server does accept
>>>>     - client #2 does connect
>>>>     - server, client #1 and client #2 do merge
>>>>     - server does accept
>>>>     - client #3 does connect
>>>>     - server, client #1, client #2 and client #3 do merge
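The rounds listed above can be modelled in a few lines. This is a plain-Python model that makes no MPI calls; `role_in_round` and `plan` are hypothetical names. It assumes that clients which have already been merged join the server in every later accept. The list only says "server does accept", but the later merges include the earlier clients, which implies they are part of the accepting side.

```python
def role_in_round(proc, round_no):
    """Which side of the connection process `proc` is on in a given round.

    Process 0 is the server; process k is the k-th client to connect.
    """
    if proc == round_no:
        return "connect"   # the newcomer calls MPI_Comm_connect
    if proc < round_no:
        return "accept"    # server and earlier clients accept together (assumed)
    return "idle"          # this client has not been started yet


def plan(n_clients):
    """List, per round, each participant's role and the merged group size."""
    rounds = []
    for round_no in range(1, n_clients + 1):
        roles = {p: role_in_round(p, round_no) for p in range(round_no + 1)}
        # The merge spans the server plus one client per completed round.
        rounds.append((roles, round_no + 1))
    return rounds


if __name__ == "__main__":
    for i, (roles, size) in enumerate(plan(3), start=1):
        print(f"round {i}: {roles} -> merge over {size} processes")
```

With three clients, the third round is the one in which clients #1 and #2 sit on the accepting side while client #3 connects, which is the step where the route error is reported.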
>>>>
>>>>
>>>> My infiniband network works normally with other test programs or
>>>> applications (MPI or others like Verbs).
>>>>
>>>> Info about my setup:
>>>>
>>>>    Open MPI version = 1.4.1 (I also tried 1.4.2, nightly snapshot of
>>>> 1.4.3, nightly snapshot of 1.5 --- all show the same error)
>>>>    config.log in the tarball
>>>>    "ompi_info --all" in the tarball
>>>>    OFED version = 1.3 installed from RHEL 5.3
>>>>    Distro = Red Hat Enterprise Linux 5.3
>>>>    Kernel = 2.6.18-128.4.1.el5 x86_64
>>>>    subnet manager = built-in SM from the cisco/topspin switch
>>>>    output of ibv_devinfo included in the tarball (there are no "bad" nodes)
>>>>    "ulimit -l" says "unlimited"
>>>>
>>>> The tarball contains:
>>>>
>>>>   - ben12.c: my test program showing the behavior
>>>>   - config.log / config.out / make.out / make-install.out /
>>>> ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
>>>>   - trace-tcp.txt: output of the server and each client when it works
>>>> with TCP (I added "btl = tcp,self" in ~/.openmpi/mca-params.conf)
>>>>   - trace-ib.txt: output of the server and each client when it fails
>>>> with IB (I added "btl = openib,self" in ~/.openmpi/mca-params.conf)
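For reference, the two traces differ only in one line of the per-user MCA parameter file. The fragment below just restates the two settings quoted above; the comments are added for illustration. Since these processes are not started by mpiexec, the same parameter could presumably also be supplied through the environment as `OMPI_MCA_btl`, though that route is not something the traces in the tarball exercised.

```
# ~/.openmpi/mca-params.conf

# Working case (trace-tcp.txt):
btl = tcp,self

# Failing case (trace-ib.txt): use this line instead of the one above
# btl = openib,self
```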
>>>>
>>>> I hope I provided enough info for somebody to reproduce the problem...
>>>> <ompi-output.tar.bz2>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users