
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Bug btl:tcp with grpcomm:hier
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-03-16 16:25:17


Actually, I think that Damien's analysis is correct. On an 8-node cluster,

mpirun -npernode 1 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv

does work, while

mpirun -npernode 2 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv

doesn't. As soon as I remove the grpcomm hier module (i.e., use bad instead), everything works as expected.

I just committed a patch (r24534) to the TCP BTL to output more information, and here is what I get when I add --mca btl_base_verbose 100 to the mpirun command line:

[node02:01565] btl: tcp: attempting to connect() to [[14725,1],0] address 192.168.3.1 on port 1024
[node02:01565] btl: tcp: attempting to connect() to [[14725,1],1] address 192.168.3.1 on port 1024
[node01:31562] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
[node01:31561] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
[node01:31562] btl: tcp: attempting to connect() to [[14725,1],3] address 192.168.3.2 on port 1026

The "-npernode 2" option places 2 processes per node, so vpids 0 and 1 will be on node01 and vpids 2 and 3 on node02. Looking at the BTL TCP connection attempts, one can clearly see that process 01565 on node02 thinks that both vpid 0 and vpid 1 can be reached at address 192.168.3.1 on the same port 1024, which is obviously wrong: two processes cannot share a single listening port.

As removing grpcomm hier solves the problem, I would expect the issue is not in the TCP BTL.

  george.

On Mar 16, 2011, at 15:16 , Ralph Castain wrote:

> I suspect something else is wrong - the grpcomm system never has any visibility into what data goes into the modex, or how that data is used. In other words, if the tcp btl isn't providing adequate info, then it would fail regardless of which grpcomm module was in use. So your statement about the hier module not distinguishing between peers on the same node doesn't make sense - the hier module has no idea that a tcp btl even exists, let alone has anything to do with the modex data.
>
> You might take a look at how the tcp btl is picking its sockets. The srun direct launch method may be setting envars that confuse it, perhaps causing the procs to all pick the same socket.
>
>
> On Mar 16, 2011, at 12:48 PM, Damien Guinier wrote:
>
>> Hi all
>>
>> From my tests, it is impossible to use "btl:tcp" with "grpcomm:hier". The "grpcomm:hier" module is important because the "srun" launch protocol can't use any other "grpcomm" module.
>> You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when you create a ring (like IMB Sendrecv):
>>
>> $>salloc -N 2 -n 4 mpirun --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv
>> salloc: Granted job allocation 2979
>> [cuzco95][[59536,1],2][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[59536,1],0]
>> [cuzco92][[59536,1],0][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[59536,1],2]
>> ^C
>> $>
>>
>> This error message shows that "btl:tcp" has created a connection to a peer, but not the right one (the peer identity is checked with the "ack").
>> To create a connection between two peers with "btl:tcp":
>> - Each peer broadcasts its IP parameters with ompi_modex_send().
>> - The IP parameters of the selected peer are received with ompi_modex_recv().
>>
>> In fact, the modex uses "orte_grpcomm.set_proc_attr()" and "orte_grpcomm.get_proc_attr()" to exchange the data. The problem is that "grpcomm:hier" doesn't distinguish between two peers on the same node: in my tests, the IP parameters of the first rank on the selected node are always returned.
>>
>>
>> Is "grpcomm:hier" restricted to "btl:sm" and "btl:openib"?
>>
>>
>> --------
>>
>> One easy solution to fix this problem is to add rank information to the "name" variable in:
>> - ompi/runtime/ompi_module_exchange.c:ompi_modex_send()
>> - ompi/runtime/ompi_module_exchange.c:ompi_modex_recv()
>> but I dislike it.
>>
>> Does someone have a better solution?
>>
>>
>> Thank you,
>> Damien
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>

"To preserve the freedom of the human mind then and freedom of the press, every spirit should be ready to devote itself to martyrdom; for as long as we may think as we will, and speak as we think, the condition of man will proceed in improvement."
  -- Thomas Jefferson, 1799