Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] RE : RE : RE : Bug when mixing sent types in version 1.6
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-06-11 11:36:29


On Jun 11, 2012, at 11:15 AM, BOUVIER Benjamin wrote:

> Thanks for your hints Jeff.
> I've just tried without any firewalls on involved machines, but the issue remains.
>
> # /etc/init.d/ip6tables status
> ip6tables: Firewall is not running.
> # /etc/init.d/iptables status
> iptables: Firewall is not running.

Ok.

> The machines have the host names "node1", "node2" and "node3".
> I launch the basic program on one machine, asking node1 and node2 to be hosts. Typing `netstat -a | grep node1` from node2 shows me that node1 and node2 are connected by tcp, as the connection is marked as ESTABLISHED. I have the same thing when I do `netstat -a | grep node2` from node1. However, the program keeps blocking.

I'm not entirely clear which combinations are working and which are not. Can you specify which ones are working? You might want to try the ring_c.c program in the OMPI examples/ directory -- it's a trivial "send a message around in a ring" program that will scale up to >=2 processes.

- on node1, "mpirun --host node1,node2 ring_c"

- on node1, "mpirun --host node1,node3 ring_c"

- on node1, "mpirun --host node2,node3 ring_c"

- on node1, "mpirun --host node1,node2,node3 ring_c"

Repeat all 4 from node2.
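
The four runs above can be scripted so that the same sequence is easy to repeat from each node. This is only a minimal sketch, not part of Open MPI: it assumes ring_c has already been built and can be launched as `ring_c` on every node, that the GNU coreutils `timeout` command is available, and that 60 seconds is long enough to tell a hang from a slow start (the `DRY_RUN` switch, the `run_combo` helper and the 60-second limit are all illustrative choices). By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: try ring_c on each host combination from the list above.
# DRY_RUN (hypothetical switch) defaults to 1, so commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run_combo() {
    hosts="$1"
    cmd="mpirun --host $hosts ring_c"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
        return 0
    fi
    echo "== $cmd"
    # Assumption: GNU coreutils `timeout` is installed; 60 s is an arbitrary
    # limit used to turn a blocked run into a visible failure.
    if timeout 60 $cmd; then
        echo "OK: $hosts"
    else
        echo "FAILED or hung: $hosts"
    fi
}

# Same four combinations as in the list above.
for hosts in node1,node2 node1,node3 node2,node3 node1,node2,node3; do
    run_combo "$hosts"
done
```

Run it once with the default settings to check the commands, then with `DRY_RUN=0` on node1 and again on node2, and report which combinations print OK.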

> What else could cause that failure?
> --
> Benjamin BOUVIER
>
> ________________________________________
> To start, I would ensure that all firewalling (e.g., iptables) is disabled on all machines involved.
>
> On Jun 11, 2012, at 10:16 AM, BOUVIER Benjamin wrote:
>
>> Hi,
>>
>>> I'd guess that running net pipe with 3 procs may be undefined.
>>
>> It is indeed undefined. Running the net pipe program locally with 3 processes blocks on my computer.
>>
>> This issue is especially weird because, with 2 processes, the example program runs over the network without any problem under the MPICH2 implementation.
>>
>> However, with MPICH2, it fails with 3 processes and also blocks on connect ("Connection refused"), which could indicate that it is actually a network issue affecting both MPICH2 and OMPI. I don't know how many connections OMPI uses to send the data in the example program, but assuming that it tries to open 2 connections (while for the same program MPICH2 uses only one, which is another hypothesis), the number of connections may be the right place to look. I'll ask MPICH2 users on their mailing list to get their opinion.
>>
>> Now that I know the program fails with both the OMPI and MPICH2 implementations, I guess the problem does not depend on the MPI implementation.
>>
>> If you have any ideas or comments, I would be pleased to hear them.
>>
>> --
>> Benjamin Bouvier
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/