Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI hangs across multiple nodes.
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-02-12 10:46:52


You don't necessarily have to open all ports. Open MPI can use a range
of ports if specified.

The intervals used can be tweaked via the [btl_tcp_port_min_v4,
btl_tcp_port_range_v4] MCA parameters for the MPI layer, and there should
be something similar for our daemons.
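
For example, a minimal sketch (the port values here are arbitrary; check
ompi_info --param btl tcp for what your build actually supports):

   # restrict the MPI (BTL) layer to TCP ports 10000-10099
   mpirun -np 4 -host node1,node2 \
       --mca btl_tcp_port_min_v4 10000 \
       --mca btl_tcp_port_range_v4 100 \
       ./your_program        # placeholder for the actual executable

With something like that in place, only the chosen range (plus whatever
ports the daemons use) needs to be open between the nodes rather than all
ports.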

   george.

On Feb 11, 2009, at 21:49, Robertson Burgess wrote:

> My apologies for not changing the subject to something suitable just
> then.
>
> Thank you for that. I have not yet been able to get the IT department
> to help me with disabling the firewalls, but hopefully that is the
> problem. Sorry for the late response; I was hoping the IT department
> would be faster.
>
> Robertson
>
> Date: Fri, 6 Feb 2009 17:27:34 -0500
> From: Jeff Squyres <jsquyres_at_[hidden]>
> Subject: Re: [OMPI users] OpenMPI hangs across multiple nodes.
> To: Open MPI Users <users_at_[hidden]>
>
> Open MPI requires that there be no TCP firewall between hosts that are
> used in a single parallel job -- it uses random TCP ports between
> peers.
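>
> (If dropping the firewall entirely is not an option, one hypothetical
> approach, assuming an iptables-based firewall, is to accept all TCP
> traffic from the other compute node on each host, e.g. as root:
>
>    iptables -I INPUT -p tcp -s <other-node-IP> -j ACCEPT   # peer's address is a placeholder
>
> The right mechanism depends on how the firewall is managed, so check
> with IT first.)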
>
>
> On Feb 5, 2009, at 2:39 AM, Robertson Burgess wrote:
>
>> I have checked with IT. It is TCP. I have been told that there's a
>> firewall on the nodes. Should I open some ports on the firewall, and
>> if so, which ones?
>>
>> Robertson
>>
>>>>> Robertson Burgess 5/02/2009 5:09 pm >>>
>> Thank you for your help.
>> I tried the command
>> mpirun -np 4 -host node1,node2 -mca btl tcp,self random
>> but still got the same result.
>>
>> I'm pretty sure that the communication between the nodes is TCP, but
>> I'm not certain; I've emailed IT support to ask, but have yet to hear
>> back from them.
>> Other than that, I'm running the latest release of OMPI (1.3) and I
>> installed it on both nodes. And yes, they are in the same absolute
>> paths.
>> My configuration was very standard:
>>
>> shell$ gunzip -c openmpi-1.3.tar.gz | tar xf -
>> shell$ cd openmpi-1.3
>> shell$ ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/home/bburgess/bin/bin
>> shell$ make all install
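>>
>> One way to confirm both installs ended up in the same place (just a
>> sketch, assuming the same --prefix was used on both machines):
>>
>> shell$ which mpirun
>> shell$ ssh node2 which mpirun
>>
>> Both should print the same absolute path under the --prefix above.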
>>
>> Again, thank you for your help; I'll have to investigate whether my
>> assumption about my connections being TCP is correct. When I was
>> setting it up at first, and before I'd configured the nodes to log
>> into each other without a password, I did get the message
>>
>> user@node.newcastle.edu.au's password:
>>
>> in my log files, so it did at least seem to be reaching the other
>> node. Does that mean that my connections are working, or could there
>> be more to it than that?
>>
>> Robertson Burgess
>>
>>
>> Date: Wed, 4 Feb 2009 15:37:44 +0200
>> From: Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
>> Subject: Re: [OMPI users] OpenMPI hangs across multiple nodes.
>> To: Open MPI Users <users_at_[hidden]>
>>
>> What kind of communication between nodes do you have - tcp or openib
>> (IB/iWARP)?
>> You can try
>>
>> mpirun -np 4 -host node1,node2 -mca btl tcp,self random
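>>
>> (Forcing "-mca btl tcp,self" rules out interconnect auto-selection as
>> the cause; to see which btl components your build actually provides,
>> something like
>>
>> ompi_info | grep btl
>>
>> should list them.)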
>>
>>
>>
>> On Wed, Feb 4, 2009 at 1:21 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> Could you tell us which version of OpenMPI you are using, and how it
>>> was configured?
>>>
>>> Did you install the OMPI libraries and binaries on both nodes? Are
>>> they in the same absolute path locations?
>>>
>>> Thanks
>>> Ralph
>>>
>>>
>>> On Feb 3, 2009, at 3:46 PM, Robertson Burgess wrote:
>>>
>>>> Dear users,
>>>> I am quite new to OpenMPI. I have compiled it on two nodes, each
>>>> node with 8 CPU cores; the two nodes are identical. The code I am
>>>> using works in parallel across the 8 cores on a single node.
>>>> However, whenever I try to run across both nodes, OpenMPI simply
>>>> hangs. There is no output whatsoever; when I run it in the
>>>> background, outputting to a log file, the log file is always empty.
>>>> The cores do not appear to be doing anything at all, either on the
>>>> host node or on the remote node. This happens whether I am running
>>>> my code or even when I tell it to run a process that doesn't even
>>>> exist, for instance
>>>>
>>>> mpirun -np 4 -host node1,node2 random
>>>>
>>>> simply results in the terminal hanging, so all I can do is close the
>>>> terminal and open up a new one.
>>>>
>>>> mpirun -np 4 -host node1,node2 random >& log.log &
>>>>
>>>> simply produces an empty log.log file
>>>>
>>>> I am running Red Hat Linux on the systems, and compiled OpenMPI with
>>>> the Intel Compilers 10.1. As I've said, it works fine on one node. I
>>>> have set up both nodes such that they can log into each other via
>>>> ssh without the need for a password, and I have altered my .bashrc
>>>> file so the PATH and LD_LIBRARY_PATH include the appropriate folders.
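>>>>
>>>> For example (just a sketch, assuming the /home/bburgess/bin/bin
>>>> prefix used at configure time; adjust to the real install layout):
>>>>
>>>> export PATH=/home/bburgess/bin/bin/bin:$PATH
>>>> export LD_LIBRARY_PATH=/home/bburgess/bin/bin/lib:$LD_LIBRARY_PATH
>>>>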
>>>> I have looked through the FAQ and mailing lists, but I was unable to
>>>> find anything that really matched my problem. Any help would be
>>>> greatly appreciated.
>>>>
>>>> Sincerely,
>>>> Robertson Burgess
>>>> University of Newcastle
>>>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>