Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun hangs: "hello" test in single machine
From: Rodrigo Gómez Vázquez (rodrigoggv_at_[hidden])
Date: 2013-04-12 10:43:43


I solved the issue by accepting incoming TCP packets as long as they are
sent both "from" and "to" the local machine. Here is the line I added to
iptables:

     /sbin/iptables -A INPUT --source <actual-IP-address> --destination <the-same-IP-address> --protocol tcp -j ACCEPT

Just an observation: I am not an expert in the use of iptables, so
PLEASE consider the possible security risks before you copy and paste
this line into your system.
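For reference, a narrower pair of rules along the same lines (a sketch, not tested on this system; the interface name "lo" and the placeholder addresses are assumptions carried over from above):

```shell
# Accept everything on the loopback interface (usually named "lo")
/sbin/iptables -A INPUT -i lo -j ACCEPT

# Accept TCP packets sent from this host's own address back to itself;
# mpirun's wireup can use the machine's real IP rather than 127.0.0.1,
# so the loopback rule alone may not be sufficient
/sbin/iptables -A INPUT -s <actual-IP-address> -d <the-same-IP-address> -p tcp -j ACCEPT
```

Note that rule order matters in iptables: these ACCEPT rules must come before any rule that drops or rejects the same traffic.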

Rodrigo

On 04/11/2013 03:40 PM, Ralph Castain wrote:
> FWIW: I'm working on a rewrite of our out-of-band comm system (it does the wireup that is hanging on your system) that will include a shared memory module. Once that is in place, this problem will go away when running on a single node (still need sockets for multi-node, of course).
>
>
> On Apr 11, 2013, at 6:32 AM, Rodrigo Gómez Vázquez <rodrigoggv_at_[hidden]> wrote:
>
>> You were right, Ralph. I made a short test with the firewall turned off, and MPI ran as predicted. I am taking a look at the firewall rules to figure out how to set them up properly so that they do not interfere with Open MPI's functionality. I will post the required changes to those settings as soon as I find them, in case anyone needs that in the future.
>> Thanks again!
>> Rodrigo
>>
>> On 04/10/2013 10:26 PM, Rodrigo Gómez Vázquez wrote:
>>> In fact, we should have restrictive firewall settings, as far as I remember. I will check the rules again tomorrow morning. That's very interesting; I would have expected this kind of problem if I were working with a cluster, but I hadn't thought it might also cause problems for the internal communication within a single machine.
>>>
>>> Thanks, Ralph. I'll let you know if this was the actual cause of the problem.
>>> Rodrigo
>>>
>>> On 04/10/2013 09:46 PM, Ralph Castain wrote:
>>>> Best guess is that there is some issue with getting TCP sockets on the system - once the procs are launched, they need to open a TCP socket and communicate back to mpirun. If the socket is "stuck" waiting to complete the open, things will hang.
>>>>
>>>> You might check to ensure there isn't some security setting in place that protects sockets - something like iptables, for example.
>>>>
>>>>
>>>> On Apr 10, 2013, at 11:57 AM, Rodrigo Gómez Vázquez <rodrigoggv_at_[hidden]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am having trouble with this program on a simulation server.
>>>>> The system consists of several processors, but all in the same node (more information on the specs is in the attachments).
>>>>> The system is quite new (a few months old), and a user reported to me that it was not possible to run simulations on multiple processors in parallel.
>>>>> We are using it for CFD simulations with OpenFOAM, which ships with its own version of Open MPI (1.5.3; for more details you can look inside the "ThirdParty" software folder following this link: http://www.openfoam.org/archive/2.1.1/download/source.php). The OS is an Ubuntu 12.04 Server distro (see uname.out in the attachments).
>>>>> He tried to start a simulation in parallel using the following command:
>>>>>
>>>>> ~: mpirun -np 4 <solver-with-its-corresponding-parameters>
>>>>>
>>>>> As a result, the simulation does not start and there is no error message. It looks like the program is just waiting for something. We can briefly see the 4 processes with their PIDs in the "top" process list, but only for a few tenths of a second, and with 0% CPU and 0.0% memory usage as well. In order to recover the command prompt we have to kill the process.
>>>>>
>>>>> The same happens with the "hello" test programs that come along with the Open MPI sources:
>>>>>
>>>>> :~$ mpicc hello_c.c -o hello
>>>>> :~$ mpirun -np 4 hello
>>>>> ... and here it hangs again.
>>>>>
>>>>> I tried to execute other, simpler commands, as recommended for checking the installation. Let's see:
>>>>>
>>>>> :~$ mpirun -np 4 hostname
>>>>> simserver
>>>>> simserver
>>>>> simserver
>>>>> simserver
>>>>> :~$
>>>>>
>>>>> That works, and so does "ompi_info".
>>>>>
>>>>> Since we use the same OpenFOAM version without problems on several computers running Ubuntu-based distros, I supposed there must be some kind of incompatibility problem due to the hardware, but...
>>>>>
>>>>> Anyway, I repeated the tests with the Open MPI version from the Ubuntu repositories (1.4.3) and got the same result.
>>>>>
>>>>> It would be wonderful if anyone could give me a hint.
>>>>>
>>>>> I am afraid this may turn out to be a complicated issue, so please let me know whatever relevant information is missing.
>>>>>
>>>>> Thanks in advance, guys
>>>>>
>>>>> Rodrigo (Europe, GMT+2:00)
>>>>> <openmpi1.4.3_ompi_info.out.bz2><uname.out><cat_-proc-cpuinfo.out.bz2>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users