Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun hangs: "hello" test in single machine
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-04-11 09:40:54


FWIW: I'm working on a rewrite of our out-of-band comm system (it does the wireup that is hanging on your system) that will include a shared memory module. Once that is in place, this problem will go away when running on a single node (still need sockets for multi-node, of course).
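In the meantime, if anyone wants to confirm that the hang really is in the TCP wireup, turning up the verbosity of the oob framework usually shows where the connection back to mpirun stalls. A rough sketch (the verbose level is arbitrary, and the exact output varies between releases):

:~$ mpirun --mca oob_base_verbose 10 -np 4 hello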

On Apr 11, 2013, at 6:32 AM, Rodrigo Gómez Vázquez <rodrigoggv_at_[hidden]> wrote:

> You were right, Ralph. I made a short test turning off the firewall and MPI ran as predicted. I am now taking a look at the firewall rules to figure out how to set them up properly, so that they do not interfere with Open MPI's functionality. I will post the required changes to those settings as soon as I find them out, just in case anyone needs them in the future.
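> For the archives, the kind of rules that usually avoid this on a single node are the ones accepting loopback traffic and TCP connections the machine makes to itself. A rough sketch of what I am going to try (iptables syntax, not yet tested here, and the address placeholder obviously depends on our setup):
>
> :~$ sudo iptables -I INPUT -i lo -j ACCEPT
> :~$ sudo iptables -I INPUT -p tcp -s <server-ip> -d <server-ip> -j ACCEPT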
> Thanks again!
> Rodrigo
>
> On 04/10/2013 10:26 PM, Rodrigo Gómez Vázquez wrote:
>> In fact we do have restrictive firewall settings, as far as I remember. I will check the rules again tomorrow morning. That's very interesting; I would have expected this kind of problem if I were working with a cluster, but I hadn't thought that it might also lead to problems for the internal communication within the machine.
>>
>> Thanks, Ralph. I'll let you know if this was the actual cause of the problem.
>> Rodrigo
>>
>> On 04/10/2013 09:46 PM, Ralph Castain wrote:
>>> Best guess is that there is some issue with getting TCP sockets on the system - once the procs are launched, they need to open a TCP socket and communicate back to mpirun. If the socket is "stuck" waiting to complete the open, things will hang.
>>>
>>> You might check to ensure there isn't some security setting in place that protects sockets - something like iptables, for example.
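>>> For example, listing the current rules should show whether anything could block local TCP connections (just a sketch - on Ubuntu the ufw front end may be managing iptables, so it is worth checking both):
>>>
>>> :~$ sudo iptables -L -n -v
>>> :~$ sudo ufw status verbose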
>>>
>>>
>>> On Apr 10, 2013, at 11:57 AM, Rodrigo Gómez Vázquez <rodrigoggv_at_[hidden]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am having trouble with the program on a simulation server.
>>>> The system consists of several processors, but all in the same node (more information on the specs is in the attachments).
>>>> The system is quite new (a few months old), and a user reported to me that it was not possible to run simulations on multiple processors in parallel.
>>>> We are using it for CFD simulations with OpenFOAM, which ships with its own build of Open MPI (version 1.5.3; for more details you can look inside the "ThirdParty" software folder following this link: http://www.openfoam.org/archive/2.1.1/download/source.php). The OS is an Ubuntu 12.04 Server distro (see uname.out in the attachments).
>>>> He tried to start a simulation in parallel using the following command:
>>>>
>>>> ~: mpirun -np 4 <solver-with-its-corresponding-parameters>
>>>>
>>>> As a result, the simulation does not start and there is no error message. It looks like the program is just waiting for something. We can briefly see the 4 processes with their PIDs in the "top" process list, but only for a few tenths of a second, and with 0% CPU usage and 0.0% memory usage. In order to recover the command terminal we have to kill the process.
>>>>
>>>> The same happens with the "hello" program that comes with the Open MPI sources:
>>>>
>>>> :~$mpicc hello_c.c -o hello
>>>> :~$mpirun -np 4 hello
>>>> ... and here it hangs again.
>>>>
>>>> I tried to execute other, simpler programs, as recommended for checking the installation. Let's see:
>>>>
>>>> :~$mpirun -np 4 hostname
>>>> simserver
>>>> simserver
>>>> simserver
>>>> simserver
>>>> :~$
>>>>
>>>> That works, and so does "ompi_info".
>>>>
>>>> Since we use the same OpenFOAM version without problems on several computers running Ubuntu-based distros, I supposed there must be some kind of incompatibility due to the hardware, but...
>>>>
>>>> Anyway, I repeated the tests with the Open MPI version from the Ubuntu repositories (1.4.3) and got the same result.
>>>>
>>>> It would be wonderful if anyone could give me a hint.
>>>>
>>>> I am afraid it may turn out to be a complicated issue, so please let me know whatever relevant information is missing.
>>>>
>>>> Thanks in advance, guys
>>>>
>>>> Rodrigo (Europe, GMT+2:00)
>>>> Attachments: <openmpi1.4.3_ompi_info.out.bz2> <uname.out> <cat_-proc-cpuinfo.out.bz2>
>>>
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users