Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] mpirun hangs: "hello" test in single machine
From: Rodrigo Gómez Vázquez (rodrigoggv_at_[hidden])
Date: 2013-04-11 09:32:47

You were right, Ralph. I ran a short test with the firewall turned off and
MPI ran as expected. I am now taking a look at the firewall rules to
figure out how to set them up properly, so that they do not interfere with
Open MPI's functionality. I will post the required changes to those
settings as soon as I work them out, in case anyone needs them in
the future.
Thanks again!
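[For readers hitting the same symptom: the specific rules Rodrigo ended up with were not posted in this thread, but a common generic fix (a sketch, not his actual settings) is to make sure loopback traffic is accepted before any REJECT/DROP rules, since local MPI processes talk to mpirun over TCP on the loopback interface:]

```shell
# Insert accept rules at the top of the chains so they take effect
# before any later REJECT/DROP rules (generic example, requires root).
sudo iptables -I INPUT 1 -i lo -j ACCEPT
sudo iptables -I OUTPUT 1 -o lo -j ACCEPT
```

If the firewall also filters traffic addressed to the machine's own external IP, an additional accept rule for that address may be needed as well.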

On 04/10/2013 10:26 PM, Rodrigo Gómez Vázquez wrote:
> In fact we should have restrictive firewall settings, as far as I
> remember. I will check the rules again tomorrow morning. That's very
> interesting; I would have expected this kind of problem if I were
> working with a cluster, but I had not thought that it might also cause
> problems for the internal communication within the machine.
> Thanks, Ralph. I'll let you know if this was the actual cause of the
> problem.
> Rodrigo
> On 04/10/2013 09:46 PM, Ralph Castain wrote:
>> Best guess is that there is some issue with getting TCP sockets on
>> the system - once the procs are launched, they need to open a TCP
>> socket and communicate back to mpirun. If the socket is "stuck"
>> waiting to complete the open, things will hang.
>> You might check to ensure there isn't some security setting in place
>> that protects sockets - something like iptables, for example.
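[Ralph's description can be illustrated with a minimal sketch, using plain Python sockets rather than Open MPI code: a listener stands in for mpirun's callback port and a client stands in for a launched process phoning home. If a firewall silently drops loopback packets, the connect call below hangs instead of completing, which matches the reported symptom:]

```python
import socket
import threading

# Listener standing in for mpirun: accept one connection on loopback.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # ephemeral port, like mpirun's callback port
srv.listen(1)
port = srv.getsockname()[1]

def accept_one():
    conn, _ = srv.accept()
    conn.sendall(b"ok")
    conn.close()

t = threading.Thread(target=accept_one)
t.start()

# Client standing in for a launched MPI process reporting back.
# A firewall that drops loopback traffic makes this call hang
# until the timeout fires, instead of connecting instantly.
cli = socket.create_connection(("127.0.0.1", port), timeout=5)
reply = cli.recv(2)
cli.close()
t.join()
srv.close()
print(reply.decode())
```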
>> On Apr 10, 2013, at 11:57 AM, Rodrigo Gómez Vázquez
>> <rodrigoggv_at_[hidden]> wrote:
>>> Hi,
>>> I am having trouble with the program on a simulation server.
>>> The system consists of several processors, all in the same node
>>> (more information on the specs is in the attachments).
>>> The system is quite new (a few months old) and a user reported to
>>> me that it was not possible to run simulations on multiple processors in parallel.
>>> We are using it for CFD simulations with OpenFOAM, which ships with
>>> its own version of Open MPI (1.5.3); for more details you can look
>>> inside the "ThirdParty software folder" following this link:
>>> The OS
>>> is an Ubuntu 12.04 Server distro (see uname.out in the attachments).
>>> He tried to start a simulation in parallel using the following command:
>>> ~: mpirun -np 4 <solver-with-its-corresponding-parameters>
>>> As a result, the simulation does not start and there is no error
>>> message. It looks like the program is just waiting for
>>> something. We can see the 4 processes with their PIDs in the
>>> "top" process list, but only for a few tenths of a second, with 0%
>>> CPU usage and 0.0% memory usage. In order to recover the
>>> command prompt we have to kill the process.
>>> The same happens with the "hello" program that comes with the
>>> Open MPI sources:
>>> :~$mpicc hello_c.c -o hello
>>> :~$mpirun -np 4 hello
>>> ... and here it hangs again.
>>> I tried to execute other, simpler commands, as recommended for
>>> checking the installation. Let's see:
>>> :~$mpirun -np 4 hostname
>>> simserver
>>> simserver
>>> simserver
>>> simserver
>>> :~$
>>> That works, as does "ompi_info".
>>> Since we use the same OpenFOAM version without problems on several
>>> computers running Ubuntu-based distros, I supposed there must be
>>> some kind of incompatibility due to the hardware, but...
>>> Anyway, I repeated the tests with the Open MPI version from the
>>> Ubuntu repositories (1.4.3) and got the same result.
>>> It would be wonderful if anyone could give me a hint.
>>> I am afraid it may turn out to be a complicated issue, so please
>>> let me know whatever relevant information is missing.
>>> Thanks in advance, guys
>>> Rodrigo (Europe, GMT+2:00)
>>> <openmpi1.4.3_ompi_info.out.bz2><uname.out><cat_-proc-cpuinfo.out.bz2>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]