
Open MPI User's Mailing List Archives


Subject: [OMPI users] mpirun hangs: "hello" test in single machine
From: Rodrigo Gómez Vázquez (rodrigoggv_at_[hidden])
Date: 2013-04-10 14:57:35


I am having trouble with mpirun on a simulation server.
The system has several processors, all in the same node (more
information on the specs is in the attachments).
The system is quite new (a few months old), and a user reported to me that
it was not possible to run simulations on multiple processors in parallel.
We use it for CFD simulations with OpenFOAM, which ships with its own
copy of OpenMPI, version 1.5.3 (for more details you can look
inside the "ThirdParty software folder" following this link: ). The OS is
Ubuntu 12.04 Server (see uname.out in the attachments).
He tried to start a simulation in parallel using the following command:

~: mpirun -np 4 <solver-with-its-corresponding-parameters>

As a result, the simulation does not start and there is no error message.
It looks like the program is just waiting for something. We can
briefly see the 4 processes with their PIDs in the "top" process list,
but only for a few tenths of a second, and with 0% CPU usage and 0.0%
memory usage. To get the terminal back we have to kill
the process.

The same happens with the "hello" example programs that come with the
OpenMPI sources:

:~$mpicc hello_c.c -o hello
:~$mpirun -np 4 hello
... and here it hangs again.
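
To reproduce the hang without having to kill the process by hand, the run can be wrapped in a time limit. The following is only a minimal sketch, assuming GNU coreutils "timeout" is installed; the helper name run_with_limit and the 30-second limit in the usage line are hypothetical choices, not something from the OpenMPI tools:

```shell
# Hypothetical helper: run a command and give up after a number of seconds.
# Relies on GNU coreutils "timeout", which exits with status 124 when the
# time limit is reached.
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*"
    else
        echo "finished with exit status $status: $*"
    fi
    return "$status"
}
```

For example, "run_with_limit 30 mpirun -np 4 hello" would report a timeout instead of blocking the terminal if the job hangs as described above.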

I also tried running a simpler command, as recommended for checking the
installation:

:~$mpirun -np 4 hostname

This works, and so does "ompi_info".

Since we use the same OpenFOAM version without problems on several
computers running Ubuntu-based distros, I suspected some kind of
incompatibility with the hardware, but...

Anyway, I repeated the tests with the OpenMPI version from the Ubuntu
repositories (1.4.3) and got the same result.

It would be wonderful if anyone could give me a hint.

I am afraid this may turn out to be a complicated issue, so please let me
know if any relevant information is missing.

Thanks in advance, guys

Rodrigo (Europe, GMT+2:00)