Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Simple MPI hello world hangs over IB
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-02-11 13:50:28

On Feb 4, 2013, at 10:55 AM, Bharath Ramesh <bramesh_at_[hidden]> wrote:

> I am trying to debug an issue which is really weird. I have a
> simple MPI hello world application (attached) that hangs when I
> try to run it on our cluster using 256 nodes with 16 cores on each
> node. The cluster uses QDR IB.
> I am able to run the test over ethernet by excluding openib from
> the btl. However, what is weird is that for the same set of nodes
> xhpl completes without any error using 256 nodes and 16 cores. I
> have tried running the Pallas MPI Benchmark and it also behaves
> similarly to hello world and ends up hanging when I run it using
> 256 nodes.

Sorry for the delay; I was on travel all last week and fell behind.

I'm not sure I can parse your scenario description. Are you saying:

- hello world over IB hangs at 256*16 procs
- hello world over TCP works at 256*16 procs
- xhpl over TCP works at 256*16 procs
- IMB over ?TCP|IB? hangs at 256*16 procs
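The attached hello_world_mpi.c is not preserved in the archive, but a minimal sketch of the shape the backtraces suggest (many ranks parked in MPI_Send, the rest in MPI_Finalize) would be a many-to-one pattern: every non-root rank sends a small message to rank 0. At 256*16 = 4096 ranks this floods rank 0 with eager sends, which can expose IB resource problems that a well-balanced code like xhpl never hits. This is a hypothetical reconstruction, not the poster's actual program:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Rank 0 drains one message from every other rank. */
        char buf[64];
        for (int src = 1; src < size; ++src) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        printf("rank 0 received %d hellos\n", size - 1);
    } else {
        /* All other ranks do a single small send to rank 0; at
         * thousands of ranks this is a many-to-one message storm. */
        char buf[64];
        snprintf(buf, sizeof(buf), "hello from rank %d", rank);
        MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```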

> When I attach gdb to the MPI processes and look at the backtrace
> I see that close to 1000 of the MPI processes are stuck in MPI_Send
> while the others are waiting in MPI_Finalize. I have checked to
> make sure that the ulimit setting for locked memory is unlimited.
> The number of open files per process is 131072. The default MPI
> stack provided on the system is openmpi-1.6.1. I compiled
> openmpi-1.6.3 in my home directory and the behavior remains
> the same.
> I would appreciate any help in debugging this issue.

Can you try the 1.6.4rc?
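In the meantime, a few diagnostics along the lines already described may help narrow it down. These are command sketches, not exact invocations: the hostfile, process count, and binary name are placeholders, and `<pid>` must be filled in by hand.

```shell
# Run over TCP only (the workaround already observed to work):
mpirun --mca btl tcp,sm,self -np 4096 --hostfile hosts ./hello_world_mpi

# Force the openib BTL and turn up BTL verbosity to see where it stalls:
mpirun --mca btl openib,sm,self --mca btl_base_verbose 100 \
       -np 4096 --hostfile hosts ./hello_world_mpi

# On a node with stuck ranks, grab backtraces non-interactively:
gdb -p <pid> -batch -ex "thread apply all bt"

# Re-check the limits mentioned above, from within a job environment
# (the limits that matter are the ones the MPI processes inherit):
ulimit -l   # locked memory; should print "unlimited"
ulimit -n   # open files per process
```

Checking `ulimit` from inside a batch job matters because the resource limits seen by an interactive login shell are often not the ones the launcher propagates to the compute nodes.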

> --
> Bharath
> <hello_world_mpi.c>

Jeff Squyres