
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Hang in collectives involving shared memory
From: Sylvain Jeaugey (sylvain.jeaugey_at_[hidden])
Date: 2009-06-10 12:29:01

Hum, very glad that padb works with Open MPI, I couldn't live without it.
In my opinion, it is the best debugging tool for parallel applications, and more
importantly, the only one that scales.

About the issue, I couldn't reproduce it on my platform (tried 2 nodes
with 2 to 8 processes each, nodes are twin 2.93 GHz Nehalem, IB is
Mellanox QDR).

So my feeling is that it may be very hardware related.
Especially if you use the hierarch component, some transactions will be
done through RDMA on one side and read directly through shared memory on
the other side, which can, depending on the hardware, produce very
different timings and bugs. Did you try with a different collective
component (i.e. not hierarch)? Or with another interconnect? [Yes, of
course, if it is a race condition, we might well avoid the bug because
the timings will be different, but that's still information.]
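
To make the two experiments above concrete, here is a minimal sketch that builds the corresponding mpirun command lines. The application name ./hang_test and the process count are placeholders, and the helper build_mpirun_command is hypothetical. The "--mca coll ^hierarch" form excludes the hierarch collective component, and "--mca btl tcp,sm,self" is one way to take InfiniBand out of the picture; the available BTL names depend on the Open MPI version, so check them with ompi_info before relying on this.

```python
import shlex


def build_mpirun_command(app, nprocs, exclude_coll=None, btl=None):
    """Return an mpirun argument list for one experiment (hypothetical helper)."""
    args = ["mpirun", "-np", str(nprocs)]
    if exclude_coll:
        # A leading caret asks the MCA framework to exclude the listed components.
        args += ["--mca", "coll", "^" + ",".join(exclude_coll)]
    if btl:
        # Restrict the byte transfer layers to the listed ones.
        args += ["--mca", "btl", ",".join(btl)]
    args.append(app)
    return args


# Placeholder application name and process count, for illustration only.
baseline = build_mpirun_command("./hang_test", 16)
no_hierarch = build_mpirun_command("./hang_test", 16, exclude_coll=["hierarch"])
no_ib = build_mpirun_command("./hang_test", 16, btl=["tcp", "sm", "self"])

for cmd in (baseline, no_hierarch, no_ib):
    print(shlex.join(cmd))
```

If the hang disappears in one of the variants, that does not prove the excluded component is at fault (the timings change too), but it narrows down where to look.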

Perhaps what I'm saying makes no sense, or you have already thought about
this. Anyway, if you want me to try different things, just let me know.


On Wed, 10 Jun 2009, Ralph Castain wrote:

> Hi Ashley
> Thanks! I would definitely be interested and will look at the tool. Meantime, I have filed a bunch of data on this in
> ticket #1944, so perhaps you might take a glance at that and offer some thoughts?
> Will be back after I look at the tool.
> Thanks again
> Ralph
> On Wed, Jun 10, 2009 at 8:51 AM, Ashley Pittman <ashley_at_[hidden]> wrote:
> Ralph,
> If I may say, this is exactly the type of problem the tool I have been
> working on recently aims to help with and I'd be happy to help you
> through it.
> Firstly I'd say of the three collectives you mention, MPI_Allgather,
> MPI_Reduce and MPI_Bcast, one exhibits a many-to-many, one a many-to-one
> and the last a one-to-many communication pattern.  The scenario of a
> root process falling behind and getting swamped in comms is a plausible
> one for MPI_Reduce only but doesn't hold water with the other two.  You
> also don't mention if the loop is over a single collective or if you
> have a loop calling a number of different collectives each iteration.
> padb, the tool I've been working on, has the ability to look at parallel
> jobs and report on the state of collective comms, and should help you
> tell erroneous processes apart from those simply blocked waiting for
> comms.  I'd recommend using it to look at maybe four or five instances
> where the application has hung and look for any common features between
> them.
> Let me know if you are willing to try this route and I'll talk, the code
> is downloadable from and if you want the full
> collective functionality you'll need to patch Open MPI with the patch from
> Ashley,
> --
> Ashley Pittman, Bath, UK.
> Padb - A parallel job inspection tool for cluster computing
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
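
The point in the quoted message about which collectives can swamp a root process can be illustrated with a small model. This is only a sketch of the logical data flow (who ultimately needs data from whom), not of the message schedule a real MPI library uses, which typically involves trees, rings or shared-memory segments. The function names are made up for the illustration.

```python
def allgather_edges(n):
    """Many-to-many: every rank's data reaches every other rank."""
    return {(src, dst) for src in range(n) for dst in range(n) if src != dst}


def reduce_edges(n, root=0):
    """Many-to-one: every non-root rank contributes to the root."""
    return {(src, root) for src in range(n) if src != root}


def bcast_edges(n, root=0):
    """One-to-many: the root's data reaches every other rank."""
    return {(root, dst) for dst in range(n) if dst != root}


def inbound(edges, rank):
    """Number of ranks whose data flows into the given rank."""
    return sum(1 for _, dst in edges if dst == rank)


n, root = 8, 0
for name, edges in (
    ("MPI_Allgather", allgather_edges(n)),
    ("MPI_Reduce", reduce_edges(n, root)),
    ("MPI_Bcast", bcast_edges(n, root)),
):
    others = max(inbound(edges, r) for r in range(n) if r != root)
    print(f"{name}: root inbound={inbound(edges, root)}, max inbound elsewhere={others}")
```

In this model only MPI_Reduce singles out the root as a receiver: with MPI_Allgather every rank receives the same amount, and with MPI_Bcast the root receives nothing at all.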