I managed to get a deadlock after a whole night, but not the same one you
have: after a quick analysis, process 0 seems to be blocked in the very
first send through shared memory. Still maybe a bug, but not the same as
yours.

I also figured out that libnuma support was not in my library, so I
rebuilt the lib and this doesn't seem to change anything: same execution
speed, same memory footprint, and of course the same the-bug-does-not-appear
behaviour.

So, no luck so far in reproducing your problem. I guess you're the only
one able to make progress on this (since you seem to have a real
reproducer).
On Wed, 10 Jun 2009, Sylvain Jeaugey wrote:
> Hum, very glad that padb works with Open MPI, I couldn't live without it. In
> my opinion, the best debug tool for parallel applications, and more
> importantly, the only one that scales.
> About the issue, I couldn't reproduce it on my platform (tried 2 nodes with 2
> to 8 processes each, nodes are twin 2.93 GHz Nehalem, IB is Mellanox QDR).
> So my feeling is that it may be very hardware related. Especially
> if you use the hierarch component, some transactions will be done through
> RDMA on one side and read directly through shared memory on the other side,
> which can, depending on the hardware, produce very different timings and
> bugs. Did you try with a different collective component (i.e. not hierarch)?
> Or with another interconnect? [Yes, of course, if it is a race condition, we
> might well avoid the bug because timings will be different, but that's still
> worth trying.] Perhaps none of what I'm saying makes sense, or you have
> already thought about all this; anyway, if you want me to try different
> things, just let me know.
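>
> As a concrete illustration (a minimal sketch; the application name and
> process count below are made up, only the MCA component-selection syntax
> is standard Open MPI), a run that excludes the hierarch collective
> component would look something like:
>
>   mpirun --mca coll ^hierarch -np 16 ./my_app
>
> If the hang no longer shows up without hierarch, that at least narrows
> things down, with the caveat above that different timings alone can hide
> a race.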
> On Wed, 10 Jun 2009, Ralph Castain wrote:
>> Hi Ashley
>> Thanks! I would definitely be interested and will look at the tool.
>> Meantime, I have filed a bunch of data on this in
>> ticket #1944, so perhaps you might take a glance at that and offer some
>> suggestions.
>> Will be back after I look at the tool.
>> Thanks again
>> On Wed, Jun 10, 2009 at 8:51 AM, Ashley Pittman <ashley_at_[hidden]> wrote:
>> If I may say, this is exactly the type of problem the tool I have been
>> working on recently aims to help with and I'd be happy to help you
>> through it.
>> Firstly I'd say that of the three collectives you mention, MPI_Allgather,
>> MPI_Reduce and MPI_Bcast, one exhibits a many-to-many, one a one-to-many
>> and the last a many-to-one communication pattern. The scenario of a
>> root process falling behind and getting swamped in comms is a plausible
>> one for MPI_Reduce only but doesn't hold water for the other two. You
>> also don't mention if the loop is over a single collective or if you
>> have a loop calling a number of different collectives each iteration.
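>>
>> To make that concrete, here is a minimal sketch (purely illustrative, not
>> taken from your application) of a loop over MPI_Reduce with a fixed root:
>> because every iteration funnels data into rank 0, the non-root ranks can
>> run ahead of a slow root and swamp it with messages, which is the scenario
>> described above and which MPI_Bcast and MPI_Allgather don't share.
>>
>>   #include <mpi.h>
>>   #include <stdio.h>
>>
>>   int main(int argc, char **argv)
>>   {
>>       int rank, i;
>>       double local = 1.0, sum = 0.0;
>>
>>       MPI_Init(&argc, &argv);
>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>       for (i = 0; i < 10000; i++) {
>>           /* many-to-one: every rank sends towards rank 0 each iteration */
>>           MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
>>       }
>>
>>       if (rank == 0)
>>           printf("final sum: %f\n", sum);
>>
>>       MPI_Finalize();
>>       return 0;
>>   }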
>> padb, the tool I've been working on, has the ability to look at parallel
>> jobs and report on the state of collective comms, and should help you
>> narrow down which processes are erroneous and which are simply blocked
>> waiting for comms. I'd recommend using it to look at maybe four or five
>> cases where the application has hung and to look for any common features
>> between them.
>> Let me know if you are willing to try this route and I'll talk you
>> through it; the tool is downloadable from http://padb.pittman.org.uk and
>> if you want the collective functionality you'll need to patch Open MPI
>> with the provided patch.
>> Ashley Pittman, Bath, UK.
>> Padb - A parallel job inspection tool for cluster computing