Per some of my other comments on this thread and on the referenced ticket,
can you tell me what kernel you have on that machine? I assume you have NUMA
support enabled, given that chipset?
On Wed, Jun 10, 2009 at 10:29 AM, Sylvain Jeaugey wrote:
> Hum, very glad that padb works with Open MPI, I couldn't live without it.
> In my opinion, the best debug tool for parallel applications, and more
> importantly, the only one that scales.
> About the issue, I couldn't reproduce it on my platform (tried 2 nodes with
> 2 to 8 processes each, nodes are twin 2.93 GHz Nehalem, IB is Mellanox QDR).
> So my feeling is that it may be very hardware-related.
> Especially if you use the hierarch component, some transactions will be done
> through RDMA on one side and read directly through shared memory on the
> other side, which can, depending on the hardware, produce very different
> timings and bugs. Did you try with a different collective component (i.e.
> not hierarch) ? Or with another interconnect ? [Yes, of course, if it is a
> race condition, we might well avoid the bug because timings will be
> different, but that's still information]
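As a sketch of the component-selection suggestion above (exact parameter names may vary by Open MPI version, so treat these as assumptions to check against `ompi_info`): Open MPI lets you exclude or deprioritize a collective component at run time via MCA parameters, for example:

```shell
# Exclude the hierarch collective component so another coll module is selected
mpirun --mca coll ^hierarch -np 16 ./my_app

# Or lower hierarch's selection priority to zero instead of excluding it
mpirun --mca coll_hierarch_priority 0 -np 16 ./my_app
```

Running with a different interconnect can be sketched the same way, e.g. excluding the openib BTL to force TCP.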
> Perhaps none of what I'm saying makes sense, or you already thought about
> this; anyway, if you want me to try different things, just let me know.
> On Wed, 10 Jun 2009, Ralph Castain wrote:
> Hi Ashley
>> Thanks! I would definitely be interested and will look at the tool.
>> Meantime, I have filed a bunch of data on this in
>> ticket #1944, so perhaps you might take a glance at that and offer some suggestions.
>> Will be back after I look at the tool.
>> Thanks again
>> On Wed, Jun 10, 2009 at 8:51 AM, Ashley Pittman <ashley_at_[hidden]> wrote:
>> If I may say this is exactly the type of problem the tool I have been
>> working on recently aims to help with and I'd be happy to help you
>> through it.
>> Firstly I'd say of the three collectives you mention, MPI_Allgather,
>> MPI_Reduce and MPI_Bcast, one exhibits a many-to-many, one a one-to-many,
>> and the last a many-to-one communication pattern. The scenario of a
>> root process falling behind and getting swamped in comms is a plausible
>> one for MPI_Reduce only but doesn't hold water with the other two. You
>> also don't mention if the loop is over a single collective or if you
>> have a loop calling a number of different collectives each iteration.
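The point about communication patterns can be illustrated with a toy sketch. This is not Open MPI's actual algorithm (real implementations use trees and pipelines); the function name and the naive linear-algorithm assumption are mine, purely to show why only a many-to-one pattern concentrates incoming traffic at the root:

```python
def root_message_count(collective, nprocs):
    """Messages the root sends/receives under a naive linear
    implementation of each collective (illustration only)."""
    if collective == "bcast":       # one-to-many: root sends to all others
        return {"sent": nprocs - 1, "received": 0}
    if collective == "reduce":      # many-to-one: root receives from all others
        return {"sent": 0, "received": nprocs - 1}
    if collective == "allgather":   # many-to-many: every rank sends and receives
        return {"sent": nprocs - 1, "received": nprocs - 1}
    raise ValueError(f"unknown collective: {collective}")

# With 512 processes, only reduce leaves the root purely on the receiving end:
print(root_message_count("reduce", 512))     # {'sent': 0, 'received': 511}
print(root_message_count("bcast", 512))      # {'sent': 511, 'received': 0}
print(root_message_count("allgather", 512))  # {'sent': 511, 'received': 511}
```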
>> padb, the tool I've been working on, has the ability to look at running
>> jobs and report on the state of collective comms and should help
>> you narrow down on erroneous processes and those simply blocked waiting for
>> comms. I'd recommend using it to look at maybe four or five cases
>> where the application has hung and look for any common features.
>> Let me know if you are willing to try this route and I'll talk you
>> through it; the tool is downloadable from http://padb.pittman.org.uk and if
>> you want the collective functionality you'll need to patch Open MPI with the
>> provided patch.
>> Ashley Pittman, Bath, UK.
>> Padb - A parallel job inspection tool for cluster computing
>> devel mailing list