I think I understand the problem very well. My point is that it is easier
for us researchers to "bit-twiddle" than to ask accommodation from a more
"orthodox" implementation. If you believe that an OS threading approach
better addresses your concerns, then by all means, drop the single threading
concern. It truly doesn't inconvenience us much at all. Perhaps some
logical bifurcation point has been reached.
Our work involves a re-visitation of the hwloc and carto modules in new and
interesting ways. You have touched on a major performance issue: the
asynchronous nature not only of message passing and certain RDMA operations,
but of MPP computation generally, across myriad hardware platforms (FPGAs,
CPUs of various stripes, GPUs, memories, IO hubs, HCAs and bridges thereof),
not to mention different software and middleware.
We discovered we were playing "whack-a-mole" or Theory of Constraints in
optimizing efficiency and effectiveness of the many configurations, given
the different software stacks (esp. w/ hard-coded task rollouts) and various
data partitioning schemes. IOW, trust me, we KNOW about hanging.
There are probably several ways of addressing this issue. Ours is not yours.
When we get some reliable data, we'll be happy to push out a whitepaper
describing some of the experiments that led us to our conclusions. That
way, others can experiment to see which solutions work best for them.
From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
Behalf Of Ralph Castain
Sent: Tuesday, October 12, 2010 9:28 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Threading
I honestly wasn't casting aspersions - just sounds like a very strange
operational mode. Never heard of something like that before.
The problem is that we continue to have issues with clean termination and
"hangs", largely because the program counter gets "hung" as we try to work
with an event-driven system constrained to a single thread. We also have
performance problems because we cannot progress communications.
So the movement is to threading mpirun and the orte daemons to solve the
problems. Maintaining both threaded and unthreaded operations inside a
single code base becomes a study in spaghetti, and may prove intractable.
In that case, I'll "freeze" an unthreaded version at the current level, and
we'll focus further development on the threaded version.
If we go that route (and that isn't a given yet), then I'll rig the build
system so configuring without threads generates the unthreaded version, with
the correct accompanying man page.
On Oct 12, 2010, at 9:15 AM, Kenneth Lloyd wrote:
> There is really no need to do anything different to accommodate our use
> cases. Continue to "do what you do".
> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> Behalf Of Ralph Castain
> Sent: Tuesday, October 12, 2010 9:01 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] Threading
> Hmmm...I don't understand what you just said, but it definitely sounds
> -ugly-! :-)
> I'll take your word for it - we may have to provide a lower performance
> version for such oddball purposes, and offer a higher capability version for
> everyone else. I'll see if I can keep a single version, though, assuming the
> code doesn't get so convoluted as to become unmaintainable.
> Otherwise, I'll branch it and "freeze" a non-threaded version for the
> unusual case.
> On Oct 12, 2010, at 8:51 AM, Kenneth Lloyd wrote:
>> In certain hybrid, heterogeneous HPC configurations, mpirun often cannot,
>> or should not, be threaded through the OS under which Open MPI runs. The
>> OS and MPI can configure management nodes and topologies (even other MPI
>> layers) that subsequently spawn various OSes and other lightweight kernels.
>> These share memory spaces and indirectly access the program stacks in
>> various devices.
>> In short, yes, there are environments where this would cause a problem.
>> Kenneth A. Lloyd
>> Watt Systems Technologies Inc.
>> -----Original Message-----
>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
>> Behalf Of Barrett, Brian W
>> Sent: Tuesday, October 12, 2010 8:24 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Threading
>> On Oct 11, 2010, at 11:41 PM, Ralph Castain wrote:
>>> Does anyone know of a reason why mpirun can -not- be threaded, assuming
>> that all threads block and do not continuously chew cpu? Is there an
>> environment where this would cause a problem?
>> We don't have any machines at Sandia where I could see this being a problem.
>> Brian W. Barrett
>> Dept. 1423: Scalable System Software
>> Sandia National Laboratories
>> devel mailing list