
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Threading
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-10-12 12:30:19


Thanks - that helps clarify a great deal!

I'll keep you posted, pending any further input on the initial question.

FWIW: I'm also using OMPI/ORTE in an embedded environment, so I suspect some of our issues are common.

On Oct 12, 2010, at 9:59 AM, Kenneth Lloyd wrote:

> Ralph,
>
> I think I understand the problem very well. My point is that it is easier
> for us researchers to "bit-twiddle" than to ask for accommodation from a more
> "orthodox" implementation. If you believe that an OS threading approach
> better addresses your concerns, then by all means drop the single-threading
> concern. It truly doesn't inconvenience us much at all. Perhaps some logical
> bifurcation point has been reached.
>
> Our work revisits the hwloc and carto modules in new and interesting ways.
> You have touched on a major performance issue - the asynchrony not only of
> message passing and certain RDMA, but of MPP computation generally, across
> myriad hardware platforms (FPGAs, CPUs of various stripes, GPUs, memories,
> IO hubs, HCAs, and bridges thereof), not to mention different software and
> middleware. We discovered we were playing "whack-a-mole" (or Theory of
> Constraints) in optimizing the efficiency and effectiveness of the many
> configurations, given the different software stacks (esp. w/ hard-coded
> task rollouts) and various data partitioning schemes. IOW, trust me, we
> KNOW about hanging.
>
> There are probably several ways of addressing this issue. Ours is not yours.
> When we get some reliable data, we'll be happy to push out a whitepaper
> describing some of the experiments that led us to our conclusions. That
> way, others can experiment to see which solutions work best for them.
>
> Ken
>
>
>
> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> Behalf Of Ralph Castain
> Sent: Tuesday, October 12, 2010 9:28 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] Threading
>
> I honestly wasn't casting aspersions - just sounds like a very strange
> operational mode. Never heard of something like that before.
>
> The problem is that we continue to have issues with clean termination and
> "hangs", largely because the program counter gets "hung" as we try to work
> with an event-driven system constrained to a single thread. We also have
> performance problems because we cannot progress communications
> asynchronously.
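
To make this concrete, here is a minimal sketch of the threaded alternative,
assuming libevent 2.1+ and pthreads; the names and structure are illustrative
only, not the actual ORTE code. A dedicated progress thread blocks in the
event loop, so communication advances regardless of what the main program
counter is doing:

    #include <pthread.h>
    #include <event2/event.h>
    #include <event2/thread.h>

    /* The progress thread blocks in the kernel until an event fires;
     * it idles at 0% CPU instead of tying up the main thread. */
    static void *progress_thread(void *arg)
    {
        struct event_base *base = arg;
        event_base_loop(base, EVLOOP_NO_EXIT_ON_EMPTY);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;

        evthread_use_pthreads();            /* must precede base creation */
        struct event_base *base = event_base_new();
        pthread_create(&tid, NULL, progress_thread, base);

        /* ... the main thread registers I/O events on `base` and goes
         * about its business; communication progresses asynchronously.
         * At shutdown it breaks the loop (real code would synchronize
         * to be sure the loop has started before breaking it) ... */
        event_base_loopbreak(base);
        pthread_join(tid, NULL);
        event_base_free(base);
        return 0;
    }
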
>
> So the move is to thread mpirun and the ORTE daemons to solve these
> problems. Maintaining both threaded and unthreaded operation inside a
> single code base becomes a study in spaghetti, and may prove intractable.
> In that case, I'll "freeze" an unthreaded version at the current level, and
> we'll focus further development on the threaded version.
>
> If we go that route (and that isn't a given yet), then I'll rig the build
> system so configuring without threads generates the unthreaded version, with
> the correct accompanying man page.
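
A hypothetical illustration of that configure-time split: one source tree,
two behaviors, selected by a build flag. The macro name WANT_THREADS is made
up for this sketch; the real build system would define its own symbol:

    #include <pthread.h>
    #include <stdbool.h>
    #include <event2/event.h>

    void *progress_thread(void *arg);   /* as in the earlier sketch */

    void run_progress(struct event_base *base, volatile bool *done)
    {
    #if WANT_THREADS
        /* threaded build: hand the event base to a progress thread */
        pthread_t tid;
        pthread_create(&tid, NULL, progress_thread, base);
        pthread_join(tid, NULL);
    #else
        /* unthreaded build: poll the loop from the single main thread */
        while (!*done)
            event_base_loop(base, EVLOOP_ONCE);
    #endif
    }

Every such fork in the source multiplies the paths that have to be tested,
which is the spaghetti concern above.
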
>
> HTH
> Ralph
>
>
> On Oct 12, 2010, at 9:15 AM, Kenneth Lloyd wrote:
>
>> Ralph,
>>
>> There is really no need to do anything different to accommodate us "oddball"
>> cases. Continue to "do what you do".
>>
>> Ken
>>
>> -----Original Message-----
>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
>> Behalf Of Ralph Castain
>> Sent: Tuesday, October 12, 2010 9:01 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Threading
>>
>> Hmmm...I don't understand what you just said, but it definitely sounds
>> -ugly-! :-)
>>
>> I'll take your word for it - we may have to provide a lower performance
>> version for such oddball purposes, and offer a higher capability version for
>> everyone else. I'll see if I can keep a single version, though, assuming the
>> code doesn't get too convoluted so as to become unmaintainable.
>>
>> Otherwise, I'll branch it and "freeze" a non-threaded version for the
>> unusual case.
>>
>> Thanks!
>>
>> On Oct 12, 2010, at 8:51 AM, Kenneth Lloyd wrote:
>>
>>> In certain hybrid, heterogeneous HPC configurations, mpirun often cannot or
>>> should not be threaded through the OS under which Open MPI runs. The primary
>>> OS and MPI can configure management nodes and topologies (even other MPI
>>> layers) that subsequently spawn various OSes and other lightweight kernels.
>>> These share memory spaces and indirectly access the program stacks in
>>> various devices.
>>>
>>> In short, yes, there are environments where this would cause a problem.
>>>
>>> ==================
>>> Kenneth A. Lloyd
>>> Watt Systems Technologies Inc.
>>>
>>>
>>> -----Original Message-----
>>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
>>> Behalf Of Barrett, Brian W
>>> Sent: Tuesday, October 12, 2010 8:24 AM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] Threading
>>>
>>> On Oct 11, 2010, at 11:41 PM, Ralph Castain wrote:
>>>
>>>> Does anyone know of a reason why mpirun can -not- be threaded, assuming
>>>> that all threads block and do not continuously chew cpu? Is there an
>>>> environment where this would cause a problem?
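
For reference, the pattern the question assumes - a thread that sleeps in the
kernel until signalled, rather than spinning - looks roughly like this
pthreads sketch (names are illustrative):

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
    static bool work_ready = false;

    /* The helper consumes no CPU while idle: pthread_cond_wait parks it
     * in the kernel until another thread signals the condition. */
    static void *helper(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!work_ready)                 /* guard against spurious wakeups */
            pthread_cond_wait(&cv, &lock);  /* blocks at 0% CPU */
        pthread_mutex_unlock(&lock);
        /* ... handle the work, then exit ... */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, helper, NULL);

        pthread_mutex_lock(&lock);
        work_ready = true;
        pthread_cond_signal(&cv);           /* wake the helper only when needed */
        pthread_mutex_unlock(&lock);

        pthread_join(tid, NULL);
        return 0;
    }
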
>>>
>>> We don't have any machines at Sandia where I could see this being a problem.
>>>
>>> Brian
>>>
>>> --
>>> Brian W. Barrett
>>> Dept. 1423: Scalable System Software
>>> Sandia National Laboratories