On 12/10/12 11:25 AM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>On Dec 10, 2012, at 10:15 AM, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:
>> On 12/8/12 7:59 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>>> WHAT: Enable both OPAL and libevent thread support by default
>>> WHY: We need to support threaded operations for MPI-3, and for
>>> Enabling thread support by default is the only way to
>>> ensure we fix all the problems.
>>> WHEN: COB, Thurs Dec 13
>>> This was a decision reached at the OMPI Developers meeting, so the RFC is
>>> mostly just a "heads up" to everyone that this will happen. We spent
>>> time recently profiling the impact on performance and found it to be
>>> significant: 100ns in shared memory latency, and a similar number to
>>> message latency. However, without setting the support "on" by default, we
>>> will never address those problems. Thus, the group decided that we
>>> enable support by default and begin a concerted effort to reduce and/or
>>> remove the performance impact.
>> Thinking about this on the way home Friday, I'm not sure we need to go
>> quite that far. I think we do want to enable MPI_THREAD_MULTIPLE by
>> default to cause all the locks to be "on" by default. I'm not sure we
>> need to enable progress threads at this point; the question is do we want
>> to take a top-down approach, where we turn on the locks all the time for
>> everything (expensive) and pare down what actually needs locking for
>> BTL callbacks, or do we leave off all the locking by default (when thread
>> count == 1) and only turn on always-lock locks for the code paths that
>> will deal with async callbacks from the BTLs. I'm split on the issue.
>I viewed this in a different light. The question of thread_multiple is a
>separate one. From my perspective, if we say we are going to support
>MPI-3's async progress, then I don't see how we avoid the OPAL thread
>support being "on" all the time.
>Likewise, if the ORTE wireup methods have to support async behavior, then
>we have to build the event lib with thread support.
>So it seems to me that the best path forward is to turn both "on" by
>default, then learn how to live with that situation.
It depends on what you mean by "on". Thread support is always "on" these
days, meaning that opal_mutex_lock does, in fact, have a mutex that
locks/unlocks. The question is what the value of opal_using_threads() is
(i.e., is OPAL_THREAD_LOCK a lock or not?). In some ways, it doesn't need
to be (e.g., attributes still don't require the big attribute lock in
MPI_THREAD_SINGLE). The problem is that because we protect many of the
base data structures internally (like free lists) instead of externally,
it's hard to be thread safe for the small portions of the PML, OSC, and
runtime components that need thread safety for progress without enabling
thread safety in a whole lot of other places.
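
To make the distinction concrete, here is a rough sketch in C of a lock that
is only taken when a global "using threads" flag is set, and of a free list
that protects itself internally with that lock. Every name here (sketch_*)
is made up for illustration and plain pthreads stand in for the real mutex;
this is not the actual OPAL code. The lock counter exists only so the
effect is visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the value returned by opal_using_threads(). */
static bool sketch_using_threads = false;

/* Demo-only counter of how many times a real mutex was acquired. */
static int sketch_locks_taken = 0;

typedef struct {
    pthread_mutex_t mutex;
} sketch_mutex_t;

/* Unconditional lock: always a real mutex operation. */
static void sketch_mutex_lock(sketch_mutex_t *m)
{
    pthread_mutex_lock(&m->mutex);
    sketch_locks_taken++;
}

static void sketch_mutex_unlock(sketch_mutex_t *m)
{
    pthread_mutex_unlock(&m->mutex);
}

/* Conditional lock: a no-op unless the global flag is set. */
static void sketch_thread_lock(sketch_mutex_t *m)
{
    if (sketch_using_threads) {
        sketch_mutex_lock(m);
    }
}

static void sketch_thread_unlock(sketch_mutex_t *m)
{
    if (sketch_using_threads) {
        sketch_mutex_unlock(m);
    }
}

/* A free list that protects itself internally: callers cannot choose
 * whether locking happens; only the global flag decides. */
typedef struct {
    sketch_mutex_t lock;
    int items[8];
    int count;
} sketch_free_list_t;

static int sketch_free_list_get(sketch_free_list_t *fl)
{
    int item = -1;
    sketch_thread_lock(&fl->lock);
    if (fl->count > 0) {
        item = fl->items[--fl->count];
    }
    sketch_thread_unlock(&fl->lock);
    return item;
}

/* Take three items and report how many real mutex acquisitions occurred. */
int sketch_demo(bool using_threads)
{
    sketch_free_list_t fl;
    pthread_mutex_init(&fl.lock.mutex, NULL);
    fl.count = 8;
    for (int i = 0; i < 8; i++) {
        fl.items[i] = i;
    }

    sketch_using_threads = using_threads;
    sketch_locks_taken = 0;
    for (int i = 0; i < 3; i++) {
        (void)sketch_free_list_get(&fl);
    }
    assert(fl.count == 5);

    pthread_mutex_destroy(&fl.lock.mutex);
    return sketch_locks_taken;
}
```

Because the lock lives inside the free list, the only available knob in this
sketch is the global flag: setting it for the sake of one progress path makes
every user of the list pay for the mutex, which is the trade-off described
above.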
Brian W. Barrett
Scalable System Software Group
Sandia National Laboratories