Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-06-12 06:45:47


Heh. I don't. :-)

Well, I should specify: since the group is [pretty strongly] leaning
towards treating thread safety as a 1.3 item, it makes sense to
dedicate my resources elsewhere (rather than to 1.2 thread testing).

On Jun 11, 2007, at 2:49 PM, Paul H. Hargrove wrote:

> If Jeff has the resources to run threaded tests against 1.2, *and* to
> examine the results, then it might be valuable to have a summary of the
> known threading issues in 1.2 written down somewhere for the benefit of
> those who don't chase the trunk.
>
> -Paul
>
> Graham, Richard L. wrote:
>> I would second this - thread safety should be a 1.3 item, unless
>> someone has a lot of spare time.
>>
>> Rich
>>
>> -----Original Message-----
>> From: devel-bounces_at_[hidden] <devel-bounces_at_[hidden]>
>> To: Open MPI Developers <devel_at_[hidden]>
>> Sent: Mon Jun 11 10:44:33 2007
>> Subject: Re: [OMPI devel] threaded builds
>>
>>
>> On Jun 11, 2007, at 8:25 AM, Jeff Squyres wrote:
>>
>>
>>> I leave it to the thread subgroup to decide... Should we discuss on
>>> the call tomorrow?
>>>
>>> I don't have a strong opinion; I was just testing both because it was
>>> easy to do so. If we want to concentrate on the trunk, I can adjust
>>> my MTT setup.
>>>
>>>
>>
>> I think trying to worry about 1.2 would just be a time sink. We know
>> that there are architectural issues with threads in some parts of the
>> code. I don't see us re-architecting 1.2 in this regard.
>> Seems we should only focus on the trunk.
>>
>>
>> - Galen
>>
>>
>>
>>> On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:
>>>
>>>
>>>> Yes, this is a known issue. I don't know -- are we trying to make
>>>> threads work on the 1.2 branch, or just the trunk? I had thought
>>>> just the trunk?
>>>>
>>>> Brian
>>>>
>>>>
>>>> On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:
>>>>
>>>>
>>>>> I had similar problems on the trunk, which were fixed by Brian with
>>>>> r14877.
>>>>>
>>>>> Perhaps 1.2 needs something similar?
>>>>>
>>>>> Tim
>>>>>
>>>>> On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
>>>>>
>>>>>> Per the teleconf last week, I have started to revamp the Cisco MTT
>>>>>> infrastructure to do simplistic thread testing. Specifically, I'm
>>>>>> building the OMPI trunk and v1.2 branches with
>>>>>> "--with-threads --enable-mpi-threads".
>>>>>>
>>>>>> I haven't switched this into my production MTT setup yet, but in the
>>>>>> first trial runs, I'm noticing a segv in the
>>>>>> test/threads/opal_condition program.
>>>>>>
>>>>>> It seems that in the thr1 test on the v1.2 branch, when it calls
>>>>>> opal_progress() underneath the condition variable wait, at some point
>>>>>> in there current_base becomes NULL. Hence, the following segv's
>>>>>> because the passed-in value of "base" is NULL (event.c):
>>>>>>
>>>>>> int
>>>>>> opal_event_base_loop(struct event_base *base, int flags)
>>>>>> {
>>>>>>     const struct opal_eventop *evsel = base->evsel;
>>>>>>     ...
>>>>>>
>>>>>> Here's the full call stack:
>>>>>>
>>>>>> #0 0x0000002a955a020e in opal_event_base_loop (base=0x0, flags=5) at event.c:520
>>>>>> #1 0x0000002a955a01f9 in opal_event_loop (flags=5) at event.c:514
>>>>>> #2 0x0000002a95599111 in opal_progress () at runtime/opal_progress.c:259
>>>>>> #3 0x00000000004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600) at ../../opal/threads/condition.h:81
>>>>>> #4 0x0000000000401146 in thr1_run (obj=0x503110) at opal_condition.c:46
>>>>>> #5 0x00000036e290610a in start_thread () from /lib64/tls/libpthread.so.0
>>>>>> #6 0x00000036e1ec68c3 in clone () from /lib64/tls/libc.so.6
>>>>>> #7 0x0000000000000000 in ?? ()
>>>>>>
>>>>>> This test seems to work fine on the trunk (at least, it didn't segv
>>>>>> in my small number of trial runs).
>>>>>>
>>>>>> Is this a known problem in the 1.2 branch? Should I skip the thread
>>>>>> testing on the 1.2 branch and concentrate on the trunk?
>>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>
>
>
> --
> Paul H. Hargrove PHHargrove_at_[hidden]
> Future Technologies Group
> HPC Research Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>

-- 
Jeff Squyres
Cisco Systems
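
For anyone reading the quoted report without the source handy: the crash is a plain NULL dereference, since opal_event_base_loop() reads base->evsel on its first line and the backtrace shows base=0x0. Below is a minimal sketch of that failure pattern with one possible defensive guard. Every demo_* name is a hypothetical stand-in invented for illustration; this is not the Open MPI or libevent code, and it is not the change made in r14877 (which isn't shown in this thread).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real types; invented for this sketch. */
struct demo_eventop {
    const char *name;
};

struct demo_event_base {
    const struct demo_eventop *evsel;
};

/* Shared "current base" pointer. In the report, the equivalent pointer
 * was observed to be NULL by the time the progress loop used it. */
static struct demo_event_base *demo_current_base = NULL;

/* Returns 0 on success, or -1 if base is NULL (instead of crashing). */
int demo_event_base_loop(struct demo_event_base *base, int flags)
{
    if (base == NULL) {
        /* Without this check, base->evsel below would segfault. */
        return -1;
    }

    const struct demo_eventop *evsel = base->evsel;
    (void)evsel;  /* a real loop would dispatch events through evsel */
    (void)flags;
    return 0;
}

/* Mirrors the shape of frame #1: forwards the shared pointer as "base". */
int demo_event_loop(int flags)
{
    return demo_event_base_loop(demo_current_base, flags);
}

/* Small self-check showing both paths. */
void demo_self_check(void)
{
    struct demo_eventop op = { "demo" };
    struct demo_event_base base = { &op };

    demo_current_base = NULL;
    assert(demo_event_loop(5) == -1);  /* NULL base is reported, not dereferenced */

    demo_current_base = &base;
    assert(demo_event_loop(5) == 0);   /* valid base proceeds normally */
}
```

A guard like this only turns the crash into an error return. It does not explain why the shared pointer ends up NULL in a threaded build, which is the architectural question the thread defers to the trunk.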