Open MPI Development Mailing List Archives

From: Graham, Richard L. (rlgraham_at_[hidden])
Date: 2007-06-11 14:32:33


I would second this - thread safety should be a 1.3 item, unless someone has a lot of spare time.

Rich

-----Original Message-----
From: devel-bounces_at_[hidden] <devel-bounces_at_[hidden]>
To: Open MPI Developers <devel_at_[hidden]>
Sent: Mon Jun 11 10:44:33 2007
Subject: Re: [OMPI devel] threaded builds

On Jun 11, 2007, at 8:25 AM, Jeff Squyres wrote:

> I leave it to the thread subgroup to decide... Should we discuss on
> the call tomorrow?
>
> I don't have a strong opinion; I was just testing both because it was
> easy to do so. If we want to concentrate on the trunk, I can adjust
> my MTT setup.
>

I think trying to worry about 1.2 would just be a time sink. We know
that there are architectural issues with threads in some parts of the
code. I don't see us re-architecting 1.2 in this regard.
Seems we should only focus on the trunk.

- Galen

>
> On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:
>
>> Yes, this is a known issue. I don't know -- are we trying to make
>> threads work on the 1.2 branch, or just the trunk? I had thought
>> just the trunk?
>>
>> Brian
>>
>>
>> On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:
>>
>>> I had similar problems on the trunk, which was fixed by Brian with
>>> r14877.
>>>
>>> Perhaps 1.2 needs something similar?
>>>
>>> Tim
>>>
>>> On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
>>>> Per the teleconf last week, I have started to revamp the Cisco MTT
>>>> infrastructure to do simplistic thread testing. Specifically, I'm
>>>> building the OMPI trunk and v1.2 branches with "--with-threads
>>>> --enable-mpi-threads".
>>>>
>>>> I haven't switched this into my production MTT setup yet, but in the
>>>> first trial runs, I'm noticing a segv in the
>>>> test/threads/opal_condition program.
>>>>
>>>> It seems that in the thr1 test on the v1.2 branch, when it calls
>>>> opal_progress() underneath the condition variable wait, at some point
>>>> in there current_base becomes NULL. Hence, the following segv's
>>>> because the passed-in value of "base" is NULL (event.c):
>>>>
>>>> int
>>>> opal_event_base_loop(struct event_base *base, int flags)
>>>> {
>>>>     const struct opal_eventop *evsel = base->evsel;
>>>>     ...
>>>>
>>>> Here's the full call stack:
>>>>
>>>> #0 0x0000002a955a020e in opal_event_base_loop (base=0x0, flags=5) at event.c:520
>>>> #1 0x0000002a955a01f9 in opal_event_loop (flags=5) at event.c:514
>>>> #2 0x0000002a95599111 in opal_progress () at runtime/opal_progress.c:259
>>>> #3 0x00000000004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600) at ../../opal/threads/condition.h:81
>>>> #4 0x0000000000401146 in thr1_run (obj=0x503110) at opal_condition.c:46
>>>> #5 0x00000036e290610a in start_thread () from /lib64/tls/libpthread.so.0
>>>> #6 0x00000036e1ec68c3 in clone () from /lib64/tls/libc.so.6
>>>> #7 0x0000000000000000 in ?? ()
>>>>
>>>> This test seems to work fine on the trunk (at least, it didn't segv
>>>> in my small number of trial runs).
>>>>
>>>> Is this a known problem in the 1.2 branch? Should I skip the thread
>>>> testing on the 1.2 branch and concentrate on the trunk?
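
The failure reported above (frame #1 handing a NULL base to frame #0, which then faults on base->evsel) can be illustrated with a small C sketch. This is not code from the thread or from Open MPI, and it is not the change made in r14877. Every sketch_* name is hypothetical, and the sketch assumes only what the report states: a global base pointer that turns out to be NULL when the loop is entered.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the types named in the report.
 * These are NOT the real Open MPI / libevent definitions. */
struct sketch_eventop {
    const char *name;
};

struct sketch_event_base {
    const struct sketch_eventop *evsel;
};

/* Models the global that the report calls current_base (assumption:
 * it can still be NULL when a thread enters the event loop). */
static struct sketch_event_base *sketch_current_base = NULL;

/* Guarded variant of the loop entry point: check the pointer before
 * dereferencing it, and report an error instead of faulting. */
int sketch_event_base_loop(struct sketch_event_base *base, int flags)
{
    (void)flags; /* unused in this sketch */

    if (base == NULL) {
        return -1; /* the unguarded code would segv on base->evsel here */
    }

    const struct sketch_eventop *evsel = base->evsel;
    return (evsel != NULL) ? 0 : -1;
}

/* Mirrors the shape of frame #1: forwards the global base unchecked. */
int sketch_event_loop(int flags)
{
    return sketch_event_base_loop(sketch_current_base, flags);
}
```

With no base set, sketch_event_loop(5) returns -1 rather than faulting; once sketch_current_base points at a valid base, it returns 0. A guard like this only turns the crash into an error return. It does not explain why the pointer is NULL in a threaded build, which is the question the thread leaves open for the 1.2 branch.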
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
