Open MPI Development Mailing List Archives

From: Brian Barrett (bbarrett_at_[hidden])
Date: 2007-06-11 10:17:46


Yes, this is a known issue. I don't know -- are we trying to make
threads work on the 1.2 branch, or just the trunk? I had thought
just the trunk?

Brian

On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:

> I had similar problems on the trunk, which were fixed by Brian with
> r14877.
>
> Perhaps 1.2 needs something similar?
>
> Tim
>
> On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
>> Per the teleconf last week, I have started to revamp the Cisco MTT
>> infrastructure to do simplistic thread testing. Specifically, I'm
>> building the OMPI trunk and v1.2 branches with
>> "--with-threads --enable-mpi-threads".
>>
>> I haven't switched this into my production MTT setup yet, but in the
>> first trial runs, I'm noticing a segv in the
>> test/threads/opal_condition program.
>>
>> It seems that in the thr1 test on the v1.2 branch, when it calls
>> opal_progress() underneath the condition variable wait, at some point
>> current_base becomes NULL. Hence, the following code segvs because
>> the passed-in value of "base" is NULL (event.c):
>>
>> int
>> opal_event_base_loop(struct event_base *base, int flags)
>> {
>>     const struct opal_eventop *evsel = base->evsel;
>>     ...
>>
>> Here's the full call stack:
>>
>> #0 0x0000002a955a020e in opal_event_base_loop (base=0x0, flags=5) at event.c:520
>> #1 0x0000002a955a01f9 in opal_event_loop (flags=5) at event.c:514
>> #2 0x0000002a95599111 in opal_progress () at runtime/opal_progress.c:259
>> #3 0x00000000004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600) at ../../opal/threads/condition.h:81
>> #4 0x0000000000401146 in thr1_run (obj=0x503110) at opal_condition.c:46
>> #5 0x00000036e290610a in start_thread () from /lib64/tls/libpthread.so.0
>> #6 0x00000036e1ec68c3 in clone () from /lib64/tls/libc.so.6
>> #7 0x0000000000000000 in ?? ()
>>
>> This test seems to work fine on the trunk (at least, it didn't segv
>> in my small number of trial runs).
>>
>> Is this a known problem in the 1.2 branch? Should I skip the thread
>> testing on the 1.2 branch and concentrate on the trunk?
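
For readers following the backtrace in the quoted message: frame #0 shows
opal_event_base_loop() being entered with base=0x0, and the quoted code
dereferences base on its first line (base->evsel). The sketch below
illustrates that failure mode and a defensive guard against it. It is only
an illustration under stated assumptions: every sketch_* name is
hypothetical, the structs are stand-ins rather than the real Open MPI or
libevent definitions, and it is not a description of what r14877 changed,
which the thread does not say.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real opal_eventop / event_base types. */
struct sketch_eventop {
    int (*dispatch)(void *arg, int flags);
};

struct sketch_event_base {
    const struct sketch_eventop *evsel;
    void *evbase;
};

/* Plays the role of the global base that the report says ends up NULL. */
static struct sketch_event_base *sketch_current_base = NULL;

/* Trivial backend so the non-NULL path can be exercised. */
static int sketch_noop_dispatch(void *arg, int flags)
{
    (void)arg;
    (void)flags;
    return 0;
}

static const struct sketch_eventop sketch_noop_ops = { sketch_noop_dispatch };

/*
 * Guarded variant of the loop entry point. Without the check, the first
 * statement would read base->evsel through a NULL pointer, which is the
 * crash shown in frame #0 of the backtrace.
 */
int sketch_event_base_loop(struct sketch_event_base *base, int flags)
{
    if (base == NULL || base->evsel == NULL) {
        return -1; /* report an error instead of dereferencing NULL */
    }

    const struct sketch_eventop *evsel = base->evsel;
    assert(evsel->dispatch != NULL);
    return evsel->dispatch(base->evbase, flags);
}

/* Mirrors frame #1: a wrapper that forwards the global base. */
int sketch_event_loop(int flags)
{
    return sketch_event_base_loop(sketch_current_base, flags);
}
```

A guard like this only turns the crash into an error return. It does not
explain why the base is NULL when the test thread reaches the progress
loop, which is the question the thread is actually asking about the 1.2
branch.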