Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-06-11 10:08:15


Per the teleconf last week, I have started to revamp the Cisco MTT
infrastructure to do simplistic thread testing. Specifically, I'm
building the OMPI trunk and v1.2 branches with
"--with-threads --enable-mpi-threads".

I haven't switched this into my production MTT setup yet, but in the
first trial runs, I'm noticing a segv in the
test/threads/opal_condition program.

It seems that in the thr1 test on the v1.2 branch, when it calls
opal_progress() underneath the condition variable wait, current_base
becomes NULL at some point. The following code then segv's because
the passed-in value of "base" is NULL (event.c):

int
opal_event_base_loop(struct event_base *base, int flags)
{
         const struct opal_eventop *evsel = base->evsel;
...
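
To illustrate the failure mode only, here is a minimal sketch of a
defensive guard against a NULL base. The names sketch_eventop,
sketch_event_base, and sketch_event_base_loop are hypothetical
stand-ins, not the real OPAL types or functions, and this is not
meant as the actual fix for the v1.2 branch:

```c
#include <stddef.h>

/* Hypothetical stand-in for the event backend descriptor. */
struct sketch_eventop {
    const char *name;
};

/* Hypothetical stand-in for the event base; only the field that
 * the crashing line dereferences is modeled here. */
struct sketch_event_base {
    const struct sketch_eventop *evsel;
};

/* Returns -1 rather than dereferencing a NULL base (or a base with
 * no backend selected); returns 0 otherwise. */
int sketch_event_base_loop(struct sketch_event_base *base, int flags)
{
    (void)flags; /* flags are unused in this sketch */

    if (base == NULL || base->evsel == NULL) {
        return -1;
    }

    /* Real event dispatch would happen here. */
    return 0;
}
```

A guard like this would only turn the segv into an error return; it
would not explain why current_base ends up NULL in the first place.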

Here's the full call stack:

#0 0x0000002a955a020e in opal_event_base_loop (base=0x0, flags=5)
     at event.c:520
#1 0x0000002a955a01f9 in opal_event_loop (flags=5) at event.c:514
#2 0x0000002a95599111 in opal_progress ()
     at runtime/opal_progress.c:259
#3 0x00000000004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
     at ../../opal/threads/condition.h:81
#4 0x0000000000401146 in thr1_run (obj=0x503110) at opal_condition.c:46
#5 0x00000036e290610a in start_thread ()
     from /lib64/tls/libpthread.so.0
#6 0x00000036e1ec68c3 in clone () from /lib64/tls/libc.so.6
#7 0x0000000000000000 in ?? ()

This test seems to work fine on the trunk (at least, it didn't segv
in my small number of trial runs).

Is this a known problem in the 1.2 branch? Should I skip the thread
testing on the 1.2 branch and concentrate on the trunk?

-- 
Jeff Squyres
Cisco Systems