Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-10-25 10:35:18


Hugh --

We are actually unable to replicate the problem; we've run some
single-threaded and multi-threaded apps with no problems. Unfortunately,
that probably means there are still bugs remaining in the code. :-(

Can you try disabling MPI progress threads? (I believe that tcp may be
the only BTL component that has async progress support implemented
anyway; sm *may*, but I'd have to go back and check.) Leave MPI threads
enabled (i.e., MPI_THREAD_MULTIPLE) and see if that gets you further.
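
A rebuild along those lines might look roughly like the following. This is
only a sketch: the install prefix is a made-up path, and it assumes that the
usual autoconf negation of the flag you used (--disable-progress-threads, or
simply leaving out --enable-progress-threads) is enough to turn the progress
threads off while keeping MPI thread support.

```shell
# Hypothetical prefix: any path that keeps this build separate from the
# existing /opt/openmpi-1.0rc2_asynch install will do.
./configure --prefix=/opt/openmpi-1.0rc2_noprogress \
    --enable-mpi-threads \
    --disable-progress-threads
make all install
```

The test program would then need to be recompiled and rerun against the new
install before comparing behavior.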

Hugh Merz wrote:
>>It's still only lightly tested. I'm surprised that it totally hangs for
>>you, though -- what is your simple test program doing?
>
>
> It just initializes mpi (tried both mpi_init and mpi_init_thread), prints
> a string and exits. It works fine without thread support compiled into
> ompi.
>
> It happens with any mpi program I try.
>
> Attaching gdb to each thread of the executable gives:
>
> (original process)
> #0 0x420293d5 in sigsuspend () from /lib/i686/libc.so.6
> #1 0x401e8609 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
> #2 0x401e4eec in pthread_cond_wait () from /lib/i686/libpthread.so.0
> #3 0x40bda418 in mca_oob_tcp_msg_wait () from /opt/openmpi-1.0rc2_asynch/lib/openmpi/mca_oob_tcp.so
>
> (thread 1)
> #0 0x420e01a7 in poll () from /lib/i686/libc.so.6
> #1 0x401e5c30 in __pthread_manager () from /lib/i686/libpthread.so.0
>
> (thread 2)
> #0 0x420e01a7 in poll () from /lib/i686/libc.so.6
> #1 0x4013268b in poll_dispatch () from /opt/openmpi-1.0rc2_asynch/lib/libopal.so.0
> Cannot access memory at address 0x3e8
>
> (thread 3)
> #0 0x420dae14 in read () from /lib/i686/libc.so.6
> #1 0x401f3b18 in __DTOR_END__ () from /lib/i686/libpthread.so.0
> #2 0x40c8dfe3 in mca_btl_sm_component_event_thread () from /opt/openmpi-1.0rc2_asynch/lib/openmpi/mca_btl_sm.so
>
> And there are also 2 additional threads spawned by each of mpirun and
> orted.
>
> Any clues or hints on how to debug this would be appreciated, but I
> understand that it is probably not high priority right now.
>
> Thanks,
>
> Hugh
>
>
>>Hugh Merz wrote:
>>
>>>Howdy,
>>>
>>> I tried installing the release candidate with thread support
>>>enabled ( --enable-mpi-threads and --enable-progress-threads ) using an
>>>old rh7.3 install and a recent fc4 install (Intel compilers). When I try
>>>to run a simple test program, the executable, mpirun and orted all sleep
>>>in what appears to be a deadlock. If I compile ompi without threads
>>>everything works fine.
>>>
>>> The FAQ states that thread support has only been lightly tested, and
>>>there was only a brief discussion about it on the mailing list 8 months
>>>ago. Have there been any developments, and should I expect it to work
>>>properly?
>>>
>>>Thanks,
>>>
>>>Hugh
>>>_______________________________________________
>>>users mailing list
>>>users_at_[hidden]
>>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>--
>>{+} Jeff Squyres
>>{+} The Open MPI Project
>>{+} http://www.open-mpi.org/
>>
>

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/