Open MPI User's Mailing List Archives

From: Hugh Merz (merz_at_[hidden])
Date: 2005-10-26 09:13:18


I've tried a build with thread support: posix (mpi: yes, progress: no), using
both MPI_THREAD_MULTIPLE and MPI_THREAD_SINGLE, and both hang as well.

Unlike Arnstein, I find that jobs do not run properly even on a
single node.

Hugh
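
A minimal sketch in Python for condensing per-thread gdb backtraces like the
ones quoted below into one line per thread, which may make it easier to compare
where the executable, mpirun and orted are each blocked. The sample text is
copied from two of the quoted traces. The names `parse_backtraces` and
`summarize`, and the assumption that each trace begins with a parenthesised
label such as "(thread 3)", are illustrative and not part of gdb or Open MPI.

```python
import re

# Two of the backtraces quoted in this thread, copied verbatim.
SAMPLE = """\
(original process)
#0 0x420293d5 in sigsuspend () from /lib/i686/libc.so.6
#1 0x401e8609 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x401e4eec in pthread_cond_wait () from /lib/i686/libpthread.so.0
#3 0x40bda418 in mca_oob_tcp_msg_wait () from /opt/openmpi-1.0rc2_asynch/lib/openmpi/mca_oob_tcp.so

(thread 3)
#0 0x420dae14 in read () from /lib/i686/libc.so.6
#1 0x401f3b18 in __DTOR_END__ () from /lib/i686/libpthread.so.0
#2 0x40c8dfe3 in mca_btl_sm_component_event_thread ()
from /opt/openmpi-1.0rc2_asynch/lib/openmpi/mca_btl_sm.so
"""

# A frame line looks like "#2 0x401e4eec in pthread_cond_wait () from ...".
FRAME = re.compile(r"^#(\d+)\s+0x[0-9a-f]+\s+in\s+(\S+)\s+\(")
# Assumed convention: a label line such as "(thread 3)" starts each trace.
LABEL = re.compile(r"^\((.+)\)$")


def parse_backtraces(text):
    """Return {label: [function names, innermost frame first]}."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        label = LABEL.match(line)
        if label:
            current = label.group(1)
            threads[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(2))
        # Anything else (wrapped "from ..." lines, gdb errors) is ignored.
    return threads


def summarize(text):
    """Print the innermost and outermost known frame of each thread."""
    for label, frames in parse_backtraces(text).items():
        if frames:
            print(f"{label}: blocked in {frames[0]}, reached via {frames[-1]}")
        else:
            print(f"{label}: no frames parsed")


if __name__ == "__main__":
    summarize(SAMPLE)
```

Running the script on the sample prints one line per labelled trace. To use it
on a real hang, paste the saved gdb output in place of SAMPLE.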

On Tue, 25 Oct 2005, Jeff Squyres wrote:

> Hugh --
>
> We are actually unable to replicate the problem; we've run some
> single-threaded and multi-threaded apps with no problems. Unfortunately,
> this is probably symptomatic of bugs that still remain in the
> code. :-(
>
> Can you try disabling MPI progress threads (I believe that tcp may be
> the only BTL component that has async progress support implemented
> anyway; sm *may*, but I'd have to go back and check)? Leave MPI threads
> enabled (i.e., MPI_THREAD_MULTIPLE) and see if that gets you further.
>
>
>
> Hugh Merz wrote:
>>> It's still only lightly tested. I'm surprised that it totally hangs for
>>> you, though -- what is your simple test program doing?
>>
>>
>> It just initializes MPI (tried both mpi_init and mpi_init_thread), prints
>> a string and exits. It works fine without thread support compiled into
>> ompi.
>>
>> It happens with any MPI program I try.
>>
>> Attaching gdb to each thread of the executable gives:
>>
>> (original process)
>> #0 0x420293d5 in sigsuspend () from /lib/i686/libc.so.6
>> #1 0x401e8609 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
>> #2 0x401e4eec in pthread_cond_wait () from /lib/i686/libpthread.so.0
>> #3 0x40bda418 in mca_oob_tcp_msg_wait () from /opt/openmpi-1.0rc2_asynch/lib/openmpi/mca_oob_tcp.so
>>
>> (thread 1)
>> #0 0x420e01a7 in poll () from /lib/i686/libc.so.6
>> #1 0x401e5c30 in __pthread_manager () from /lib/i686/libpthread.so.0
>>
>> (thread 2)
>> #0 0x420e01a7 in poll () from /lib/i686/libc.so.6
>> #1 0x4013268b in poll_dispatch () from /opt/openmpi-1.0rc2_asynch/lib/libopal.so.0
>> Cannot access memory at address 0x3e8
>>
>> (thread 3)
>> #0 0x420dae14 in read () from /lib/i686/libc.so.6
>> #1 0x401f3b18 in __DTOR_END__ () from /lib/i686/libpthread.so.0
>> #2 0x40c8dfe3 in mca_btl_sm_component_event_thread () from /opt/openmpi-1.0rc2_asynch/lib/openmpi/mca_btl_sm.so
>>
>> And there are also 2 additional threads spawned by each of mpirun and
>> orted.
>>
>> Any clues or hints on how to debug this would be appreciated, but I
>> understand that it is probably not high priority right now.
>>
>> Thanks,
>>
>> Hugh
>>
>>
>>> Hugh Merz wrote:
>>>
>>>> Howdy,
>>>>
>>>> I tried installing the release candidate with thread support
>>>> enabled ( --enable-mpi-threads and --enable-progress-threads ) using an
>>>> old rh7.3 install and a recent fc4 install (Intel compilers). When I try
>>>> to run a simple test program, the executable, mpirun and orted all sleep
>>>> in what appears to be a deadlock. If I compile ompi without threads
>>>> everything works fine.
>>>>
>>>> The faq states that thread support has only been lightly tested, and
>>>> there was only brief discussion about it in the maillist 8 months ago -
>>>> have there been any developments, and should I expect it to work properly?
>>>>
>>>> Thanks,
>>>>
>>>> Hugh
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> --
>>> {+} Jeff Squyres
>>> {+} The Open MPI Project
>>> {+} http://www.open-mpi.org/
>>>
>>
>
>
> --
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/