Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-10-25 10:35:18


Hugh --

We are actually unable to replicate the problem; we've run several
single-threaded and multi-threaded apps without any trouble.
Unfortunately, that is probably symptomatic of bugs still remaining in
the code. :-(

Can you try disabling MPI progress threads (I believe that tcp may be
the only BTL component that has async progress support implemented
anyway; sm *may*, but I'd have to go back and check)? Leave MPI threads
enabled (i.e., MPI_THREAD_MULTIPLE) and see if that gets you further.
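Concretely, a rebuild and smoke test along those lines might look like the following. The install prefix is illustrative, the configure flag is the one used elsewhere in this thread, and the test program is only a minimal sketch in the spirit of Hugh's description, not his actual code:

```shell
# Rebuild Open MPI with MPI thread support but WITHOUT progress threads
# (i.e., drop --enable-progress-threads; prefix is illustrative).
./configure --prefix=/opt/openmpi-1.0rc2 --enable-mpi-threads
make all install

# Minimal test program sketch: initialize with MPI_Init_thread,
# print the thread level actually provided, and exit.
cat > hello_threads.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* Request full thread support; check what the library granted. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    printf("provided thread level: %d\n", provided);
    MPI_Finalize();
    return 0;
}
EOF

mpicc hello_threads.c -o hello_threads
mpirun -np 2 ./hello_threads
```

If the hang persists even with progress threads disabled, that narrows the problem to the MPI-threads support itself rather than the async progress code.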

Hugh Merz wrote:
>>It's still only lightly tested. I'm surprised that it totally hangs for
>>you, though -- what is your simple test program doing?
>
>
> It just initializes MPI (tried both mpi_init and mpi_init_thread), prints
> a string and exits. It works fine without thread support compiled into
> ompi.
>
> It happens with any mpi program I try.
>
> Attaching gdb to each thread of the executable gives:
>
> (original process)
> #0 0x420293d5 in sigsuspend () from /lib/i686/libc.so.6
> #1 0x401e8609 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
> #2 0x401e4eec in pthread_cond_wait () from /lib/i686/libpthread.so.0
> #3 0x40bda418 in mca_oob_tcp_msg_wait () from /opt/openmpi-1.0rc2_asynch/lib/openmpi/mca_oob_tcp.so
>
> (thread 1)
> #0 0x420e01a7 in poll () from /lib/i686/libc.so.6
> #1 0x401e5c30 in __pthread_manager () from /lib/i686/libpthread.so.0
>
> (thread 2)
> #0 0x420e01a7 in poll () from /lib/i686/libc.so.6
> #1 0x4013268b in poll_dispatch () from /opt/openmpi-1.0rc2_asynch/lib/libopal.so.0
> Cannot access memory at address 0x3e8
>
> (thread 3)
> #0 0x420dae14 in read () from /lib/i686/libc.so.6
> #1 0x401f3b18 in __DTOR_END__ () from /lib/i686/libpthread.so.0
> #2 0x40c8dfe3 in mca_btl_sm_component_event_thread ()
> from /opt/openmpi-1.0rc2_asynch/lib/openmpi/mca_btl_sm.so
>
> And there are also 2 additional threads spawned by each of mpirun and
> orted.
>
> Any clues or hints on how to debug this would be appreciated, but I
> understand that it is probably not high priority right now.
>
> Thanks,
>
> Hugh
>
>
>>Hugh Merz wrote:
>>
>>>Howdy,
>>>
>>> I tried installing the release candidate with thread support
>>>enabled ( --enable-mpi-threads and --enable-progress-threads ) using an
>>>old rh7.3 install and a recent fc4 install (Intel compilers). When I try
>>>to run a simple test program, the executable, mpirun and orted all sleep
>>>in what appears to be a deadlock. If I compile ompi without threads
>>>everything works fine.
>>>
>>> The FAQ states that thread support has only been lightly tested, and
>>>there was only brief discussion of it on the mailing list 8 months ago -
>>>have there been any developments, and should I expect it to work properly?
>>>
>>>Thanks,
>>>
>>>Hugh
>>>_______________________________________________
>>>users mailing list
>>>users_at_[hidden]
>>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>--
>>{+} Jeff Squyres
>>{+} The Open MPI Project
>>{+} http://www.open-mpi.org/
>
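As a general debugging aid: rather than attaching gdb to each thread by hand as above, all of the per-thread backtraces can be captured in one pass with gdb's batch mode. The PID placeholder stands for whichever process is hung (the app, mpirun, or orted):

```shell
# Dump backtraces for every thread of a hung process in one shot,
# then detach, leaving the process running.  <PID> is a placeholder.
gdb --batch -p <PID> -ex "thread apply all bt" -ex detach
```

On LinuxThreads-era systems such as the rh7.3 machine mentioned above, each thread shows up as its own PID, so the command may need to be run against each one.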

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/