Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-11-17 09:56:05


We routinely launch across thousands of nodes without a problem... I have never seen it stick in this fashion.

Did you build and/or are you using ORTE threaded, by any chance? If so, that definitely won't work.

On Nov 17, 2009, at 9:27 AM, Sylvain Jeaugey wrote:

> Hi all,
>
> We are currently experiencing problems at launch on the 1.5 branch with a relatively large number of nodes (at least 80). Some processes are not spawned and the orted processes are deadlocked.
>
> When MPI processes call MPI_Init before send_relay is complete, the send_relay function and the daemon_collective function interlock nicely:
>
> Here is the scenario:
> > send_relay
> performs the send tree:
> > > orte_rml_oob_send_buffer
> > > orte_rml_oob_send
> > > opal_condition_wait
> waiting on completion of the send, thus calling:
> > > opal_progress()
> But since a collective request arrived from the network, we entered:
> > > daemon_collective
> However, daemon_collective waits for the job to be initialized (waits on jobdat->launch_msg_processed) before continuing, thus calling:
> > > opal_progress()
>
> At this point, the send may complete, but since we never get back to orte_rml_oob_send, we never perform the launch (i.e. never set jobdat->launch_msg_processed to 1).
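
As an illustrative aside, here is a minimal, self-contained C sketch of the interlock described above. All names (progress, handle_collective, send_relay, send_complete, launch_msg_processed, collective_pending) are hypothetical stand-ins, not the real ORTE routines; the point is only to show how a re-entrant progress call can strand the outer send completion behind a wait that nothing can satisfy.

    /* Hypothetical sketch of the interlock -- not ORTE code. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool send_complete        = false; /* set when the relay send finishes  */
    static bool launch_msg_processed = false; /* set only after send_relay returns */
    static bool collective_pending   = true;  /* a collective arrived "too early"  */

    static void handle_collective(void);

    /* Stand-in for opal_progress(): drain pending events, complete the send. */
    static void progress(void)
    {
        if (collective_pending) {
            collective_pending = false;
            handle_collective();   /* re-enters event handling from inside progress */
        }
        send_complete = true;      /* the relay send completes here */
    }

    /* Stand-in for daemon_collective(): must wait for the launch message. */
    static void handle_collective(void)
    {
        /* Never becomes true: the code that would set it only runs
           after send_relay() has returned. */
        while (!launch_msg_processed) {
            progress();
        }
    }

    /* Stand-in for send_relay() -> orte_rml_oob_send() -> condition wait. */
    static void send_relay(void)
    {
        /* The send does complete eventually, but we never unwind back to
           this loop because progress() is stuck in handle_collective(). */
        while (!send_complete) {
            progress();
        }
    }

    int main(void)
    {
        send_relay();                /* spins forever inside handle_collective */
        launch_msg_processed = true; /* never reached */
        printf("launch message processed\n");
        return 0;
    }

Run as written, send_relay() never returns: once the pending collective has been drained, handle_collective() just spins in progress() waiting for a flag that only main() would set, which mirrors the orted stacks below.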
>
> I may try to solve the bug myself (this is quite a top-priority problem for me), but perhaps people who are more familiar with orted than I am can propose a nice, clean solution...
>
> For those who like real (and complete) gdb stacks, here they are:
> #0 0x0000003b7fed4f38 in poll () from /lib64/libc.so.6
> #1 0x00007fd0de5d861a in poll_dispatch (base=0x930230, arg=0x91a4b0, tv=0x7fff0d977880) at poll.c:167
> #2 0x00007fd0de5d586f in opal_event_base_loop (base=0x930230, flags=1) at event.c:823
> #3 0x00007fd0de5d556b in opal_event_loop (flags=1) at event.c:746
> #4 0x00007fd0de5aeb6d in opal_progress () at runtime/opal_progress.c:189
> #5 0x00007fd0dd340a02 in daemon_collective (sender=0x97af50, data=0x97b010) at grpcomm_bad_module.c:696
> #6 0x00007fd0dd341809 in process_msg (fd=-1, opal_event=1, data=0x97af20) at grpcomm_bad_module.c:901
> #7 0x00007fd0de5d5334 in event_process_active (base=0x930230) at event.c:667
> #8 0x00007fd0de5d597a in opal_event_base_loop (base=0x930230, flags=1) at event.c:839
> #9 0x00007fd0de5d556b in opal_event_loop (flags=1) at event.c:746
> #10 0x00007fd0de5aeb6d in opal_progress () at runtime/opal_progress.c:189
> #11 0x00007fd0dd340a02 in daemon_collective (sender=0x979700, data=0x9676e0) at grpcomm_bad_module.c:696
> #12 0x00007fd0dd341809 in process_msg (fd=-1, opal_event=1, data=0x9796d0) at grpcomm_bad_module.c:901
> #13 0x00007fd0de5d5334 in event_process_active (base=0x930230) at event.c:667
> #14 0x00007fd0de5d597a in opal_event_base_loop (base=0x930230, flags=1) at event.c:839
> #15 0x00007fd0de5d556b in opal_event_loop (flags=1) at event.c:746
> #16 0x00007fd0de5aeb6d in opal_progress () at runtime/opal_progress.c:189
> #17 0x00007fd0dd340a02 in daemon_collective (sender=0x97b420, data=0x97b4e0) at grpcomm_bad_module.c:696
> #18 0x00007fd0dd341809 in process_msg (fd=-1, opal_event=1, data=0x97b3f0) at grpcomm_bad_module.c:901
> #19 0x00007fd0de5d5334 in event_process_active (base=0x930230) at event.c:667
> #20 0x00007fd0de5d597a in opal_event_base_loop (base=0x930230, flags=1) at event.c:839
> #21 0x00007fd0de5d556b in opal_event_loop (flags=1) at event.c:746
> #22 0x00007fd0de5aeb6d in opal_progress () at runtime/opal_progress.c:189
> #23 0x00007fd0dd969a8a in opal_condition_wait (c=0x97b210, m=0x97b1a8) at ../../../../opal/threads/condition.h:99
> #24 0x00007fd0dd96a4bf in orte_rml_oob_send (peer=0x7fff0d9785a0, iov=0x7fff0d978530, count=1, tag=1, flags=16) at rml_oob_send.c:153
> #25 0x00007fd0dd96ac4d in orte_rml_oob_send_buffer (peer=0x7fff0d9785a0, buffer=0x7fff0d9786b0, tag=1, flags=0) at rml_oob_send.c:270
> #26 0x00007fd0de86ed2a in send_relay (buf=0x7fff0d9786b0) at orted/orted_comm.c:127
> #27 0x00007fd0de86f6de in orte_daemon_cmd_processor (fd=-1, opal_event=1, data=0x965fc0) at orted/orted_comm.c:308
> #28 0x00007fd0de5d5334 in event_process_active (base=0x930230) at event.c:667
> #29 0x00007fd0de5d597a in opal_event_base_loop (base=0x930230, flags=0) at event.c:839
> #30 0x00007fd0de5d556b in opal_event_loop (flags=0) at event.c:746
> #31 0x00007fd0de5d5418 in opal_event_dispatch () at event.c:682
> #32 0x00007fd0de86e339 in orte_daemon (argc=19, argv=0x7fff0d979ca8) at orted/orted_main.c:769
> #33 0x00000000004008e2 in main (argc=19, argv=0x7fff0d979ca8) at orted.c:62
>
> Thanks in advance,
> Sylvain
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel