Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] application hangs with multiple dup
From: Thomas Ropars (tropars_at_[hidden])
Date: 2009-09-09 11:44:01


Ashley Pittman wrote:
> On Tue, 2009-09-08 at 15:00 +0200, Thomas Ropars wrote:
>
>> Hi,
>>
>> I'm working on r21949 of the trunk.
>>
>> When I run on a single node with 4 processes this simple program calling
>> 2 times MPI_Comm_dup , the processes hang from time to time in the 2nd dup.
>>
>
> I can't reproduce this, how often does it fail? I've run it in a loop
> hundreds of times here and not had one hang.
>
It hangs about once every 4 or 5 runs, and it also happens when the
processes are on different nodes.

Here is the output I get from padb -axt:

main() at ?:?
  PMPI_Comm_dup() at pcomm_dup.c:62
    ompi_comm_dup() at communicator/comm.c:661
      -----------------
      [0,2] (2 processes)
      -----------------
      ompi_comm_nextcid() at communicator/comm_cid.c:264
        ompi_comm_allreduce_intra() at communicator/comm_cid.c:619
          ompi_coll_tuned_allreduce_intra_dec_fixed() at coll_tuned_decision_fixed.c:61
            ompi_coll_tuned_allreduce_intra_recursivedoubling() at coll_tuned_allreduce.c:223
              ompi_request_default_wait_all() at request/req_wait.c:262
                opal_condition_wait() at ../opal/threads/condition.h:99
      -----------------
      [1,3] (2 processes)
      -----------------
      ompi_comm_nextcid() at communicator/comm_cid.c:245
        ompi_comm_allreduce_intra() at communicator/comm_cid.c:619
          ompi_coll_tuned_allreduce_intra_dec_fixed() at coll_tuned_decision_fixed.c:61
            ompi_coll_tuned_allreduce_intra_recursivedoubling() at coll_tuned_allreduce.c:223
              ompi_request_default_wait_all() at request/req_wait.c:262
                opal_condition_wait() at ../opal/threads/condition.h:99
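
Both groups in the trace are blocked inside a recursive-doubling allreduce, so it may help to recall which ranks exchange data in each round of that pattern. The following is only a generic sketch of recursive doubling for a power-of-two process count, not Open MPI's actual code; the function names rd_partner and rd_rounds are hypothetical and chosen for illustration.

```c
#include <assert.h>

/* Generic recursive-doubling pairing (illustrative, not Open MPI source).
 * In the round with exchange distance `distance` (1, 2, 4, ...), a rank
 * exchanges with the rank whose number differs in exactly that bit. */
static int rd_partner(int rank, int distance)
{
    return rank ^ distance;
}

/* Number of exchange rounds for `nprocs` processes.
 * Assumption: nprocs is a power of two, as in the 4-process run above. */
static int rd_rounds(int nprocs)
{
    assert(nprocs > 0 && (nprocs & (nprocs - 1)) == 0);

    int rounds = 0;
    for (int distance = 1; distance < nprocs; distance <<= 1)
        rounds++;
    return rounds;
}
```

With 4 processes this gives two rounds: at distance 1 the pairs are 0 with 1 and 2 with 3, and at distance 2 the pairs are 0 with 2 and 1 with 3. Since every rank needs its partner to take part in the same allreduce, ranks that enter the collective from different call sites can end up waiting on each other.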

Thomas
> Off-topic I know but this is exactly the type of problem that padb is
> designed to help with, if you could get it to hang and then run "padb
> -axt" in another window on the same node and send along the output I'm
> sure it would be of help.
>
> Ashley,
>
>