Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
From: francoise.roch_at_[hidden]
Date: 2011-06-08 09:25:30


Thanks for your answer.

Jeff Squyres wrote:
> On May 31, 2011, at 10:55 AM, francoise.roch_at_[hidden] wrote:
>> I reproduced the problem with the following code :
> I'm not sure I can reconcile this statement with your later statements...?
>> I execute the program on 2 nodes of 12 cores each (a total of 24 processes), it doesn't stop.
> Your first statement seems to imply that you got the sample program to hang, but this statement says that it worked fine.
> I am able to run this sample program fine, too. :-\
Sorry for the misunderstanding. When I say that the program is frozen
and does not stop, I mean that it hangs at the MPI_COMM_DUP call.

>> After adding the 2 lines above to the code, just before the MPI_COMM_DUP call, I notice that several processes have the same rank in the COMM_NODES communicator.
>> WRITE(*,*) 'before DUP call myid is ', MYID, 'myid2 is ', MYID2
> That definitely should not be. Can you show the output for this?
Here's the output (in COMM_NODES, rank 17 is missing and rank 22 appears twice):

before DUP myid is 1 myid2 is 0
before DUP myid is 2 myid2 is 1
before DUP myid is 3 myid2 is 2
before DUP myid is 4 myid2 is 3
before DUP myid is 5 myid2 is 4
before DUP myid is 6 myid2 is 5
before DUP myid is 7 myid2 is 6
before DUP myid is 8 myid2 is 7
before DUP myid is 9 myid2 is 8
before DUP myid is 10 myid2 is 9
before DUP myid is 11 myid2 is 10
before DUP myid is 12 myid2 is 11
before DUP myid is 13 myid2 is 12
before DUP myid is 14 myid2 is 13
before DUP myid is 15 myid2 is 14
before DUP myid is 16 myid2 is 15
before DUP myid is 17 myid2 is 16
before DUP myid is 18 myid2 is 18
before DUP myid is 19 myid2 is 19
before DUP myid is 20 myid2 is 20
before DUP myid is 21 myid2 is 21
before DUP myid is 22 myid2 is 22
before DUP myid is 23 myid2 is 22
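
For anyone who wants to check a longer run without reading the output by
eye, a small script along the following lines could do it. This is only
a sketch: the function name check_ranks is made up for illustration, and
it assumes that every process in the sub-communicator printed exactly one
line in the format above, so that the expected ranks are 0..N-1 for N
lines.

```python
import re
from collections import Counter

# Matches lines such as "before DUP myid is 18 myid2 is 18".
LINE_RE = re.compile(r"myid is\s+(\d+)\s+myid2 is\s+(\d+)")


def check_ranks(log_text):
    """Return (missing, duplicated) myid2 values found in log_text.

    Assumption: one printed line per process of the sub-communicator,
    so the expected ranks are 0 .. (number of lines - 1).
    """
    ranks = [int(m.group(2)) for m in LINE_RE.finditer(log_text)]
    counts = Counter(ranks)
    missing = [r for r in range(len(ranks)) if r not in counts]
    duplicated = sorted(r for r, n in counts.items() if n > 1)
    return missing, duplicated
```

Calling check_ranks on the text of the 23 lines above returns ([17], [22]),
which matches what I see by hand.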

> I put those lines in and I see unique rank values for all processes.
> Are you using the wrong mpif.h,
I have verified the include path and it is OK.
Moreover, I am able to run the program on 2 nodes with a total of 12
tasks (mpirun -np 12), or on 2 nodes with a total of 18 tasks; in both
cases the rank values are OK.
Beyond 18 tasks, however, the program hangs and the rank values are no
longer unique. The behaviour is the same on 4 nodes, for example.

Best regards
F. Roch