Executing a full new modex is a bit extreme. We can solve this with an approach similar to the one we use for connect/accept.

Step 1. The two group leaders each prepare a buffer with all the modex information they have about the peers in their own group (if they don't have it, they can do a gather).

Step 2. The leaders create the communicator and exchange the buffers prepared in step 1 with each other.

Step 3. Each leader broadcasts the buffer it received to its local processes, which then add it to their local modex information.
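
The three steps can be sketched as a toy model in plain C. This is only an illustration under stated assumptions: it makes no MPI or Open MPI calls, contact info is reduced to a short string per process, and every name in it (modex_buf, proc_view, proc_init, merge_exchange, ...) is hypothetical and not part of the Open MPI code base. Index 0 of each group plays the role of the leader.

```c
#include <assert.h>
#include <string.h>

enum { MAX_PEERS = 8, MAX_INFO = 32 };

/* Hypothetical "modex buffer": a set of contact strings. */
typedef struct {
    int  count;
    char info[MAX_PEERS][MAX_INFO];
} modex_buf;

/* One process: its own contact string plus everything it knows. */
typedef struct {
    char      self[MAX_INFO];
    modex_buf known;
} proc_view;

/* Return 1 if contact string s is already in buffer b. */
int modex_has(const modex_buf *b, const char *s)
{
    for (int i = 0; i < b->count; i++)
        if (strcmp(b->info[i], s) == 0)
            return 1;
    return 0;
}

/* Add contact string s to buffer b unless it is already there. */
void modex_add(modex_buf *b, const char *s)
{
    if (modex_has(b, s))
        return;
    assert(b->count < MAX_PEERS);
    strncpy(b->info[b->count], s, MAX_INFO - 1);
    b->info[b->count][MAX_INFO - 1] = '\0';
    b->count++;
}

/* A new process initially knows only its own contact info. */
void proc_init(proc_view *p, const char *self)
{
    memset(p, 0, sizeof *p);
    strncpy(p->self, self, MAX_INFO - 1);
    modex_add(&p->known, p->self);
}

/* Steps 1-3 for two groups g1 (n1 procs) and g2 (n2 procs). */
void merge_exchange(proc_view *g1, int n1, proc_view *g2, int n2)
{
    modex_buf b1, b2;
    memset(&b1, 0, sizeof b1);
    memset(&b2, 0, sizeof b2);

    /* Step 1: each leader gathers the contact info of its own group. */
    for (int i = 0; i < n1; i++) modex_add(&b1, g1[i].self);
    for (int i = 0; i < n2; i++) modex_add(&b2, g2[i].self);

    /* Step 2: the leaders swap buffers (modeled here by simply letting
     * each side read the other side's buffer). */

    /* Step 3: each leader broadcasts the remote buffer locally, and
     * every local process adds it to what it already knows. */
    for (int i = 0; i < n1; i++)
        for (int k = 0; k < b2.count; k++)
            modex_add(&g1[i].known, b2.info[k]);
    for (int i = 0; i < n2; i++)
        for (int k = 0; k < b1.count; k++)
            modex_add(&g2[i].known, b1.info[k]);
}
```

With group {A0, B0} on one side and {C0} on the other, calling merge_exchange() leaves B0 knowing C0 and C0 knowing B0, which is exactly the link that is missing in the failing case, and it only costs one leader-to-leader exchange plus one local broadcast per side.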

I'll come up with the code next week.


On Oct 25, 2011, at 11:19 , Ralph Castain wrote:

This problem resurfaced on the user list, so I dug around a bit and think I've figured it out using George's test code. The problem lies in the fact that the intercomm "merge" function can create a linkage between procs that was not reflected anywhere in a modex, and so at least some of the procs in the resulting communicator don't know how to talk to some of the new communicator's peers.

For example, consider the case where:

1. parent job A comm_spawns a process (job B) - these processes exchange modex and can communicate

2. parent job A now comm_spawns another process (job C) - again, these can communicate, but the proc in C knows nothing of B

3. do an intercomm merge across the communicators created by the two comm_spawns. This puts B and C into the same communicator, but they know nothing about how to talk to each other as they were not involved in any exchange of contact info. Hence, collectives on that communicator now fail.

I tried adding all known contact info (not just your own) into the modex, but that doesn't resolve the problem. It resulted in C knowing how to talk to B (because A knew when the comm_spawn was done), but B still has no idea how to talk to C as it didn't participate in the modex associated with step 2.
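
To make the asymmetry concrete, here is a toy model in plain C. It is a sketch only: it makes no MPI calls, each job is reduced to a single bit, and the names (known, reset_jobs, spawn_modex, knows) are invented for illustration and are not Open MPI functions. It assumes that the modex done at comm_spawn time pools everything the parent and the child know, as in the experiment described above.

```c
#include <assert.h>

/* Toy model: one bit per job; known[j] is the set of jobs whose
 * contact info job j currently holds. */
enum { JOB_A = 0, JOB_B = 1, JOB_C = 2, NJOBS = 3 };

unsigned known[NJOBS];

/* Every job starts out knowing only itself. */
void reset_jobs(void)
{
    for (int j = 0; j < NJOBS; j++)
        known[j] = 1u << j;
}

/* Model the modex at comm_spawn time: parent and child pool all the
 * contact info either of them holds. Jobs that do not take part in
 * this modex learn nothing. */
void spawn_modex(int parent, int child)
{
    assert(parent >= 0 && parent < NJOBS);
    assert(child >= 0 && child < NJOBS);
    unsigned pooled = known[parent] | known[child];
    known[parent] = pooled;
    known[child]  = pooled;
}

/* Return 1 if job `who` holds contact info for job `whom`. */
int knows(int who, int whom)
{
    assert(who >= 0 && who < NJOBS);
    assert(whom >= 0 && whom < NJOBS);
    return (int)((known[who] >> whom) & 1u);
}
```

Calling reset_jobs(), then spawn_modex(JOB_A, JOB_B), then spawn_modex(JOB_A, JOB_C) leaves knows(JOB_C, JOB_B) true but knows(JOB_B, JOB_C) false: B was never part of the second exchange, so no amount of extra data in that exchange can reach it.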

It seems to me that the solution is to have intercomm "merge" actually execute a modex to ensure that all procs in the new communicator know how to communicate with each other, but I readily admit I might be missing something.

Anyone have thoughts on this? It has come up twice now, so probably something worth addressing.

Begin forwarded message:

From: Ralph Castain <rhc@open-mpi.org>
Date: October 25, 2011 10:08:00 AM MDT
To: Open MPI Users <users@open-mpi.org>
Subject: Re: [OMPI users] Problem-Bug with MPI_Intercomm_create()

FWIW: I have tracked this problem down. The fix is a little more complicated than I'd like, so I'm going to have to ping some other folks to ensure we concur on the approach before doing something.

On Oct 25, 2011, at 8:20 AM, Ralph Castain wrote:

I still see it failing the test George provided on the trunk. I'm unaware of anyone looking further into it, though, as the prior discussion seemed to just end.

On Oct 25, 2011, at 7:01 AM, orel wrote:


I have been trying for several days to use advanced MPI-2 features in the following scenario:

1) a master code A (of size NPA) spawns (MPI_Comm_spawn()) two slave
  codes B (of size NPB) and C (of size NPC), providing intercomms A-B and A-C;
2) I create intracomms AB and AC by merging the intercomms;
3) then I create intercomm AB-C by calling MPI_Intercomm_create() with AC as the bridge...

   MPI_Comm intercommABC;
   A: MPI_Intercomm_create(intracommAB, 0, intracommAC, NPA, TAG, &intercommABC);
   B: MPI_Intercomm_create(intracommAB, 0, MPI_COMM_NULL, 0, TAG, &intercommABC);
   C: MPI_Intercomm_create(intracommC, 0, intracommAC, 0, TAG, &intercommABC);

   In these calls, A0 and C0 play the role of local leader for AB and C respectively.
   C0 and A0 play the roles of remote leader in bridge intracomm AC.

4)  MPI_Barrier(intercommABC);
5)  I merge intercomm AB-C into intracomm ABC
6)  MPI_Barrier(intracommABC);

My bug: these calls succeed, but when I try to use intracommABC for a collective communication like MPI_Barrier(),
            I get the following error:

*** An error occurred in MPI_Barrier
*** on communicator
*** MPI_ERR_INTERN: internal error
*** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

I tried with Open MPI trunk, 1.5.3, 1.5.4 and MPICH2-1.4.1p1.

My code works perfectly if intracomm A, B and C are obtained by MPI_Comm_split() instead of MPI_Comm_spawn() !!!!

I found the same problem in a previous thread on the OMPI Users mailing list:

=> http://www.open-mpi.org/community/lists/users/2011/06/16711.php

Is this bug/problem currently under investigation? :-)

I can provide detailed code, but the one provided by George Bosilca in that previous thread produces the same error...

Thank you for your help...

Aurélien Esnard
University Bordeaux 1 / LaBRI / INRIA (France)