Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] how to make a process start and then join a MPI group
From: Mark Borgerding (markb_at_[hidden])
Date: 2008-07-27 22:33:44


Perhaps I am doing something wrong. The children's calls to
MPI_Intercomm_merge never return.

Here's the chronology (with 2 children):

parent calls MPI_Init
parent calls MPI_Comm_spawn
child calls MPI_Init
child calls MPI_Init
parent call to MPI_Comm_spawn returns
(long pause inserted)
parent calls MPI_Intercomm_merge
child MPI_Init returns
child calls MPI_Intercomm_merge
child MPI_Init returns
child calls MPI_Intercomm_merge
parent MPI_Intercomm_merge returns
... but the child processes never return from the MPI_Intercomm_merge
function.

Here are some code snippets:

############# parent:

    MPI_Init(NULL,NULL);

    const int nkids=2;   // const, so errs[] is a fixed-size array
    int errs[nkids];     // per-child spawn error codes
    MPI_Comm kid;        // intercommunicator to the spawned children
    cerr << "parent calls MPI_Comm_spawn" << endl;
    CHECK_MPI_CODE( MPI_Comm_spawn("test_mpi",NULL,nkids,MPI_INFO_NULL,0,
                                   MPI_COMM_WORLD,&kid,errs) );
    cerr << "parent call to MPI_Comm_spawn returns" << endl;
    for (int k=0;k<nkids;++k)
        CHECK_MPI_CODE( errs[k] );

    MPI_Comm allmpi;
    cerr << "(long pause)" << endl;
    sleep(3);
    cerr << "parent calls MPI_Intercomm_merge\n";
    CHECK_MPI_CODE( MPI_Intercomm_merge( kid, 0, &allmpi) );
    cerr << "parent MPI_Intercomm_merge returns\n";

############### child:

    fprintf(stderr,"child calls MPI_Init \n");
    CHECK_MPI_CODE( MPI_Init(NULL,NULL) );
    fprintf(stderr,"child MPI_Init returns\n");

    MPI_Comm parent;
    CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );

    fprintf(stderr,"child calls MPI_Intercomm_merge \n");
    MPI_Comm allmpi;
    CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
    fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");
(the above line never gets executed)
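
For reference, the spawn-then-merge pattern in the snippets above can be sketched as a single program that decides at run time whether it is the parent or a spawned child. This is a minimal sketch under stated assumptions, not the code from this post and not a diagnosis of the hang: the `check` helper is a hypothetical stand-in for CHECK_MPI_CODE (whose definition is not shown), and re-launching `argv[0]` assumes that path resolves wherever the children are started.

```cpp
// Sketch: one binary that acts as parent or child, depending on whether
// MPI_Comm_get_parent reports a parent intercommunicator.
#include <mpi.h>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the CHECK_MPI_CODE macro used in the post.
// Note: under the default MPI_ERRORS_ARE_FATAL handler most failures abort
// before a code is returned, so this mainly matters with MPI_ERRORS_RETURN
// and for the per-child error codes filled in by MPI_Comm_spawn.
static void check(int code, const char* what) {
    if (code != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len = 0;
        MPI_Error_string(code, msg, &len);
        std::fprintf(stderr, "%s failed: %s\n", what, msg);
        MPI_Abort(MPI_COMM_WORLD, code);
    }
}

int main(int argc, char** argv) {
    check(MPI_Init(&argc, &argv), "MPI_Init");

    // MPI_COMM_NULL here means this process was not spawned, i.e. it is
    // the parent.
    MPI_Comm parent = MPI_COMM_NULL;
    check(MPI_Comm_get_parent(&parent), "MPI_Comm_get_parent");

    MPI_Comm inter = MPI_COMM_NULL;  // intercommunicator parent <-> children
    int high = 0;                    // ordering hint for MPI_Intercomm_merge

    if (parent == MPI_COMM_NULL) {
        // Parent side: spawn copies of this same executable (assumption:
        // argv[0] is a usable path for the children).
        const int nkids = 2;
        std::vector<int> errs(nkids, MPI_SUCCESS);
        check(MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, nkids, MPI_INFO_NULL,
                             0, MPI_COMM_WORLD, &inter, errs.data()),
              "MPI_Comm_spawn");
        for (int k = 0; k < nkids; ++k)
            check(errs[k], "spawned child");
        high = 0;  // parent group is ordered first in the merged communicator
    } else {
        // Child side: the intercommunicator is the one handed back by
        // MPI_Comm_get_parent.
        inter = parent;
        high = 1;  // child group is ordered after the parent group
    }

    // Collective over both groups: every parent and every child must call it.
    MPI_Comm all = MPI_COMM_NULL;
    check(MPI_Intercomm_merge(inter, high, &all), "MPI_Intercomm_merge");

    int rank = -1, size = -1;
    check(MPI_Comm_rank(all, &rank), "MPI_Comm_rank");
    check(MPI_Comm_size(all, &size), "MPI_Comm_size");
    std::fprintf(stderr, "%s: rank %d of %d in merged communicator\n",
                 parent == MPI_COMM_NULL ? "parent" : "child", rank, size);

    check(MPI_Comm_free(&all), "MPI_Comm_free");
    check(MPI_Finalize(), "MPI_Finalize");
    return 0;
}
```

Built with an MPI compiler wrapper such as mpicxx and started as a single process (for example `mpiexec -n 1 ./spawn_merge`, where the file name is only illustrative), each of the three processes should report a distinct rank in a merged communicator of size 3, with the parent ordered first because it passes high=0.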

Aurélien Bouteiller wrote:
> MPI_Intercomm_merge is what you are looking for.
>
> Aurelien
> On 26 Jul 08 at 13:23, Mark Borgerding wrote:
>
>> Okay, so I've gotten a little bit closer.
>>
>> I'm using MPI_Comm_spawn to start several child processes. The
>> problem is that the children are in their own group, separate from
>> the parent (just like the documentation says). I want to merge
>> the children's group with the parent group so I can efficiently
>> Send/Recv data between them.
>>
>> Is this possible?
>>
>> Plan B: I guess if there is no elegant way to merge all those
>> processes into one group, I can connect sockets and make intercomms
>> to talk from the parent directly to each child.
>>
>> -- Mark
>>
>>
>>
>> Mark Borgerding wrote:
>>> I am writing a code module that plugs into a larger application
>>> framework. That framework loads my code module as a shared object.
>>> So I do not control how the first process gets started, but I still
>>> want it to be able to start and participate in an MPI group.
>>>
>>> Here's roughly what I want to happen (I think):
>>>
>>> framework app running (not under my control)
>>> -> framework loads mycode.so shared object into its process
>>> -> mycode.so starts mpi programs on several hosts (e.g.
>>> via system call to mpiexec )
>>> -> initial mycode.so process participates in the group
>>> he just started (e.g. he shows up in MPI_Comm_group, can use
>>> MPI_Send, MPI_Recv, etc. )
>>>
>>> Can this be done?
>>> I am running under Centos 5.2
>>>
>>> Thanks,
>>> Mark
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users