Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] how to make a process start and then join a MPI group
From: Mark Borgerding (markb_at_[hidden])
Date: 2008-07-27 23:13:45


I got something working, but I'm not 100% sure why.

The children woke up and returned from their calls to
MPI_Intercomm_merge only after the parent used the intercomm to send
some data to the children via MPI_Send.

Mark Borgerding wrote:
> Perhaps I am doing something wrong. The children's calls to
> MPI_Intercomm_merge never return.
>
> Here's the chronology (with 2 children):
>
> parent calls MPI_Init
> parent calls MPI_Comm_spawn
> child calls MPI_Init
> child calls MPI_Init
> parent call to MPI_Comm_spawn returns
> (long pause inserted)
> parent calls MPI_Intercomm_merge
> child MPI_Init returns
> child calls MPI_Intercomm_merge
> child MPI_Init returns
> child calls MPI_Intercomm_merge
> parent MPI_Intercomm_merge returns
> ... but the child processes never return from the MPI_Intercomm_merge
> function.
>
>
> Here are some code snippets:
>
> ############# parent:
>
> MPI_Init(NULL,NULL);
>
> int nkids=2;
> int errs[nkids];
> MPI_Comm kid;
> cerr << "parent calls MPI_Comm_spawn" << endl;
> CHECK_MPI_CODE(
> MPI_Comm_spawn("test_mpi",NULL,nkids,MPI_INFO_NULL,0,MPI_COMM_WORLD,&kid,errs)
> );
> cerr << "parent call to MPI_Comm_spawn returns" << endl;
> for (int k=0;k<nkids;++k)
> CHECK_MPI_CODE( errs[k] );
>
> MPI_Comm allmpi;
> cerr << "(long pause)" << endl;
> sleep(3);
> cerr << "parent calls MPI_Intercomm_merge\n";
> CHECK_MPI_CODE( MPI_Intercomm_merge( kid, 0, &allmpi) );
> cerr << "parent MPI_Intercomm_merge returns\n";
>
> ############### child:
>
> fprintf(stderr,"child calls MPI_Init \n");
> CHECK_MPI_CODE( MPI_Init(NULL,NULL) );
> fprintf(stderr,"child MPI_Init returns\n");
>
> MPI_Comm parent;
> CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );
>
> fprintf(stderr,"child calls MPI_Intercomm_merge \n");
> MPI_Comm allmpi;
> CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
> fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");
> (the above line never gets executed)
>
>
>
> Aurélien Bouteiller wrote:
>> MPI_Intercomm_merge is what you are looking for.
>>
>> Aurelien
>> On 26 Jul 08, at 13:23, Mark Borgerding wrote:
>>
>>> Okay, so I've gotten a little bit closer.
>>>
>>> I'm using MPI_Comm_spawn to start several children processes. The
>>> problem is that the children are in their own group, separate from
>>> the parent (just like the documentation says). I want to merge
>>> the children's group with the parent group so I can efficiently
>>> Send/Recv data between them.
>>>
>>> Is this possible?
>>>
>>> Plan B: I guess if there is no elegant way to merge all those
>>> processes into one group, I can connect sockets and make intercomms
>>> to talk from the parent directly to each child.
>>>
>>> -- Mark
>>>
>>>
>>>
>>> Mark Borgerding wrote:
>>>> I am writing a code module that plugs into a larger application
>>>> framework. That framework loads my code module as a shared object.
>>>> So I do not control how the first process gets started, but I still
>>>> want it to be able to start and participate in an MPI group.
>>>>
>>>> Here's roughly what I want to happen (I think):
>>>>
>>>> framework app running (not under my control)
>>>> -> framework loads mycode.so shared object into its process
>>>> -> mycode.so starts mpi programs on several hosts
>>>> (e.g. via system call to mpiexec)
>>>> -> initial mycode.so process participates in the group
>>>> he just started (e.g. he shows up in MPI_Comm_group, can use
>>>> MPI_Send, MPI_Recv, etc.)
>>>>
>>>> Can this be done?
>>>> I am running under CentOS 5.2.
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users