Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] how to make a process start and then join a MPI group
From: Mark Borgerding (markb_at_[hidden])
Date: 2008-07-27 23:13:45


I got something working, but I'm not 100% sure why.

The children woke up and returned from their calls to MPI_Intercomm_merge
only after the parent used the intercomm to send some data to the children
via MPI_Send.

Mark Borgerding wrote:
> Perhaps I am doing something wrong. The children's calls to
> MPI_Intercomm_merge never return.
>
> Here's the chronology (with 2 children):
>
> parent calls MPI_Init
> parent calls MPI_Comm_spawn
> child calls MPI_Init
> child calls MPI_Init
> parent call to MPI_Comm_spawn returns
> (long pause inserted)
> parent calls MPI_Intercomm_merge
> child MPI_Init returns
> child calls MPI_Intercomm_merge
> child MPI_Init returns
> child calls MPI_Intercomm_merge
> parent MPI_Intercomm_merge returns
> ... but the child processes never return from the MPI_Intercomm_merge
> function.
>
>
> Here are some code snippets:
>
> ############# parent:
>
> MPI_Init(NULL,NULL);
>
> const int nkids=2; // const, so errs[nkids] is a fixed-size array in standard C++
> int errs[nkids];
> MPI_Comm kid;
> cerr << "parent calls MPI_Comm_spawn" << endl;
> CHECK_MPI_CODE(
> MPI_Comm_spawn("test_mpi",NULL,nkids,MPI_INFO_NULL,0,MPI_COMM_WORLD,&kid,errs)
> );
> cerr << "parent call to MPI_Comm_spawn returns" << endl;
> for (int k=0;k<nkids;++k) // check the per-child spawn error codes
>     CHECK_MPI_CODE( errs[k] );
>
> MPI_Comm allmpi;
> cerr << "(long pause)" << endl;
> sleep(3);
> cerr << "parent calls MPI_Intercomm_merge\n";
> CHECK_MPI_CODE( MPI_Intercomm_merge( kid, 0, &allmpi) );
> cerr << "parent MPI_Intercomm_merge returns\n";
>
> ############### child:
>
> fprintf(stderr,"child calls MPI_Init \n");
> CHECK_MPI_CODE( MPI_Init(NULL,NULL) );
> fprintf(stderr,"child MPI_Init returns\n");
>
> MPI_Comm parent;
> CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );
>
> fprintf(stderr,"child calls MPI_Intercomm_merge \n");
> MPI_Comm allmpi;
> CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
> fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");
> (the above line never gets executed)
>
>
>
> Aurélien Bouteiller wrote:
>> MPI_Intercomm_merge is what you are looking for.
>>
>> Aurelien
>> Le 26 juil. 08 à 13:23, Mark Borgerding a écrit :
>>
>>> Okay, so I've gotten a little bit closer.
>>>
>>> I'm using MPI_Comm_spawn to start several child processes. The
>>> problem is that the children are in their own group, separate from
>>> the parent (just like the documentation says). I want to merge
>>> the children's group with the parent group so I can efficiently
>>> Send/Recv data between them.
>>>
>>> Is this possible?
>>>
>>> Plan B: I guess if there is no elegant way to merge all those
>>> processes into one group, I can connect sockets and make intercomms
>>> to talk from the parent directly to each child.
>>>
>>> -- Mark
>>>
>>>
>>>
>>> Mark Borgerding wrote:
>>>> I am writing a code module that plugs into a larger application
>>>> framework. That framework loads my code module as a shared object.
>>>> So I do not control how the first process gets started, but I still
>>>> want it to be able to start and participate in an MPI group.
>>>>
>>>> Here's roughly what I want to happen (I think):
>>>>
>>>> framework app running (not under my control)
>>>> -> framework loads mycode.so shared object into its process
>>>> -> mycode.so starts mpi programs on several hosts
>>>> (e.g. via system call to mpiexec )
>>>> -> initial mycode.so process participates in the group
>>>> he just started (e.g. he shows up in MPI_Comm_group, can use
>>>> MPI_Send, MPI_Recv, etc. )
>>>>
>>>> Can this be done?
>>>> I am running under CentOS 5.2.
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users