Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] how to make a process start and then join a MPI group
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2008-07-28 10:00:32


OK, I'll check to see what happens. Which version of Open MPI are you
using?

Aurelien

On Jul 27, 2008, at 23:13, Mark Borgerding wrote:

> I got something working, but I'm not 100% sure why.
>
> The children woke up and returned from their calls to MPI_Intercomm_merge
> only after the parent used the intercomm to send some data to the
> children via MPI_Send.
>
>
>
> Mark Borgerding wrote:
>> Perhaps I am doing something wrong. The children's calls to
>> MPI_Intercomm_merge never return.
>>
>> Here's the chronology (with 2 children):
>>
>> parent calls MPI_Init
>> parent calls MPI_Comm_spawn
>> child calls MPI_Init
>> child calls MPI_Init
>> parent call to MPI_Comm_spawn returns
>> (long pause inserted)
>> parent calls MPI_Intercomm_merge
>> child MPI_Init returns
>> child calls MPI_Intercomm_merge
>> child MPI_Init returns
>> child calls MPI_Intercomm_merge
>> parent MPI_Intercomm_merge returns
>> ... but the child processes never return from
>> MPI_Intercomm_merge.
>>
>>
>> Here are some code snippets:
>>
>> ############# parent:
>>
>> MPI_Init(NULL,NULL);
>>
>> const int nkids = 2;
>> int errs[nkids];
>> MPI_Comm kid;
>> cerr << "parent calls MPI_Comm_spawn" << endl;
>>
>> CHECK_MPI_CODE( MPI_Comm_spawn("test_mpi", MPI_ARGV_NULL, nkids,
>>                                MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>>                                &kid, errs) );
>> cerr << "parent call to MPI_Comm_spawn returns" << endl;
>> for (int k = 0; k < nkids; ++k)
>>     CHECK_MPI_CODE( errs[k] );
>>
>> MPI_Comm allmpi;
>> cerr << "(long pause)" << endl;
>> sleep(3);
>> cerr << "parent calls MPI_Intercomm_merge\n";
>> CHECK_MPI_CODE( MPI_Intercomm_merge( kid, 0, &allmpi) );
>> cerr << "parent MPI_Intercomm_merge returns\n";
>>
>> ############### child:
>>
>> fprintf(stderr,"child calls MPI_Init \n");
>> CHECK_MPI_CODE( MPI_Init(NULL,NULL) );
>> fprintf(stderr,"child MPI_Init returns\n");
>>
>> MPI_Comm parent;
>> CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );
>>
>> fprintf(stderr,"child calls MPI_Intercomm_merge \n");
>> MPI_Comm allmpi;
>> CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
>> fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");
>> // (the line above is never reached)
>>
>>
>>
>> Aurélien Bouteiller wrote:
>>> MPI_Intercomm_merge is what you are looking for.
>>>
>>> Aurelien
>>> On Jul 26, 2008, at 13:23, Mark Borgerding wrote:
>>>
>>>> Okay, so I've gotten a little bit closer.
>>>>
>>>> I'm using MPI_Comm_spawn to start several child processes.
>>>> The problem is that the children are in their own group, separate
>>>> from the parent (just like the documentation says). I want
>>>> to merge the children's group with the parent group so I can
>>>> efficiently Send/Recv data between them.
>>>>
>>>> Is this possible?
>>>>
>>>> Plan B: I guess if there is no elegant way to merge all those
>>>> processes into one group, I can connect sockets and make
>>>> intercomms to talk from the parent directly to each child.
>>>>
>>>> -- Mark
>>>>
>>>>
>>>>
>>>> Mark Borgerding wrote:
>>>>> I am writing a code module that plugs into a larger application
>>>>> framework. That framework loads my code module as a shared object,
>>>>> so I do not control how the first process gets started, but I
>>>>> still want it to be able to start and participate in an MPI group.
>>>>>
>>>>> Here's roughly what I want to happen (I think):
>>>>>
>>>>> framework app running (not under my control)
>>>>> -> framework loads mycode.so shared object into its process
>>>>> -> mycode.so starts MPI programs on several hosts
>>>>>    (e.g. via a system call to mpiexec)
>>>>> -> initial mycode.so process participates in the
>>>>>    group he just started (e.g. he shows up in MPI_Comm_group, can
>>>>>    use MPI_Send, MPI_Recv, etc.)
>>>>>
>>>>> Can this be done?
>>>>> I am running under CentOS 5.2.
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>

--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321