Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] how to make a process start and then join an MPI group
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-07-28 15:09:24


Are you mixing both v1.2.4 and v1.2.5 in a single MPI job? That may
have unintended side-effects -- we unfortunately do not guarantee
binary compatibility between any of our releases.

On Jul 28, 2008, at 10:16 AM, Mark Borgerding wrote:

> I am using versions 1.2.4 (Fedora 9) and 1.2.5 (CentOS 5.2).
>
>
> A little clarification:
> The children do not actually wake up when the parent *sends* data to
> them, but only after the parent tries to receive data from the
> merged communicator.
>
>
> Here is the timeline:
>
> ...
> parent's call to MPI_Comm_spawn returns
> parent calls MPI_Intercomm_merge
> children's calls to MPI_Init return
> children call MPI_Intercomm_merge
> parent's call to MPI_Intercomm_merge returns
> (long pause inserted via parent sleep)
> parent sends data to kid 1
> (long pause inserted via parent sleep)
> parent starts to receive data from kid 1
> all children's calls to MPI_Intercomm_merge return
>
>
> -- Mark
>
> Aurélien Bouteiller wrote:
>> OK, I'll check to see what happens. Which version of Open MPI are
>> you using?
>>
>> Aurelien
>>
>> On 27 Jul 08, at 23:13, Mark Borgerding wrote:
>>
>>> I got something working, but I'm not 100% sure why.
>>>
>>> The children woke up and returned from their calls to
>>> MPI_Intercomm_merge only after
>>> the parent used the intercomm to send some data to the children
>>> via MPI_Send.
>>>
>>>
>>>
>>> Mark Borgerding wrote:
>>>> Perhaps I am doing something wrong. The children's calls to
>>>> MPI_Intercomm_merge never return.
>>>>
>>>> Here's the chronology (with 2 children):
>>>>
>>>> parent calls MPI_Init
>>>> parent calls MPI_Comm_spawn
>>>> child calls MPI_Init
>>>> child calls MPI_Init
>>>> parent's call to MPI_Comm_spawn returns
>>>> (long pause inserted)
>>>> parent calls MPI_Intercomm_merge
>>>> child's MPI_Init returns
>>>> child calls MPI_Intercomm_merge
>>>> child's MPI_Init returns
>>>> child calls MPI_Intercomm_merge
>>>> parent's MPI_Intercomm_merge returns
>>>> ... but the child processes never return from
>>>> MPI_Intercomm_merge.
>>>>
>>>>
>>>> Here are some code snippets:
>>>>
>>>> ############# parent:
>>>>
>>>> MPI_Init(NULL,NULL);
>>>>
>>>> const int nkids=2;  // constant, so errs[] is a standard C++ array
>>>> int errs[nkids];
>>>> MPI_Comm kid;
>>>> cerr << "parent calls MPI_Comm_spawn" << endl;
>>>>
>>>> // CHECK_MPI_CODE: the poster's error-checking macro (definition not shown)
>>>> CHECK_MPI_CODE
>>>> ( MPI_Comm_spawn("test_mpi",MPI_ARGV_NULL,nkids,MPI_INFO_NULL,
>>>> 0,MPI_COMM_WORLD,&kid,errs) );
>>>> cerr << "parent call to MPI_Comm_spawn returns" << endl;
>>>> for (int k=0;k<nkids;++k)
>>>>     CHECK_MPI_CODE( errs[k] );
>>>>
>>>> MPI_Comm allmpi;
>>>> cerr << "(long pause)" << endl;
>>>> sleep(3);
>>>> cerr << "parent calls MPI_Intercomm_merge\n";
>>>> CHECK_MPI_CODE( MPI_Intercomm_merge( kid, 0, &allmpi) );
>>>> cerr << "parent MPI_Intercomm_merge returns\n";
>>>>
>>>> ############### child:
>>>>
>>>> fprintf(stderr,"child calls MPI_Init \n");
>>>> CHECK_MPI_CODE( MPI_Init(NULL,NULL) );
>>>> fprintf(stderr,"child MPI_Init returns\n");
>>>>
>>>> MPI_Comm parent;
>>>> CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );
>>>>
>>>> fprintf(stderr,"child calls MPI_Intercomm_merge \n");
>>>> MPI_Comm allmpi;
>>>> CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
>>>> fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");
>>>> // (the above line never gets executed)
>>>>
>>>>
>>>>
>>>> Aurélien Bouteiller wrote:
>>>>> MPI_Intercomm_merge is what you are looking for.
>>>>>
>>>>> Aurelien
>>>>> On 26 Jul 08, at 13:23, Mark Borgerding wrote:
>>>>>
>>>>>> Okay, so I've gotten a little bit closer.
>>>>>>
>>>>>> I'm using MPI_Comm_spawn to start several child processes.
>>>>>> The problem is that the children are in their own group,
>>>>>> separate from the parent (just like the documentation
>>>>>> says). I want to merge the children's group with the parent
>>>>>> group so I can efficiently Send/Recv data between them.
>>>>>>
>>>>>> Is this possible?
>>>>>>
>>>>>> Plan B: I guess if there is no elegant way to merge all those
>>>>>> processes into one group, I can connect sockets and make
>>>>>> intercomms to talk from the parent directly to each child.
>>>>>>
>>>>>> -- Mark
>>>>>>
>>>>>>
>>>>>>
>>>>>> Mark Borgerding wrote:
>>>>>>> I am writing a code module that plugs into a larger
>>>>>>> application framework. That framework loads my code module as
>>>>>>> a shared object.
>>>>>>> So I do not control how the first process gets started, but I
>>>>>>> still want it to be able to start and participate in an MPI
>>>>>>> group.
>>>>>>>
>>>>>>> Here's roughly what I want to happen (I think):
>>>>>>>
>>>>>>> framework app running (not under my control)
>>>>>>> -> framework loads mycode.so shared object into its process
>>>>>>> -> mycode.so starts mpi programs on several hosts
>>>>>>> (e.g. via a system call to mpiexec)
>>>>>>> -> initial mycode.so process participates in the
>>>>>>> group he just started (e.g. he shows up in MPI_Comm_group, can
>>>>>>> use MPI_Send, MPI_Recv, etc.)
>>>>>>>
>>>>>>> Can this be done?
>>>>>>> I am running under CentOS 5.2.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mark
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> users_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> * Dr. Aurélien Bouteiller
>> * Sr. Research Associate at Innovative Computing Laboratory
>> * University of Tennessee
>> * 1122 Volunteer Boulevard, suite 350
>> * Knoxville, TN 37996
>> * 865 974 6321
>>
>
>
> --
> Mark Borgerding
> 3dB Labs, Inc
> Innovate. Develop. Deliver.
>

-- 
Jeff Squyres
Cisco Systems