Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] how to make a process start and then join a MPI group
From: Mark Borgerding (markb_at_[hidden])
Date: 2008-07-28 15:42:13


I should have been clearer: I have observed the same behavior under both
of those versions.
I was not using the two versions in the same cluster.

-- Mark

Jeff Squyres wrote:
> Are you mixing both v1.2.4 and v1.2.5 in a single MPI job? That may
> have unintended side-effects -- we unfortunately do not guarantee
> binary compatibility between any of our releases.
>
>
> On Jul 28, 2008, at 10:16 AM, Mark Borgerding wrote:
>
>> I am using versions 1.2.4 (Fedora 9) and 1.2.5 (CentOS 5.2).
>>
>>
>> A little clarification:
>> The children do not actually wake up when the parent *sends* data to
>> them, but only after the parent tries to receive data from the merged
>> intercomm.
>>
>>
>> Here is the timeline:
>>
>> ...
>> parent call to MPI_Comm_spawn returns
>> parent calls MPI_Intercomm_merge
>> children call to MPI_Init return
>> children call MPI_Intercomm_merge
>> parent MPI_Intercomm_merge returns
>> (long pause inserted via parent sleep)
>> parent sends data to kid 1
>> (long pause inserted via parent sleep)
>> parent starts to receive data from kid 1
>> all children's calls to MPI_Intercomm_merge return
>>
>>
>> -- Mark
>>
>> Aurélien Bouteiller wrote:
>>> OK, I'll check to see what happens. Which version of Open MPI are
>>> you using?
>>>
>>> Aurelien
>>>
>>> On Jul 27, 2008, at 11:13 PM, Mark Borgerding wrote:
>>>
>>>> I got something working, but I'm not 100% sure why.
>>>>
>>>> The children woke up and returned from their calls to
>>>> MPI_Intercomm_merge only after
>>>> the parent used the intercomm to send some data to the children via
>>>> MPI_Send.
>>>>
>>>>
>>>>
>>>> Mark Borgerding wrote:
>>>>> Perhaps I am doing something wrong. The children's calls to
>>>>> MPI_Intercomm_merge never return.
>>>>>
>>>>> Here's the chronology (with 2 children):
>>>>>
>>>>> parent calls MPI_Init
>>>>> parent calls MPI_Comm_spawn
>>>>> child calls MPI_Init
>>>>> child calls MPI_Init
>>>>> parent call to MPI_Comm_spawn returns
>>>>> (long pause inserted)
>>>>> parent calls MPI_Intercomm_merge
>>>>> child MPI_Init returns
>>>>> child calls MPI_Intercomm_merge
>>>>> child MPI_Init returns
>>>>> child calls MPI_Intercomm_merge
>>>>> parent MPI_Intercomm_merge returns
>>>>> ... but the child processes never return from the
>>>>> MPI_Intercomm_merge function.
>>>>>
>>>>>
>>>>> Here are some code snippets:
>>>>>
>>>>> ############# parent:
>>>>>
>>>>> MPI_Init(NULL,NULL);
>>>>>
>>>>> const int nkids=2; // const, so errs[] is a fixed-size array in C++
>>>>> int errs[nkids];
>>>>> MPI_Comm kid;
>>>>> cerr << "parent calls MPI_Comm_spawn" << endl;
>>>>> CHECK_MPI_CODE(
>>>>> MPI_Comm_spawn("test_mpi",NULL,nkids,MPI_INFO_NULL,0,MPI_COMM_WORLD,&kid,errs)
>>>>> );
>>>>> cerr << "parent call to MPI_Comm_spawn returns" << endl;
>>>>> for (int k=0;k<nkids;++k) // check the per-child spawn error codes
>>>>>     CHECK_MPI_CODE( errs[k] );
>>>>>
>>>>> MPI_Comm allmpi;
>>>>> cerr << "(long pause)" << endl;
>>>>> sleep(3);
>>>>> cerr << "parent calls MPI_Intercomm_merge\n";
>>>>> CHECK_MPI_CODE( MPI_Intercomm_merge( kid, 0, &allmpi) );
>>>>> cerr << "parent MPI_Intercomm_merge returns\n";
>>>>>
>>>>> ############### child:
>>>>>
>>>>> fprintf(stderr,"child calls MPI_Init \n");
>>>>> CHECK_MPI_CODE( MPI_Init(NULL,NULL) );
>>>>> fprintf(stderr,"child MPI_Init returns\n");
>>>>>
>>>>> MPI_Comm parent;
>>>>> CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );
>>>>>
>>>>> fprintf(stderr,"child calls MPI_Intercomm_merge \n");
>>>>> MPI_Comm allmpi;
>>>>> CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
>>>>> fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");
>>>>> (the above line never gets executed)
>>>>>
>>>>>
>>>>>
>>>>> Aurélien Bouteiller wrote:
>>>>>> MPI_Intercomm_merge is what you are looking for.
>>>>>>
>>>>>> Aurelien
>>>>>> On Jul 26, 2008, at 1:23 PM, Mark Borgerding wrote:
>>>>>>
>>>>>>> Okay, so I've gotten a little bit closer.
>>>>>>>
>>>>>>> I'm using MPI_Comm_spawn to start several child processes.
>>>>>>> The problem is that the children are in their own group,
>>>>>>> separate from the parent (just like the documentation
>>>>>>> says). I want to merge the children's group with the parent
>>>>>>> group so I can efficiently Send/Recv data between them.
>>>>>>>
>>>>>>> Is this possible?
>>>>>>>
>>>>>>> Plan B: I guess if there is no elegant way to merge all those
>>>>>>> processes into one group, I can connect sockets and make
>>>>>>> intercomms to talk from the parent directly to each child.
>>>>>>>
>>>>>>> -- Mark
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Mark Borgerding wrote:
>>>>>>>> I am writing a code module that plugs into a larger application
>>>>>>>> framework. That framework loads my code module as a shared
>>>>>>>> object.
>>>>>>>> So I do not control how the first process gets started, but I
>>>>>>>> still want it to be able to start and participate in an MPI group.
>>>>>>>>
>>>>>>>> Here's roughly what I want to happen (I think):
>>>>>>>>
>>>>>>>> framework app running (not under my control)
>>>>>>>> -> framework loads mycode.so shared object into its process
>>>>>>>> -> mycode.so starts MPI programs on several hosts
>>>>>>>> (e.g. via a system call to mpiexec)
>>>>>>>> -> initial mycode.so process participates in the
>>>>>>>> group it just started (e.g. it shows up in MPI_Comm_group and can
>>>>>>>> use MPI_Send, MPI_Recv, etc.)
>>>>>>>>
>>>>>>>> Can this be done?
>>>>>>>> I am running under CentOS 5.2.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> users_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>>
>>> --
>>> * Dr. Aurélien Bouteiller
>>> * Sr. Research Associate at Innovative Computing Laboratory
>>> * University of Tennessee
>>> * 1122 Volunteer Boulevard, suite 350
>>> * Knoxville, TN 37996
>>> * 865 974 6321
>>>
>>
>>
>> --
>> Mark Borgerding
>> 3dB Labs, Inc
>> Innovate. Develop. Deliver.
>
>

-- 
Mark Borgerding
3dB Labs, Inc
Innovate.  Develop.  Deliver.
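
For readers following the snippets quoted above: the second argument to
MPI_Intercomm_merge (the "high" flag) decides which group is numbered first
in the merged communicator. The parent passes 0 and the children pass 1, so
the parent should end up as rank 0 and the two kids as ranks 1 and 2. Below
is a minimal sketch of that numbering in plain C++, with no MPI calls. The
names Side and merged_rank are hypothetical, and the model assumes that every
process in a group passes the same flag, that the two groups pass different
flags, and that rank order within each group is preserved.

```cpp
#include <cassert>

// Which side of the intercommunicator a process is on (hypothetical name).
enum class Side { Parent, Child };

// Model of the rank a process would receive in the merged intracommunicator.
// Assumptions (not taken from the thread): all processes in a group pass the
// same `high` value, the two groups pass different values, and local rank
// order is preserved inside each group.
int merged_rank(Side side, int local_rank,
                int parent_count, int child_count,
                bool parent_high, bool child_high)
{
    // If both groups pass the same flag, the ordering is not specified,
    // so this model refuses to answer.
    assert(parent_high != child_high);

    const bool is_parent  = (side == Side::Parent);
    const int my_count    = is_parent ? parent_count : child_count;
    const int other_count = is_parent ? child_count : parent_count;
    const bool my_high    = is_parent ? parent_high : child_high;

    assert(local_rank >= 0 && local_rank < my_count);

    // The "low" group takes ranks [0, n_low); the "high" group follows it.
    return my_high ? other_count + local_rank : local_rank;
}
```

With one parent (high = 0) and two children (high = 1), as in the snippets,
merged_rank(Side::Parent, 0, 1, 2, false, true) gives 0 and the children get
1 and 2. If both groups pass the same flag, the MPI standard leaves the order
up to the implementation, which is why the sketch rejects that case.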