
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] how to make a process start and then join a MPI group
From: Mark Borgerding (markb_at_[hidden])
Date: 2008-07-28 15:52:57


Check.

Parent has high=0
Children have high=1
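
For readers following the `high` discussion, here is the ordering rule from the MPI standard. In the merged group, processes that pass high=false come first and those that pass high=true follow. If both groups pass the same value, the order is unspecified.

The block below is a minimal C++ model of that rule. It is plain logic, not Open MPI code, and the function name `merged_rank` is hypothetical. It does not reproduce or explain the hang described in this thread. It only shows which merged rank each process should expect.

```cpp
#include <optional>

// Hypothetical model of the rank layout produced by MPI_Intercomm_merge.
// This is NOT Open MPI code; it only encodes the ordering rule from the
// MPI standard:
//   - the group whose processes pass high = false is ordered first,
//   - the group passing high = true is ordered after it,
//   - if both groups pass the same value, the order is unspecified.
//
// local_high  : the `high` argument this process passes
// remote_high : the `high` argument the other group passes
// local_rank  : this process's rank in its own (local) group
// remote_size : number of processes in the other (remote) group
std::optional<int> merged_rank(bool local_high, bool remote_high,
                               int local_rank, int remote_size) {
    if (local_high == remote_high) {
        // Same flag on both sides: the standard does not fix the order.
        return std::nullopt;
    }
    if (!local_high) {
        // Low group keeps its original ranks at the front.
        return local_rank;
    }
    // High group is shifted past every process of the low (remote) group.
    return remote_size + local_rank;
}
```

Consider the setting above: one parent passing 0 and two spawned children passing 1. The parent gets `merged_rank(false, true, 0, 2) == 0`. The children get `merged_rank(true, false, 0, 1) == 1` and `merged_rank(true, false, 1, 1) == 2`, so they occupy merged ranks 1 and 2.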

Jeff Squyres wrote:
> Ok, good.
>
> One thing to check is that you have put different values for the
> "high" value between the parent group and the children group.
>
>
> On Jul 28, 2008, at 3:42 PM, Mark Borgerding wrote:
>
>> I should've been clearer. I have observed the same behavior under
>> both those versions.
>> I was not using the two versions in the same cluster.
>>
>> -- Mark
>>
>>
>> Jeff Squyres wrote:
>>> Are you mixing both v1.2.4 and v1.2.5 in a single MPI job? That may
>>> have unintended side-effects -- we unfortunately do not guarantee
>>> binary compatibility between any of our releases.
>>>
>>>
>>> On Jul 28, 2008, at 10:16 AM, Mark Borgerding wrote:
>>>
>>>> I am using versions 1.2.4 (Fedora 9) and 1.2.5 (CentOS 5.2).
>>>>
>>>>
>>>> A little clarification:
>>>> The children do not actually wake up when the parent *sends* data
>>>> to them, but only after the parent tries to receive data from the
>>>> merged intercomm.
>>>>
>>>>
>>>> Here is the timeline:
>>>>
>>>> ...
>>>> parent call to MPI_Comm_spawn returns
>>>> parent calls MPI_Intercomm_merge
>>>> children's calls to MPI_Init return
>>>> children call MPI_Intercomm_merge
>>>> parent MPI_Intercomm_merge returns
>>>> (long pause inserted via parent sleep)
>>>> parent sends data to kid 1
>>>> (long pause inserted via parent sleep)
>>>> parent starts to receive data from kid 1
>>>> all children's calls to MPI_Intercomm_merge return
>>>>
>>>>
>>>> -- Mark
>>>>
>>>> Aurélien Bouteiller wrote:
>>>>> Ok, I'll check to see what happens. Which version of Open MPI are
>>>>> you using ?
>>>>>
>>>>> Aurelien
>>>>>
>>>>> On Jul 27, 2008, at 23:13, Mark Borgerding wrote:
>>>>>
>>>>>> I got something working, but I'm not 100% sure why.
>>>>>>
>>>>>> The children woke up and returned from their calls to
>>>>>> MPI_Intercomm_merge only after
>>>>>> the parent used the intercomm to send some data to the children
>>>>>> via MPI_Send.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Mark Borgerding wrote:
>>>>>>> Perhaps I am doing something wrong. The children's calls to
>>>>>>> MPI_Intercomm_merge never return.
>>>>>>>
>>>>>>> Here's the chronology (with 2 children):
>>>>>>>
>>>>>>> parent calls MPI_Init
>>>>>>> parent calls MPI_Comm_spawn
>>>>>>> child calls MPI_Init
>>>>>>> child calls MPI_Init
>>>>>>> parent call to MPI_Comm_spawn returns
>>>>>>> (long pause inserted)
>>>>>>> parent calls MPI_Intercomm_merge
>>>>>>> child MPI_Init returns
>>>>>>> child calls MPI_Intercomm_merge
>>>>>>> child MPI_Init returns
>>>>>>> child calls MPI_Intercomm_merge
>>>>>>> parent MPI_Intercomm_merge returns
>>>>>>> ... but the child processes never return from the
>>>>>>> MPI_Intercomm_merge function.
>>>>>>>
>>>>>>>
>>>>>>> Here are some code snippets:
>>>>>>>
>>>>>>> ############# parent:
>>>>>>>
>>>>>>> MPI_Init(NULL,NULL);
>>>>>>>
>>>>>>> const int nkids = 2;        // constant, so errs[] is not a VLA
>>>>>>> int errs[nkids];            // one spawn error code per child
>>>>>>> MPI_Comm kid;               // intercomm: local group = parent, remote = children
>>>>>>> cerr << "parent calls MPI_Comm_spawn" << endl;
>>>>>>> CHECK_MPI_CODE(
>>>>>>> MPI_Comm_spawn("test_mpi",MPI_ARGV_NULL,nkids,MPI_INFO_NULL,0,MPI_COMM_WORLD,&kid,errs)
>>>>>>> );
>>>>>>> cerr << "parent call to MPI_Comm_spawn returns" << endl;
>>>>>>> for (int k=0;k<nkids;++k)   // check each child's spawn status
>>>>>>> CHECK_MPI_CODE( errs[k] );
>>>>>>>
>>>>>>> MPI_Comm allmpi;
>>>>>>> cerr << "(long pause)" << endl;
>>>>>>> sleep(3);
>>>>>>> cerr << "parent calls MPI_Intercomm_merge\n";
>>>>>>> // high=0: the parent is ordered before the children in allmpi
>>>>>>> CHECK_MPI_CODE( MPI_Intercomm_merge( kid, 0, &allmpi) );
>>>>>>> cerr << "parent MPI_Intercomm_merge returns\n";
>>>>>>>
>>>>>>> ############### child:
>>>>>>>
>>>>>>> fprintf(stderr,"child calls MPI_Init \n");
>>>>>>> CHECK_MPI_CODE( MPI_Init(NULL,NULL) );
>>>>>>> fprintf(stderr,"child MPI_Init returns\n");
>>>>>>>
>>>>>>> MPI_Comm parent;            // intercomm back to the spawning process
>>>>>>> CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );
>>>>>>>
>>>>>>> fprintf(stderr,"child calls MPI_Intercomm_merge \n");
>>>>>>> MPI_Comm allmpi;
>>>>>>> // high=1: the children are ordered after the parent in allmpi
>>>>>>> CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
>>>>>>> fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");
>>>>>>> // (the line above never gets executed)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Aurélien Bouteiller wrote:
>>>>>>>> MPI_Intercomm_merge is what you are looking for.
>>>>>>>>
>>>>>>>> Aurelien
>>>>>>>> On Jul 26, 2008, at 13:23, Mark Borgerding wrote:
>>>>>>>>
>>>>>>>>> Okay, so I've gotten a little bit closer.
>>>>>>>>>
>>>>>>>>> I'm using MPI_Comm_spawn to start several child processes.
>>>>>>>>> The problem is that the children are in their own group,
>>>>>>>>> separate from the parent (just like the documentation
>>>>>>>>> says). I want to merge the children's group with the parent
>>>>>>>>> group so I can efficiently Send/Recv data between them.
>>>>>>>>>
>>>>>>>>> Is this possible?
>>>>>>>>>
>>>>>>>>> Plan B: I guess if there is no elegant way to merge all those
>>>>>>>>> processes into one group, I can connect sockets and make
>>>>>>>>> intercomms to talk from the parent directly to each child.
>>>>>>>>>
>>>>>>>>> -- Mark
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Mark Borgerding wrote:
>>>>>>>>>> I am writing a code module that plugs into a larger
>>>>>>>>>> application framework. That framework loads my code module
>>>>>>>>>> as a shared object.
>>>>>>>>>> So I do not control how the first process gets started, but I
>>>>>>>>>> still want it to be able to start and participate in an MPI
>>>>>>>>>> group.
>>>>>>>>>>
>>>>>>>>>> Here's roughly what I want to happen (I think):
>>>>>>>>>>
>>>>>>>>>> framework app running (not under my control)
>>>>>>>>>> -> framework loads mycode.so shared object into its process
>>>>>>>>>> -> mycode.so starts mpi programs on several hosts
>>>>>>>>>> (e.g. via a system call to mpiexec)
>>>>>>>>>> -> initial mycode.so process participates in the
>>>>>>>>>> group he just started (e.g. he shows up in MPI_Comm_group,
>>>>>>>>>> can use MPI_Send, MPI_Recv, etc. )
>>>>>>>>>>
>>>>>>>>>> Can this be done?
>>>>>>>>>> I am running under CentOS 5.2
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Mark
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> users_at_[hidden]
>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> * Dr. Aurélien Bouteiller
>>>>> * Sr. Research Associate at Innovative Computing Laboratory
>>>>> * University of Tennessee
>>>>> * 1122 Volunteer Boulevard, suite 350
>>>>> * Knoxville, TN 37996
>>>>> * 865 974 6321
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Mark Borgerding
>>>> 3dB Labs, Inc
>>>> Innovate. Develop. Deliver.
>>>>
>>>
>>>
>>
>>
>>
>
>
