Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] how to make a process start and then join a MPI group
From: Mark Borgerding (markb_at_[hidden])
Date: 2008-07-28 10:16:59


I am using Open MPI versions 1.2.4 (Fedora 9) and 1.2.5 (CentOS 5.2).

A little clarification:
The children do not actually wake up when the parent *sends* data to
them, but only after the parent tries to receive data from the merged
communicator (the intracomm returned by MPI_Intercomm_merge).

Here is the timeline:

...
parent call to MPI_Comm_spawn returns
parent calls MPI_Intercomm_merge
children's calls to MPI_Init return
children call MPI_Intercomm_merge
parent MPI_Intercomm_merge returns
    (long pause inserted via parent sleep)
parent sends data to kid 1
    (long pause inserted via parent sleep)
parent starts to receive data from kid 1
all children's calls to MPI_Intercomm_merge return
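
For reference, the rank layout that "kid 1" refers to can be modelled
without any MPI calls. This is only a minimal sketch of the ordering rule
of MPI_Intercomm_merge (the group passing high=0 is ordered before the
group passing high=1). The helpers merged_order and merged_rank_of are
hypothetical names made up for illustration, not part of MPI, and the
sketch assumes each group keeps its internal order, which implementations
typically do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, NOT an MPI function: models the rank order of the
// intracommunicator produced by MPI_Intercomm_merge when one group passes
// high=0 (low_group) and the other passes high=1 (high_group).
std::vector<std::string> merged_order(const std::vector<std::string>& low_group,
                                      const std::vector<std::string>& high_group) {
    // An intercommunicator always has two non-empty groups.
    assert(!low_group.empty() && !high_group.empty());
    // The high=0 group takes the first ranks ...
    std::vector<std::string> merged(low_group);
    // ... and the high=1 group follows (assumed to keep its own order).
    merged.insert(merged.end(), high_group.begin(), high_group.end());
    return merged;
}

// Hypothetical helper: rank of `name` in the merged order, or -1 if absent.
int merged_rank_of(const std::vector<std::string>& merged, const std::string& name) {
    for (std::size_t i = 0; i < merged.size(); ++i) {
        if (merged[i] == name) return static_cast<int>(i);
    }
    return -1;
}
```

In the snippets quoted below, the parent passes 0 and the children pass 1
as the second argument, so under this model the parent would be rank 0 of
allmpi and the two children ranks 1 and 2.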

-- Mark

Aurélien Bouteiller wrote:
> OK, I'll check to see what happens. Which version of Open MPI are you
> using?
>
> Aurelien
>
> On 27 Jul 2008, at 23:13, Mark Borgerding wrote:
>
>> I got something working, but I'm not 100% sure why.
>>
>> The children woke up and returned from their calls to
>> MPI_Intercomm_merge only after
>> the parent used the intercomm to send some data to the children via
>> MPI_Send.
>>
>>
>>
>> Mark Borgerding wrote:
>>> Perhaps I am doing something wrong. The children's calls to
>>> MPI_Intercomm_merge never return.
>>>
>>> Here's the chronology (with 2 children):
>>>
>>> parent calls MPI_Init
>>> parent calls MPI_Comm_spawn
>>> child calls MPI_Init
>>> child calls MPI_Init
>>> parent call to MPI_Comm_spawn returns
>>> (long pause inserted)
>>> parent calls MPI_Intercomm_merge
>>> child MPI_Init returns
>>> child calls MPI_Intercomm_merge
>>> child MPI_Init returns
>>> child calls MPI_Intercomm_merge
>>> parent MPI_Intercomm_merge returns
>>> ... but the child processes never return from the
>>> MPI_Intercomm_merge function.
>>>
>>>
>>> Here are some code snippets:
>>>
>>> ############# parent:
>>>
>>> MPI_Init(NULL,NULL);
>>>
>>> int nkids=2;
>>> int errs[nkids];
>>> MPI_Comm kid;
>>> cerr << "parent calls MPI_Comm_spawn" << endl;
>>> CHECK_MPI_CODE(
>>> MPI_Comm_spawn("test_mpi",NULL,nkids,MPI_INFO_NULL,0,MPI_COMM_WORLD,&kid,errs)
>>> );
>>> cerr << "parent call to MPI_Comm_spawn returns" << endl;
>>> for (int k=0;k<nkids;++k)
>>> CHECK_MPI_CODE( errs[k] );
>>>
>>> MPI_Comm allmpi;
>>> cerr << "(long pause)" << endl;
>>> sleep(3);
>>> cerr << "parent calls MPI_Intercomm_merge\n";
>>> CHECK_MPI_CODE( MPI_Intercomm_merge( kid, 0, &allmpi) );
>>> cerr << "parent MPI_Intercomm_merge returns\n";
>>>
>>> ############### child:
>>>
>>> fprintf(stderr,"child calls MPI_Init \n");
>>> CHECK_MPI_CODE( MPI_Init(NULL,NULL) );
>>> fprintf(stderr,"child MPI_Init returns\n");
>>>
>>> MPI_Comm parent;
>>> CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );
>>>
>>> fprintf(stderr,"child calls MPI_Intercomm_merge \n");
>>> MPI_Comm allmpi;
>>> CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
>>> fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");
>>> (the above line never gets executed)
>>>
>>>
>>>
>>> Aurélien Bouteiller wrote:
>>>> MPI_Intercomm_merge is what you are looking for.
>>>>
>>>> Aurelien
>>>> On 26 Jul 2008, at 13:23, Mark Borgerding wrote:
>>>>
>>>>> Okay, so I've gotten a little bit closer.
>>>>>
>>>>> I'm using MPI_Comm_spawn to start several child processes. The
>>>>> problem is that the children are in their own group, separate from
>>>>> the parent (just like the documentation says). I want to
>>>>> merge the children's group with the parent group so I can
>>>>> efficiently Send/Recv data between them.
>>>>>
>>>>> Is this possible?
>>>>>
>>>>> Plan B: I guess if there is no elegant way to merge all those
>>>>> processes into one group, I can connect sockets and make
>>>>> intercomms to talk from the parent directly to each child.
>>>>>
>>>>> -- Mark
>>>>>
>>>>>
>>>>>
>>>>> Mark Borgerding wrote:
>>>>>> I am writing a code module that plugs into a larger application
>>>>>> framework. That framework loads my code module as a shared object.
>>>>>> So I do not control how the first process gets started, but I
>>>>>> still want it to be able to start and participate in an MPI group.
>>>>>>
>>>>>> Here's roughly what I want to happen (I think):
>>>>>>
>>>>>> framework app running (not under my control)
>>>>>> -> framework loads mycode.so shared object into its process
>>>>>> -> mycode.so starts mpi programs on several hosts
>>>>>> (e.g. via system call to mpiexec )
>>>>>> -> initial mycode.so process participates in the
>>>>>> group he just started (e.g. he shows up in MPI_Comm_group, can
>>>>>> use MPI_Send, MPI_Recv, etc. )
>>>>>>
>>>>>> Can this be done?
>>>>>> I am running under CentOS 5.2.
>>>>>>
>>>>>> Thanks,
>>>>>> Mark
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> --
> * Dr. Aurélien Bouteiller
> * Sr. Research Associate at Innovative Computing Laboratory
> * University of Tennessee
> * 1122 Volunteer Boulevard, suite 350
> * Knoxville, TN 37996
> * 865 974 6321
>

-- 
Mark Borgerding
3dB Labs, Inc
Innovate.  Develop.  Deliver.