Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] how to make a process start and then join a MPI group
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2008-07-28 16:56:25


Having different values for the high parameter is fine.
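
To make the ordering concrete, here is a small model of what the high flag does, in plain C with no MPI calls. The function merged_rank and its parameters are names I made up for illustration. It assumes each group keeps its local rank order inside the merged communicator, so treat it as a sketch rather than a statement about any particular release.

```c
#include <assert.h>

/*
 * Model of the rank a process gets after MPI_Intercomm_merge when the two
 * groups pass different "high" values: the group with high = 0 is placed
 * first, the group with high = 1 after it.
 * Assumption: local rank order is kept inside each group.
 */
int merged_rank(int high, int local_rank, int local_size, int remote_size)
{
    assert(local_rank >= 0 && local_rank < local_size);
    assert(remote_size >= 0);
    /* Low group keeps its ranks; high group is shifted past the low group. */
    return high ? remote_size + local_rank : local_rank;
}
```

With one parent (high = 0) and three children (high = 1), this puts the parent at merged rank 0 and the children at merged ranks 1 to 3.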

I think the problem comes from passing NULL, NULL instead of &argc,
&argv to MPI_Init. This toy application works for me on trunk. If you
still have trouble on 1.2, please let us know.

**********************
intercomm_merge_parent.c

#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

#define NKIDS 3

#define CHECK_MPI_CODE(expr) do { \
     int ret = expr; \
     if(MPI_SUCCESS != ret) { \
         printf("ERROR %d\tat line %d\n", ret, __LINE__); \
         return -ret; \
     } \
} while(0)

int main(int argc, char *argv[])
{
     int errs[NKIDS];
     MPI_Comm kids;
     MPI_Comm allmpi;
     int k;

     MPI_Init(&argc, &argv);

     printf("Parent Calls MPI_Comm_spawn\n");
     CHECK_MPI_CODE( MPI_Comm_spawn("intercomm_merge_child", NULL,
                                    NKIDS, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                                    &kids, errs) );
     printf("parent call to MPI_Comm_spawn returns\n");
     for (k = 0;k < NKIDS; ++k)
         CHECK_MPI_CODE( errs[k] );

     printf("parent calls MPI_Intercomm_merge\n");
     CHECK_MPI_CODE( MPI_Intercomm_merge( kids, 0, &allmpi) );
     printf("parent MPI_Intercomm_merge returns\n");
     MPI_Finalize();
     return EXIT_SUCCESS;
}

*********************
intercomm_merge_child.c

#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

#define CHECK_MPI_CODE(expr) do { \
     int ret = expr; \
     if(MPI_SUCCESS != ret) { \
         printf("ERROR %d\tat line %d\n", ret, __LINE__); \
         return -ret; \
     } \
} while(0)

int main(int argc, char *argv[])
{
     MPI_Comm parent;
     MPI_Comm allmpi;

     fprintf(stderr,"child calls MPI_Init \n");
     CHECK_MPI_CODE( MPI_Init(&argc,&argv) );
     fprintf(stderr,"child MPI_Init returns\n");

     CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );

     fprintf(stderr,"child calls MPI_Intercomm_merge \n");
     CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
     fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");

     MPI_Finalize();
     return EXIT_SUCCESS;
}

Aurelien

On 28 Jul 08, at 15:52, Mark Borgerding wrote:

> Check.
>
> Parent has high=0
> Children have high=1
>
>
>
> Jeff Squyres wrote:
>> Ok, good.
>>
>> One thing to check is that you have put different values for the
>> "high" value between the parent group and the children group.
>>
>>
>> On Jul 28, 2008, at 3:42 PM, Mark Borgerding wrote:
>>
>>> I should've been clearer. I have observed the same behavior under
>>> both those versions.
>>> I was not using the two versions in the same cluster.
>>>
>>> -- Mark
>>>
>>>
>>> Jeff Squyres wrote:
>>>> Are you mixing both v1.2.4 and v1.2.5 in a single MPI job? That
>>>> may have unintended side-effects -- we unfortunately do not
>>>> guarantee binary compatibility between any of our releases.
>>>>
>>>>
>>>> On Jul 28, 2008, at 10:16 AM, Mark Borgerding wrote:
>>>>
>>>>> I am using version 1.2.4 (Fedora 9) and 1.2.5 ( CentOS 5.2 )
>>>>>
>>>>>
>>>>> A little clarification:
>>>>> The children do not actually wake up when the parent *sends*
>>>>> data to them, but only after the parent tries to receive data
>>>>> from the merged intercomm.
>>>>>
>>>>>
>>>>> Here is the timeline:
>>>>>
>>>>> ...
>>>>> parent call to MPI_Comm_spawn returns
>>>>> parent calls MPI_Intercomm_merge
>>>>> children call to MPI_Init return
>>>>> children call MPI_Intercomm_merge
>>>>> parent MPI_Intercomm_merge returns
>>>>> (long pause inserted via parent sleep)
>>>>> parent sends data to kid 1
>>>>> (long pause inserted via parent sleep)
>>>>> parent starts to receive data from kid 1
>>>>> all children's calls to MPI_Intercomm_merge return
>>>>>
>>>>>
>>>>> -- Mark
>>>>>
>>>>> Aurélien Bouteiller wrote:
>>>>>> Ok, I'll check to see what happens. Which version of Open MPI
>>>>>> are you using ?
>>>>>>
>>>>>> Aurelien
>>>>>>
>>>>>> On 27 Jul 08, at 23:13, Mark Borgerding wrote:
>>>>>>
>>>>>>> I got something working, but I'm not 100% sure why.
>>>>>>>
>>>>>>> The children woke up and returned from their calls to
>>>>>>> MPI_Intercomm_merge only after
>>>>>>> the parent used the intercomm to send some data to the
>>>>>>> children via MPI_Send.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Mark Borgerding wrote:
>>>>>>>> Perhaps I am doing something wrong. The children's calls to
>>>>>>>> MPI_Intercomm_merge never return.
>>>>>>>>
>>>>>>>> Here's the chronology (with 2 children):
>>>>>>>>
>>>>>>>> parent calls MPI_Init
>>>>>>>> parent calls MPI_Comm_spawn
>>>>>>>> child calls MPI_Init
>>>>>>>> child calls MPI_Init
>>>>>>>> parent call to MPI_Comm_spawn returns
>>>>>>>> (long pause inserted)
>>>>>>>> parent calls MPI_Intercomm_merge
>>>>>>>> child MPI_Init returns
>>>>>>>> child calls MPI_Intercomm_merge
>>>>>>>> child MPI_Init returns
>>>>>>>> child calls MPI_Intercomm_merge
>>>>>>>> parent MPI_Intercomm_merge returns
>>>>>>>> ... but the child processes never return from the
>>>>>>>> MPI_Intercomm_merge function.
>>>>>>>>
>>>>>>>>
>>>>>>>> Here are some code snippets:
>>>>>>>>
>>>>>>>> ############# parent:
>>>>>>>>
>>>>>>>> MPI_Init(NULL,NULL);
>>>>>>>>
>>>>>>>> int nkids=2;
>>>>>>>> int errs[nkids];
>>>>>>>> MPI_Comm kid;
>>>>>>>> cerr << "parent calls MPI_Comm_spawn" << endl;
>>>>>>>> CHECK_MPI_CODE
>>>>>>>> ( MPI_Comm_spawn("test_mpi",NULL,nkids,MPI_INFO_NULL,
>>>>>>>> 0,MPI_COMM_WORLD,&kid,errs) );
>>>>>>>> cerr << "parent call to MPI_Comm_spawn returns" << endl;
>>>>>>>> for (k=0;k<nkids;++k)
>>>>>>>> CHECK_MPI_CODE( errs[k] );
>>>>>>>>
>>>>>>>> MPI_Comm allmpi;
>>>>>>>> cerr << "(long pause)" << endl;
>>>>>>>> sleep(3);
>>>>>>>> cerr << "parent calls MPI_Intercomm_merge\n";
>>>>>>>> CHECK_MPI_CODE( MPI_Intercomm_merge( kid, 0, &allmpi) );
>>>>>>>> cerr << "parent MPI_Intercomm_merge returns\n";
>>>>>>>>
>>>>>>>> ############### child:
>>>>>>>>
>>>>>>>> fprintf(stderr,"child calls MPI_Init \n");
>>>>>>>> CHECK_MPI_CODE( MPI_Init(NULL,NULL) );
>>>>>>>> fprintf(stderr,"child MPI_Init returns\n");
>>>>>>>>
>>>>>>>> MPI_Comm parent;
>>>>>>>> CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );
>>>>>>>>
>>>>>>>> fprintf(stderr,"child calls MPI_Intercomm_merge \n");
>>>>>>>> MPI_Comm allmpi;
>>>>>>>> CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
>>>>>>>> fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");
>>>>>>>> (the above line never gets executed)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Aurélien Bouteiller wrote:
>>>>>>>>> MPI_Intercomm_merge is what you are looking for.
>>>>>>>>>
>>>>>>>>> Aurelien
>>>>>>>>> On 26 Jul 08, at 13:23, Mark Borgerding wrote:
>>>>>>>>>
>>>>>>>>>> Okay, so I've gotten a little bit closer.
>>>>>>>>>>
>>>>>>>>>> I'm using MPI_Comm_spawn to start several children
>>>>>>>>>> processes. The problem is that the children are in their
>>>>>>>>>> own group, separate from the parent (just like the
>>>>>>>>>> documentation says). I want to merge the children's group
>>>>>>>>>> with the parent group so I can efficiently Send/Recv data
>>>>>>>>>> between them.
>>>>>>>>>>
>>>>>>>>>> Is this possible?
>>>>>>>>>>
>>>>>>>>>> Plan B: I guess if there is no elegant way to merge all
>>>>>>>>>> those processes into one group, I can connect sockets and
>>>>>>>>>> make intercomms to talk from the parent directly to each
>>>>>>>>>> child.
>>>>>>>>>>
>>>>>>>>>> -- Mark
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Mark Borgerding wrote:
>>>>>>>>>>> I am writing a code module that plugs into a larger
>>>>>>>>>>> application framework. That framework loads my code
>>>>>>>>>>> module as a shared object.
>>>>>>>>>>> So I do not control how the first process gets started,
>>>>>>>>>>> but I still want it to be able to start and participate in
>>>>>>>>>>> an MPI group.
>>>>>>>>>>>
>>>>>>>>>>> Here's roughly what I want to happen (I think):
>>>>>>>>>>>
>>>>>>>>>>> framework app running (not under my control)
>>>>>>>>>>> -> framework loads mycode.so shared object into its
>>>>>>>>>>> process
>>>>>>>>>>> -> mycode.so starts mpi programs on several hosts
>>>>>>>>>>> (e.g. via system call to mpiexec )
>>>>>>>>>>> -> initial mycode.so process participates in the
>>>>>>>>>>> group he just started (e.g. he shows up in MPI_Comm_group,
>>>>>>>>>>> can use MPI_Send, MPI_Recv, etc. )
>>>>>>>>>>>
>>>>>>>>>>> Can this be done?
>>>>>>>>>>> I am running under CentOS 5.2
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Mark
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> users_at_[hidden]
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> * Dr. Aurélien Bouteiller
>>>>>> * Sr. Research Associate at Innovative Computing Laboratory
>>>>>> * University of Tennessee
>>>>>> * 1122 Volunteer Boulevard, suite 350
>>>>>> * Knoxville, TN 37996
>>>>>> * 865 974 6321
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Mark Borgerding
>>>>> 3dB Labs, Inc
>>>>> Innovate. Develop. Deliver.
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321