
Subject: Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
From: Matt Hughes (matt.c.hughes+ompi_at_[hidden])
Date: 2008-07-29 17:01:18


I've found that I always have to use mpirun to start my spawner
process, due to the exact problem you are having: the need to give
OMPI a hosts file. The singleton functionality seems to be lacking
here; it won't let you spawn on arbitrary hosts. I have not tested
whether this is fixed in the 1.3 series.

Try
mpiexec -np 1 -H op2-1,op2-2 spawner op2-2

mpiexec should start the first process (the spawner) on op2-1, and the
spawn call should start the children on op2-2. If you don't use the
Info object to set the host name explicitly, then on 1.2.x the children
will automatically start on op2-2. With 1.3, the spawn call will place
processes beginning with the first item in the host list.
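
A minimal sketch of the spawn side to go with that command line (op2-2 is
just the second host from the example above, and "./spawner" is your
attached program):

  MPI_Comm child;
  MPI_Info info;
  MPI_Info_create(&info);
  /* op2-2 must be part of the allocation given to mpiexec via -H */
  MPI_Info_set(info, "host", "op2-2");
  MPI_Comm_spawn("./spawner", MPI_ARGV_NULL, 2, info, 0,
                 MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
  MPI_Info_free(&info);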

mch

2008/7/29 Mark Borgerding <markb_at_[hidden]>:
> Yes. The host names are listed in the host file,
> e.g.
> "op2-1 slots=8"
> and there is an IP address for op2-1 in the /etc/hosts file.
> I've read the FAQ. Everything in there seems to assume I am starting the
> process group with mpirun or one of its brothers. This is not the case.
>
> I've created and attached a sample source file that demonstrates my problem.
> It participates in an MPI group in one of two ways: either from mpiexec or
> via MPI_Comm_spawn.
>
> Case 1 works: I can run it on the remote node op2-1 by using mpiexec
> mpiexec -np 3 -H op2-1 spawner
>
> Case 2 works: I can run it on the current host with MPI_Comm_spawn
> ./spawner `hostname`
>
> Case 3 does not work: I cannot use MPI_Comm_spawn to start a group on a
> remote node.
> ./spawner op2-1
>
> The output from case 3 is:
> <QUOTE>
> I am going to spawn 2 children on op2-1
> --------------------------------------------------------------------------
> Some of the requested hosts are not included in the current allocation for
> the
> application:
> ./spawner
> The requested hosts were:
> op2-1
>
> Verify that you have mapped the allocated resources properly using the
> --host specification.
> --------------------------------------------------------------------------
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> base/rmaps_base_support_fns.c at line 225
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file rmaps_rr.c
> at line 478
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> base/rmaps_base_map_job.c at line 210
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file rmgr_urm.c
> at line 372
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> communicator/comm_dyn.c at line 608
>
> </QUOTE>
>
> Ralph Castain wrote:
>>
>> OMPI doesn't care what your hosts are named - many of us use names that
>> have no numeric pattern or any other discernible pattern to them.
>>
>> OMPI_MCA_rds_hostfile should point to a file that contains a list of the
>> hosts - have you ensured that it does, and that the hostfile format is
>> correct? Check the FAQ on the open-mpi.org site:
>>
>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>
>> There are several explanations there pertaining to hostfiles.
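
For reference, the hostfile format the FAQ describes is just one host per
line, optionally followed by a slot count; the slot counts below are only
examples:

  op2-1 slots=8
  op2-2 slots=8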
>>
>>
>> On Jul 29, 2008, at 11:57 AM, Mark Borgerding wrote:
>>
>>> I listed the node names in the file reported by ompi_info --param rds
>>> hostfile -- no luck.
>>> I also tried copying that file to another location and setting
>>> OMPI_MCA_rds_hostfile_path -- no luck.
>>>
>>> The remote hosts are named op2-1 and op2-2. Could this be another case
>>> of the problem I saw a few days ago where the hostnames were assumed to
>>> contain a numeric pattern?
>>>
>>> -- Mark
>>>
>>>
>>>
>>> Ralph Castain wrote:
>>>>
>>>> For the 1.2 release, I believe you will find the enviro param is
>>>> OMPI_MCA_rds_hostfile_path - you can check that with "ompi_info".
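
In other words, something along these lines before launching the singleton
(the hostfile path here is just a placeholder):

  export OMPI_MCA_rds_hostfile_path=/path/to/my_hostfile
  ompi_info --param rds hostfile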
>>>>
>>>>
>>>> On Jul 29, 2008, at 11:10 AM, Mark Borgerding wrote:
>>>>
>>>>> Umm ... what -hostfile file?
>>>>>
>>>>> I am not starting anything via mpiexec/orterun so there is no
>>>>> "-hostfile" argument AFAIK.
>>>>> Is there some other way to communicate this? An environment variable or
>>>>> mca param?
>>>>>
>>>>>
>>>>> -- Mark
>>>>>
>>>>>
>>>>> Ralph Castain wrote:
>>>>>>
>>>>>> Are the hosts where you want the children to go in your -hostfile
>>>>>> file? All of the hosts you intend to use have to be in that file, even if
>>>>>> they don't get used until the comm_spawn.
>>>>>>
>>>>>>
>>>>>> On Jul 29, 2008, at 9:08 AM, Mark Borgerding wrote:
>>>>>>
>>>>>>> I've tried lots of different values for the "host" key in the info
>>>>>>> handle.
>>>>>>> I've tried hardcoding the hostname+ip entries in the /etc/hosts file
>>>>>>> -- no luck. I cannot get my MPI_Comm_spawn children to go anywhere else on
>>>>>>> the network.
>>>>>>>
>>>>>>> mpiexec can start groups on the other machines just fine. It seems
>>>>>>> like there is some initialization that is done by orterun but not by
>>>>>>> MPI_Comm_spawn.
>>>>>>>
>>>>>>> Is there a document that describes how the default process management
>>>>>>> works?
>>>>>>> I do not have infiniband, myrinet or any specialized rte, just ssh.
>>>>>>> All the machines are CentOS 5.2 (openmpi 1.2.5)
>>>>>>>
>>>>>>>
>>>>>>> -- Mark
>>>>>>>
>>>>>>> Ralph Castain wrote:
>>>>>>>>
>>>>>>>> The string "localhost" may not be recognized in the 1.2 series for
>>>>>>>> comm_spawn. Do a "hostname" and use that string instead - should work.
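
One way to do that without hard-coding the name (gethostname() from
<unistd.h> is just one option; this assumes the MPI_Info object was created
as in the snippet below):

  char myhost[256];
  gethostname(myhost, sizeof(myhost));   /* the actual node name, e.g. "gardner" */
  MPI_Info_set(info, "host", myhost);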
>>>>>>>>
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>> On Jul 28, 2008, at 10:38 AM, Mark Borgerding wrote:
>>>>>>>>
>>>>>>>>> When I add the info parameter in MPI_Comm_spawn, I get the error
>>>>>>>>> "Some of the requested hosts are not included in the current
>>>>>>>>> allocation for the application:
>>>>>>>>> [...]
>>>>>>>>> Verify that you have mapped the allocated resources properly using
>>>>>>>>> the
>>>>>>>>> --host specification."
>>>>>>>>>
>>>>>>>>> Here is a snippet of my code that causes the error:
>>>>>>>>>
>>>>>>>>> MPI_Info info;
>>>>>>>>> MPI_Info_create( &info );
>>>>>>>>> MPI_Info_set(info,"host","localhost");
>>>>>>>>> MPI_Comm_spawn( cmd, MPI_ARGV_NULL, nkids, info, 0, MPI_COMM_SELF, &kid, errs );
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Mark Borgerding wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks, I don't know how I missed that. Perhaps I got thrown off
>>>>>>>>>> by
>>>>>>>>>> "Portable programs not requiring detailed control over process
>>>>>>>>>> locations should use MPI_INFO_NULL."
>>>>>>>>>>
>>>>>>>>>> If there were a computing equivalent of Maslow's Hierarchy of
>>>>>>>>>> Needs, functioning would be more fundamental than portability :)
>>>>>>>>>>
>>>>>>>>>> -- Mark
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ralph Castain wrote:
>>>>>>>>>>>
>>>>>>>>>>> Take a look at the man page for MPI_Comm_spawn. It should explain
>>>>>>>>>>> that you need to create an MPI_Info key that has the key of "host" and a
>>>>>>>>>>> value that contains a comma-delimited list of hosts to be used for the child
>>>>>>>>>>> processes.
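
A sketch of what Ralph describes, with placeholder host names and a made-up
child count:

  MPI_Comm child;
  MPI_Info info;
  int errcodes[3];
  MPI_Info_create(&info);
  /* comma-delimited list of hosts for the spawned children */
  MPI_Info_set(info, "host", "nodeA,nodeB,nodeB");
  MPI_Comm_spawn("./child_prog", MPI_ARGV_NULL, 3, info, 0,
                 MPI_COMM_SELF, &child, errcodes);
  MPI_Info_free(&info);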
>>>>>>>>>>>
>>>>>>>>>>> Hope that helps
>>>>>>>>>>> Ralph
>>>>>>>>>>>
>>>>>>>>>>> On Jul 28, 2008, at 8:54 AM, Mark Borgerding wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How does Open MPI decide which hosts are used with
>>>>>>>>>>>> MPI_Comm_spawn? All the docs I've found talk about specifying hosts on the
>>>>>>>>>>>> mpiexec/mpirun command line and so are not applicable.
>>>>>>>>>>>> I am unable to spawn on anything but localhost (which makes for
>>>>>>>>>>>> a pretty uninteresting cluster).
>>>>>>>>>>>>
>>>>>>>>>>>> When I run
>>>>>>>>>>>> ompi_info --param rds hostfile
>>>>>>>>>>>> It reports MCA rds: parameter
>>>>>>>>>>>> "rds_hostfile_path" (current value:
>>>>>>>>>>>> "/usr/lib/openmpi/1.2.5-gcc/etc/openmpi-default-hostfile")
>>>>>>>>>>>> I tried changing that file but it has no effect.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I am using
>>>>>>>>>>>> openmpi 1.2.5
>>>>>>>>>>>> CentOS 5.2
>>>>>>>>>>>> ethernet TCP
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -- Mark
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Mark Borgerding
>>>>>>>>> 3dB Labs, Inc
>>>>>>>>> Innovate. Develop. Deliver.
>>>>>>>>>
>
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <mpi.h>
>
> /*
> *(new BSD license)
> *
> Copyright (c) 2008 Mark Borgerding
>
> All rights reserved.
>
> Redistribution and use in source and binary forms, with or without
> modification, are permitted provided that the following conditions are met:
>
> * Redistributions of source code must retain the above copyright notice,
> this list of conditions and the following disclaimer.
> * Redistributions in binary form must reproduce the above copyright
> notice, this list of conditions and the following disclaimer in the
> documentation and/or other materials provided with the distribution.
> * Neither the author nor the names of any contributors may be used to
> endorse or promote products derived from this software without specific
> prior written permission.
> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
> IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
> THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
> PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
> CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
> EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
> PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
> OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
> WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
> OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
> ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> *
> */
>
> int main(int argc, char ** argv)
> {
>     MPI_Comm parent;
>     MPI_Comm allmpi;
>     MPI_Info info;
>     MPI_Comm icom;
>     MPI_Status status;
>     int i,k,rank,size,length,count;
>     char name[256];
>
>     MPI_Init(NULL,NULL);
>     MPI_Comm_get_parent(&parent);
>
>     if ( parent == MPI_COMM_NULL ) {
>         MPI_Comm_size(MPI_COMM_WORLD,&size);
>         if (size>1) {
>             fprintf(stderr,"I think I was started by orterun\n");
>             MPI_Comm_dup(MPI_COMM_WORLD,&allmpi);
>         }else{
>             if (argc<2) {
>                 fprintf(stderr,"please provide a host argument (will be placed in MPI_Info for MPI_Comm_spawn)\n");
>                 return 1;
>             }
>             fprintf(stderr,"I am going to spawn 2 children on %s\n",argv[1]);
>             int errs[2];
>
>             MPI_Info_create( &info );
>             MPI_Info_set(info,"host",argv[1]);
>
>             MPI_Comm_spawn(argv[0],MPI_ARGV_NULL,2,info,0,MPI_COMM_WORLD,&icom,errs);
>             MPI_Intercomm_merge( icom, 0, &allmpi);
>             MPI_Info_free(&info);
>         }
>     }else{
>         fprintf(stderr,"I was started by MPI_Comm_spawn\n");
>         MPI_Intercomm_merge( parent, 1, &allmpi);
>     }
>
>     MPI_Comm_rank(allmpi,&rank);
>     MPI_Comm_size(allmpi,&size);
>     MPI_Get_processor_name(name,&length);
>     fprintf(stderr,"Hello my name is %s. I am %d of %d\n",name,rank,size);
>
>     if (rank==0) {
>         int k;
>         float buf[128];
>         memset(buf,0,sizeof(buf));
>         fprintf(stderr,"rank zero sending data to all others\n");
>         for (k=1;k<size;++k)
>             MPI_Send( buf , 128 , MPI_FLOAT, k, 42 , allmpi);
>         fprintf(stderr,"rank zero receiving data from all others\n");
>
>         for (k=1;k<size;++k) {
>             MPI_Recv( buf , 128 , MPI_FLOAT, k, 42 , allmpi,&status);
>             MPI_Get_count( &status, MPI_FLOAT, &count);
>             if (count!= 128) {
>                 fprintf(stderr,"short read from %d (count=%d)\n",k,count);
>                 exit(1);
>             }
>         }
>     }else{
>         float buf[128];
>         MPI_Recv( buf , 128 , MPI_FLOAT, 0, 42 , allmpi,&status);
>         MPI_Get_count( &status, MPI_FLOAT, &count);
>         if (count!= 128) {
>             fprintf(stderr,"short read from 0 (count=%d)\n",count);
>             exit(1);
>         }
>         MPI_Send( buf , 128 , MPI_FLOAT, 0, 42 , allmpi);
>     }
>     fprintf(stderr,"Exiting %s (%d of %d)\n",name,rank,size);
>
>     MPI_Comm_free( &allmpi);
>     MPI_Finalize();
>     return 0;
> }
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>