
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-07-29 16:59:20


Afraid I am out of suggestions - could be a bug in the old 1.2 series.
You might try with the 1.3 series...or perhaps someone else has a
suggestion here.

On Jul 29, 2008, at 2:46 PM, Mark Borgerding wrote:

> Yes. The host names are listed in the host file.
> e.g.
> "op2-1 slots=8"
> and there is an IP address for op2-1 in the /etc/hosts file.
> I've read the FAQ. Everything in there seems to assume I am
> starting the process group with mpirun or one of its brothers. That
> is not the case here.
>
> I've created and attached a sample source file that demonstrates my
> problem. It joins an MPI group in one of two ways: either when
> launched from mpiexec or via MPI_Comm_spawn.
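>
> Assuming the attachment is saved as spawner.c, it builds with the
> usual Open MPI wrapper compiler, e.g.
>
>   mpicc -o spawner spawner.c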
>
> Case 1 works: I can run it on the remote node op2-1 by using mpiexec
> mpiexec -np 3 -H op2-1 spawner
>
> Case 2 works: I can run it on the current host with MPI_Comm_spawn
> ./spawner `hostname`
>
> Case 3 does not work: I cannot use MPI_Comm_spawn to start a group
> on a remote node.
> ./spawner op2-1
>
> The output from case 3 is:
> <QUOTE>
> I am going to spawn 2 children on op2-1
> --------------------------------------------------------------------------
> Some of the requested hosts are not included in the current allocation
> for the application:
>   ./spawner
> The requested hosts were:
>   op2-1
>
> Verify that you have mapped the allocated resources properly using the
> --host specification.
> --------------------------------------------------------------------------
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> base/rmaps_base_support_fns.c at line 225
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> rmaps_rr.c at line 478
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> base/rmaps_base_map_job.c at line 210
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> rmgr_urm.c at line 372
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> communicator/comm_dyn.c at line 608
>
> </QUOTE>
>
> Ralph Castain wrote:
>> OMPI doesn't care what your hosts are named - many of us use names
>> that have no numeric pattern or any other discernible pattern to
>> them.
>>
>> OMPI_MCA_rds_hostfile_path should point to a file that contains a list
>> of the hosts - have you ensured that it does, and that the hostfile
>> format is correct? Check the FAQ on the open-mpi.org site:
>>
>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>
>> There are several explanations there pertaining to hostfiles.
>>
>>
>> On Jul 29, 2008, at 11:57 AM, Mark Borgerding wrote:
>>
>>> I listed the node names in the file reported by "ompi_info --param
>>> rds hostfile" -- no luck.
>>> I also tried copying that file to another location and setting
>>> OMPI_MCA_rds_hostfile_path to point at it -- no luck.
>>>
>>> The remote hosts are named op2-1 and op2-2. Could this be another
>>> case of the problem I saw a few days ago where the hostnames were
>>> assumed to contain a numeric pattern?
>>>
>>> -- Mark
>>>
>>>
>>>
>>> Ralph Castain wrote:
>>>> For the 1.2 release, I believe you will find the enviro param is
>>>> OMPI_MCA_rds_hostfile_path - you can check that with "ompi_info".
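>>>>
>>>> MCA params can be set through the environment by prefixing the param
>>>> name with OMPI_MCA_, so something along these lines (the path is just
>>>> a placeholder) should do it:
>>>>
>>>>   export OMPI_MCA_rds_hostfile_path=/path/to/my_hostfile
>>>>   ./my_program        # started directly, not via mpirun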
>>>>
>>>>
>>>> On Jul 29, 2008, at 11:10 AM, Mark Borgerding wrote:
>>>>
>>>>> Umm ... what -hostfile file?
>>>>>
>>>>> I am not starting anything via mpiexec/orterun so there is no "-
>>>>> hostfile" argument AFAIK.
>>>>> Is there some other way to communicate this? An environment
>>>>> variable or mca param?
>>>>>
>>>>>
>>>>> -- Mark
>>>>>
>>>>>
>>>>> Ralph Castain wrote:
>>>>>> Are the hosts where you want the children to go in your -
>>>>>> hostfile file? All of the hosts you intend to use have to be in
>>>>>> that file, even if they don't get used until the comm_spawn.
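>>>>>>
>>>>>> For example, a hostfile along these lines (the names and slot
>>>>>> counts are just placeholders for your machines):
>>>>>>
>>>>>>   parenthost slots=8
>>>>>>   childhost1 slots=8
>>>>>>   childhost2 slots=8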
>>>>>>
>>>>>>
>>>>>> On Jul 29, 2008, at 9:08 AM, Mark Borgerding wrote:
>>>>>>
>>>>>>> I've tried lots of different values for the "host" key in the
>>>>>>> info handle.
>>>>>>> I've tried hardcoding the hostname+ip entries in the /etc/
>>>>>>> hosts file -- no luck. I cannot get my MPI_Comm_spawn
>>>>>>> children to go anywhere else on the network.
>>>>>>>
>>>>>>> mpiexec can start groups on the other machines just fine. It
>>>>>>> seems like there is some initialization that is done by
>>>>>>> orterun but not by MPI_Comm_spawn.
>>>>>>>
>>>>>>> Is there a document that describes how the default process
>>>>>>> management works?
>>>>>>> I do not have InfiniBand, Myrinet, or any specialized RTE, just
>>>>>>> ssh.
>>>>>>> All the machines are CentOS 5.2 (openmpi 1.2.5)
>>>>>>>
>>>>>>>
>>>>>>> -- Mark
>>>>>>>
>>>>>>> Ralph Castain wrote:
>>>>>>>> The string "localhost" may not be recognized in the 1.2
>>>>>>>> series for comm_spawn. Do a "hostname" and use that string
>>>>>>>> instead - should work.
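>>>>>>>>
>>>>>>>> E.g. (untested sketch) you can fill the value in programmatically
>>>>>>>> with gethostname() from <unistd.h>:
>>>>>>>>
>>>>>>>>   char host[256];
>>>>>>>>   gethostname(host, sizeof(host));  /* same string "hostname" prints */
>>>>>>>>   MPI_Info_set(info, "host", host);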
>>>>>>>>
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>> On Jul 28, 2008, at 10:38 AM, Mark Borgerding wrote:
>>>>>>>>
>>>>>>>>> When I add the info parameter in MPI_Comm_spawn, I get the
>>>>>>>>> error
>>>>>>>>> "Some of the requested hosts are not included in the current
>>>>>>>>> allocation for the application:
>>>>>>>>> [...]
>>>>>>>>> Verify that you have mapped the allocated resources properly
>>>>>>>>> using the
>>>>>>>>> --host specification."
>>>>>>>>>
>>>>>>>>> Here is a snippet of my code that causes the error:
>>>>>>>>>
>>>>>>>>>   MPI_Info info;
>>>>>>>>>   MPI_Info_create( &info );
>>>>>>>>>   MPI_Info_set(info, "host", "localhost");
>>>>>>>>>   MPI_Comm_spawn( cmd, MPI_ARGV_NULL, nkids, info, 0,
>>>>>>>>>                   MPI_COMM_SELF, &kid, errs );
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Mark Borgerding wrote:
>>>>>>>>>> Thanks, I don't know how I missed that. Perhaps I got
>>>>>>>>>> thrown off by
>>>>>>>>>> "Portable programs not requiring detailed control over
>>>>>>>>>> process locations should use MPI_INFO_NULL."
>>>>>>>>>>
>>>>>>>>>> If there were a computing equivalent of Maslow's Hierarchy
>>>>>>>>>> of Needs, functioning would be more fundamental than
>>>>>>>>>> portability :)
>>>>>>>>>>
>>>>>>>>>> -- Mark
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ralph Castain wrote:
>>>>>>>>>>> Take a look at the man page for MPI_Comm_spawn. It should
>>>>>>>>>>> explain that you need to create an MPI_Info key that has
>>>>>>>>>>> the key of "host" and a value that contains a comma-
>>>>>>>>>>> delimited list of hosts to be used for the child processes.
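>>>>>>>>>>>
>>>>>>>>>>> Something along these lines (untested sketch; the host names
>>>>>>>>>>> are placeholders):
>>>>>>>>>>>
>>>>>>>>>>>   MPI_Info info;
>>>>>>>>>>>   MPI_Comm children;
>>>>>>>>>>>   int errs[4];
>>>>>>>>>>>   MPI_Info_create(&info);
>>>>>>>>>>>   MPI_Info_set(info, "host", "nodeA,nodeB");  /* comma-delimited */
>>>>>>>>>>>   MPI_Comm_spawn("./child", MPI_ARGV_NULL, 4, info, 0,
>>>>>>>>>>>                  MPI_COMM_SELF, &children, errs);
>>>>>>>>>>>   MPI_Info_free(&info);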
>>>>>>>>>>>
>>>>>>>>>>> Hope that helps
>>>>>>>>>>> Ralph
>>>>>>>>>>>
>>>>>>>>>>> On Jul 28, 2008, at 8:54 AM, Mark Borgerding wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How does Open MPI decide which hosts are used with
>>>>>>>>>>>> MPI_Comm_spawn? All the docs I've found talk about specifying
>>>>>>>>>>>> hosts on the mpiexec/mpirun command line and so are not
>>>>>>>>>>>> applicable.
>>>>>>>>>>>> I am unable to spawn on anything but localhost (which
>>>>>>>>>>>> makes for a pretty uninteresting cluster).
>>>>>>>>>>>>
>>>>>>>>>>>> When I run
>>>>>>>>>>>>   ompi_info --param rds hostfile
>>>>>>>>>>>> it reports:
>>>>>>>>>>>>   MCA rds: parameter "rds_hostfile_path" (current value:
>>>>>>>>>>>>   "/usr/lib/openmpi/1.2.5-gcc/etc/openmpi-default-hostfile")
>>>>>>>>>>>> I tried changing that file but it has no effect.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I am using
>>>>>>>>>>>> openmpi 1.2.5
>>>>>>>>>>>> CentOS 5.2
>>>>>>>>>>>> ethernet TCP
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -- Mark
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Mark Borgerding
>>>>>>>>> 3dB Labs, Inc
>>>>>>>>> Innovate. Develop. Deliver.
>>>>>>>>>
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <mpi.h>
>
> /*
> *(new BSD license)
> *
> Copyright (c) 2008 Mark Borgerding
>
> All rights reserved.
>
> Redistribution and use in source and binary forms, with or without
> modification, are permitted provided that the following conditions
> are met:
>
> * Redistributions of source code must retain the above copyright
> notice, this list of conditions and the following disclaimer.
> * Redistributions in binary form must reproduce the above
> copyright notice, this list of conditions and the following
> disclaimer in the documentation and/or other materials provided with
> the distribution.
> * Neither the author nor the names of any contributors may be
> used to endorse or promote products derived from this software
> without specific prior written permission.
> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
> INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
> MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
> IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
> ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
> NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
> SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> *
> */
>
> int main(int argc, char ** argv)
> {
>     MPI_Comm parent;
>     MPI_Comm allmpi;
>     MPI_Info info;
>     MPI_Comm icom;
>     MPI_Status status;
>     int k, rank, size, length, count;
>     char name[256];
>
>     MPI_Init(NULL, NULL);
>     MPI_Comm_get_parent(&parent);
>
>     if ( parent == MPI_COMM_NULL ) {
>         /* not spawned: either launched by orterun/mpiexec or run as a singleton */
>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>         if (size > 1) {
>             fprintf(stderr, "I think I was started by orterun\n");
>             MPI_Comm_dup(MPI_COMM_WORLD, &allmpi);
>         } else {
>             if (argc < 2) {
>                 fprintf(stderr, "please provide a host argument (will be placed in MPI_Info for MPI_Comm_spawn)\n");
>                 MPI_Finalize();
>                 return 1;
>             }
>             fprintf(stderr, "I am going to spawn 2 children on %s\n", argv[1]);
>             int errs[2];
>
>             MPI_Info_create( &info );
>             MPI_Info_set(info, "host", argv[1]);
>             MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, info,
>                            0, MPI_COMM_WORLD, &icom, errs);
>             MPI_Intercomm_merge( icom, 0, &allmpi );
>             MPI_Info_free(&info);
>         }
>     } else {
>         fprintf(stderr, "I was started by MPI_Comm_spawn\n");
>         MPI_Intercomm_merge( parent, 1, &allmpi );
>     }
>
>     MPI_Comm_rank(allmpi, &rank);
>     MPI_Comm_size(allmpi, &size);
>     MPI_Get_processor_name(name, &length);
>     fprintf(stderr, "Hello my name is %s. I am %d of %d\n", name, rank, size);
>
>     if (rank == 0) {
>         float buf[128];
>         memset(buf, 0, sizeof(buf));
>         fprintf(stderr, "rank zero sending data to all others\n");
>         for (k = 1; k < size; ++k)
>             MPI_Send( buf, 128, MPI_FLOAT, k, 42, allmpi );
>         fprintf(stderr, "rank zero receiving data from all others\n");
>
>         for (k = 1; k < size; ++k) {
>             MPI_Recv( buf, 128, MPI_FLOAT, k, 42, allmpi, &status );
>             MPI_Get_count( &status, MPI_FLOAT, &count );
>             if (count != 128) {
>                 fprintf(stderr, "short read from %d (count=%d)\n", k, count);
>                 exit(1);
>             }
>         }
>     } else {
>         float buf[128];
>         MPI_Recv( buf, 128, MPI_FLOAT, 0, 42, allmpi, &status );
>         MPI_Get_count( &status, MPI_FLOAT, &count );
>         if (count != 128) {
>             fprintf(stderr, "short read from 0 (count=%d)\n", count);
>             exit(1);
>         }
>         MPI_Send( buf, 128, MPI_FLOAT, 0, 42, allmpi );
>     }
>     fprintf(stderr, "Exiting %s (%d of %d)\n", name, rank, size);
>
>     MPI_Comm_free( &allmpi );
>     MPI_Finalize();
>     return 0;
> }
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users