Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
From: Brian Budge (brian.budge_at_[hidden])
Date: 2012-08-22 11:56:36


Okay. Is there a tutorial or FAQ for setting everything up? Or is it
really just that simple? I don't need to run a copy of the orte
server somewhere?

If my current IP is 192.168.0.1:

  > echo 192.168.0.11 > /tmp/hostfile
  > echo 192.168.0.12 >> /tmp/hostfile
  > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
  > ./mySpawningExe

At this point, mySpawningExe will be the master, running on
192.168.0.1, and I can spawn, for example, childExe on
192.168.0.11 and 192.168.0.12? Or childExe1 on 192.168.0.11 and
childExe2 on 192.168.0.12?
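
For concreteness, a rough (untested) sketch of that two-executable case,
using the C API's MPI_Comm_spawn_multiple - childExe1/childExe2 and the
addresses are just the placeholders from above, and the hosts still have
to appear in the hostfile:

#include <mpi.h>

int main(int argc, char **argv)
{
    /* Spawn one copy of childExe1 on 192.168.0.11 and one copy of
     * childExe2 on 192.168.0.12, pinning each command to its host
     * with the "host" info key. */
    char *cmds[2]     = { "./childExe1", "./childExe2" };
    int   maxprocs[2] = { 1, 1 };
    MPI_Info infos[2];
    MPI_Comm children;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&infos[0]);
    MPI_Info_set(infos[0], "host", "192.168.0.11");
    MPI_Info_create(&infos[1]);
    MPI_Info_set(infos[1], "host", "192.168.0.12");

    MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, maxprocs, infos,
                            0, MPI_COMM_SELF, &children,
                            MPI_ERRCODES_IGNORE);

    /* ... exchange messages with the children over 'children' ... */

    MPI_Info_free(&infos[0]);
    MPI_Info_free(&infos[1]);
    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}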

Thanks for the help.

  Brian

On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> Sure, that's still true on all 1.3 or above releases. All you need to do is set the hostfile envar so we pick it up:
>
> OMPI_MCA_orte_default_hostfile=<foo>
>
>
> On Aug 21, 2012, at 7:23 PM, Brian Budge <brian.budge_at_[hidden]> wrote:
>
>> Hi. I know this is an old thread, but I'm curious if there are any
>> tutorials describing how to set this up? Is this still available on
>> newer open mpi versions?
>>
>> Thanks,
>> Brian
>>
>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> Hi Elena
>>>
>>> I'm copying this to the user list just to correct a mis-statement on my part
>>> in an earlier message that went there. I had stated that a singleton could
>>> comm_spawn onto other nodes listed in a hostfile by setting an environmental
>>> variable that pointed us to the hostfile.
>>>
>>> This is incorrect in the 1.2 code series. That series does not allow
>>> singletons to read a hostfile at all. Hence, any comm_spawn done by a
>>> singleton can only launch child processes on the singleton's local host.
>>>
>>> This situation has been corrected for the upcoming 1.3 code series. For the
>>> 1.2 series, though, you will have to do it via an mpirun command line.
>>>
>>> Sorry for the confusion - I sometimes have too many code families to keep
>>> straight in this old mind!
>>>
>>> Ralph
>>>
>>>
>>> On 1/4/08 5:10 AM, "Elena Zhebel" <ezhebel_at_[hidden]> wrote:
>>>
>>>> Hello Ralph,
>>>>
>>>> Thank you very much for the explanations.
>>>> But I still do not get it running...
>>>>
>>>> For the case
>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>> everything works.
>>>>
>>>> For the case
>>>> ./my_master.exe
>>>> it does not.
>>>>
>>>> I did:
>>>> - create my_hostfile and put it in $HOME/.openmpi/components/;
>>>> my_hostfile contains:
>>>> bollenstreek slots=2 max_slots=3
>>>> octocore01 slots=8 max_slots=8
>>>> octocore02 slots=8 max_slots=8
>>>> clstr000 slots=2 max_slots=3
>>>> clstr001 slots=2 max_slots=3
>>>> clstr002 slots=2 max_slots=3
>>>> clstr003 slots=2 max_slots=3
>>>> clstr004 slots=2 max_slots=3
>>>> clstr005 slots=2 max_slots=3
>>>> clstr006 slots=2 max_slots=3
>>>> clstr007 slots=2 max_slots=3
>>>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I put it in .tcshrc and
>>>> then source .tcshrc)
>>>> - in my_master.cpp I did:
>>>> MPI_Info info1;
>>>> MPI_Info_create(&info1);
>>>> const char* hostname =
>>>> "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>>>> MPI_Info_set(info1, "host", hostname);
>>>>
>>>> _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, info1, 0,
>>>> MPI_ERRCODES_IGNORE);
>>>>
>>>> - After I call the executable, I get this error message:
>>>>
>>>> bollenstreek: > ./my_master
>>>> number of processes to run: 1
>>>> --------------------------------------------------------------------------
>>>> Some of the requested hosts are not included in the current allocation for
>>>> the application:
>>>> ./childexe
>>>> The requested hosts were:
>>>> clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
>>>>
>>>> Verify that you have mapped the allocated resources properly using the
>>>> --host specification.
>>>> --------------------------------------------------------------------------
>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>> base/rmaps_base_support_fns.c at line 225
>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>> rmaps_rr.c at line 478
>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>> base/rmaps_base_map_job.c at line 210
>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>> rmgr_urm.c at line 372
>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>> communicator/comm_dyn.c at line 608
>>>>
>>>> Did I miss something?
>>>> Thanks for help!
>>>>
>>>> Elena
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Ralph H Castain [mailto:rhc_at_[hidden]]
>>>> Sent: Tuesday, December 18, 2007 3:50 PM
>>>> To: Elena Zhebel; Open MPI Users <users_at_[hidden]>
>>>> Cc: Ralph H Castain
>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>>>
>>>>
>>>>
>>>>
>>>> On 12/18/07 7:35 AM, "Elena Zhebel" <ezhebel_at_[hidden]> wrote:
>>>>
>>>>> Thanks a lot! Now it works!
>>>>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass an
>>>>> MPI_Info key to the Spawn function!
>>>>>
>>>>> One more question: is it necessary to start my "master" program with
>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe ?
>>>>
>>>> No, it isn't necessary - assuming that my_master_host is the first host
>>>> listed in your hostfile! If you are only executing one my_master.exe (i.e.,
>>>> you gave -n 1 to mpirun), then we will automatically map that process onto
>>>> the first host in your hostfile.
>>>>
>>>> If you want my_master.exe to go on some host other than the first one in
>>>> the file, then you have to give us the -host option.
>>>>
>>>>>
>>>>> Are there other possibilities for easy start?
>>>>> I would say just to run ./my_master.exe, but then the master process
>>>>> doesn't know about the hosts available on the network.
>>>>
>>>> You can set the hostfile parameter in your environment instead of on the
>>>> command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts.
>>>>
>>>> You can then just run ./my_master.exe on the host where you want the master
>>>> to reside - everything should work the same.
>>>>
>>>> Just as an FYI: the name of that environmental variable is going to change
>>>> in the 1.3 release, but everything will still work the same.
>>>>
>>>> Hope that helps
>>>> Ralph
>>>>
>>>>
>>>>>
>>>>> Thanks and regards,
>>>>> Elena
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Ralph H Castain [mailto:rhc_at_[hidden]]
>>>>> Sent: Monday, December 17, 2007 5:49 PM
>>>>> To: Open MPI Users <users_at_[hidden]>; Elena Zhebel
>>>>> Cc: Ralph H Castain
>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 12/17/07 8:19 AM, "Elena Zhebel" <ezhebel_at_[hidden]> wrote:
>>>>>
>>>>>> Hello Ralph,
>>>>>>
>>>>>> Thank you for your answer.
>>>>>>
>>>>>> I'm using OpenMPI 1.2.3, compiler glibc232, Linux Suse 10.0.
>>>>>> My "master" executable runs only on the one local host, then it spawns
>>>>>> "slaves" (with MPI::Intracomm::Spawn).
>>>>>> My question was: how to determine the hosts where these "slaves" will be
>>>>>> spawned?
>>>>>> You said: "You have to specify all of the hosts that can be used by
>>>>>> your job
>>>>>> in the original hostfile". How can I specify the host file? I can not
>>>>>> find it
>>>>>> in the documentation.
>>>>>
>>>>> Hmmm...sorry about the lack of documentation. I always assumed that the MPI
>>>>> folks in the project would document such things since it has little to do
>>>>> with the underlying run-time, but I guess that fell through the cracks.
>>>>>
>>>>> There are two parts to your question:
>>>>>
>>>>> 1. how to specify the hosts to be used for the entire job. I believe
>>>>> that is somewhat covered here:
>>>>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>>>>
>>>>> That FAQ tells you what a hostfile should look like, though you may already
>>>>> know that. Basically, we require that you list -all- of the nodes that both
>>>>> your master and slave programs will use.
>>>>>
>>>>> 2. how to specify which nodes are available for the master, and which for
>>>>> the slave.
>>>>>
>>>>> You would specify the host for your master on the mpirun command line with
>>>>> something like:
>>>>>
>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>
>>>>> This directs Open MPI to map that specified executable on the specified
>>>>> host - note that my_master_host must have been in my_hostfile.
>>>>>
>>>>> Inside your master, you would create an MPI_Info key "host" that has a
>>>>> value consisting of a string "host1,host2,host3" identifying the hosts you want
>>>>> your slave to execute upon. Those hosts must have been included in
>>>>> my_hostfile. Include that key in the MPI_Info array passed to your Spawn.
>>>>>
>>>>> We don't currently support providing a hostfile for the slaves (as opposed
>>>>> to the host-at-a-time string above). This may become available in a future
>>>>> release - TBD.
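>>>>>
>>>>> For completeness, the slave side just retrieves the parent
>>>>> intercommunicator in the standard way - a minimal, untested sketch
>>>>> (plain MPI, nothing Open MPI-specific):
>>>>>
>>>>> #include <mpi.h>
>>>>> #include <stdio.h>
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>     MPI_Comm parent;
>>>>>
>>>>>     MPI_Init(&argc, &argv);
>>>>>     /* parent is MPI_COMM_NULL if we were not started via Spawn */
>>>>>     MPI_Comm_get_parent(&parent);
>>>>>     if (parent == MPI_COMM_NULL) {
>>>>>         fprintf(stderr, "not spawned by a master\n");
>>>>>         MPI_Finalize();
>>>>>         return 1;
>>>>>     }
>>>>>     /* ... talk to the master over 'parent' ... */
>>>>>     MPI_Comm_disconnect(&parent);
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }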
>>>>>
>>>>> Hope that helps
>>>>> Ralph
>>>>>
>>>>>>
>>>>>> Thanks and regards,
>>>>>> Elena
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
>>>>>> Behalf Of Ralph H Castain
>>>>>> Sent: Monday, December 17, 2007 3:31 PM
>>>>>> To: Open MPI Users <users_at_[hidden]>
>>>>>> Cc: Ralph H Castain
>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster
>>>>>> configuration
>>>>>>
>>>>>> On 12/12/07 5:46 AM, "Elena Zhebel" <ezhebel_at_[hidden]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm working on an MPI application where I'm using Open MPI instead of
>>>>>>> MPICH.
>>>>>>>
>>>>>>> In my "master" program I call the function MPI::Intracomm::Spawn, which
>>>>>>> spawns "slave" processes. It is not clear to me how to spawn the "slave"
>>>>>>> processes over the network. Currently "master" creates "slaves" on the
>>>>>>> same host.
>>>>>>>
>>>>>>> If I use 'mpirun --hostfile openmpi.hosts' then processes are spawned
>>>>>>> over the network as expected. But now I need to spawn processes over the
>>>>>>> network from my own executable using MPI::Intracomm::Spawn. How can I
>>>>>>> achieve it?
>>>>>>>
>>>>>>
>>>>>> I'm not sure from your description exactly what you are trying to do, nor
>>>>>> in what environment this is all operating within or what version of Open
>>>>>> MPI you are using. Setting aside the environment and version issue, I'm
>>>>>> guessing that you are running your executable over some specified set of
>>>>>> hosts, but want to provide a different hostfile that specifies the hosts
>>>>>> to be used for the "slave" processes. Correct?
>>>>>>
>>>>>> If that is correct, then I'm afraid you can't do that in any version of
>>>>>> Open MPI today. You have to specify all of the hosts that can be used by
>>>>>> your job in the original hostfile. You can then specify a subset of those
>>>>>> hosts to be used by your original "master" program, and then specify a
>>>>>> different subset to be used by the "slaves" when calling Spawn.
>>>>>>
>>>>>> But the system requires that you tell it -all- of the hosts that are
>>>>>> going to be used at the beginning of the job.
>>>>>>
>>>>>> At the moment, there is no plan to remove that requirement, though there
>>>>>> has been occasional discussion about doing so at some point in the
>>>>>> future. No promises that it will happen, though - managed environments,
>>>>>> in particular, currently object to the idea of changing the allocation
>>>>>> on-the-fly. We may, though, make a provision for purely hostfile-based
>>>>>> environments (i.e., unmanaged) at some time in the future.
>>>>>>
>>>>>> Ralph
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance for any help.
>>>>>>>
>>>>>>> Elena
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users