From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On behalf of John Hearns
Sent: Friday, 21 January 2011 11:35
To: Open MPI Users
Subject: Re: [OMPI users] Help with some fundamentals
On 20 January 2011 16:50, Olivier SANNIER <Olivier.SANNIER_at_[hidden]> wrote:
> So there is no dynamic discovery of nodes available on the network.
> Unless, of course, I were to write a tool that did it before
> the actual run is started.
That is in essence what a batch scheduler does.
OK, to be honest, it has to be set up with a list of the hosts you have at the beginning.
(Actually, any Condor experts here: you can join a Condor pool dynamically, can't you?)
Once the batch scheduler knows all the hosts you have available, you run a batch daemon on each machine, for example the PBS Mom process or the Gridengine execd. The batch scheduler machine keeps track of which hosts respond; any that do not respond are marked as 'down', and you will not be able to schedule jobs on them.
The batch scheduler decides which hosts are free to run jobs, based on how many jobs are already running on a host and how busy the host is. Indeed, you can add your own metrics, such as the number of licenses free for commercial software.
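To make the scheduler's bookkeeping concrete, here is a toy model in Python. It is not how PBS or Gridengine is actually implemented; the `Host` fields, the 60-second heartbeat timeout, the load-per-slot threshold, and the license check are all hypothetical choices that illustrate the idea. Hosts whose daemon has not reported recently count as 'down' and are never chosen; the rest are filtered by running jobs, load, and a custom metric (free licenses).

```python
from dataclasses import dataclass

# Hypothetical: seconds without a report from the host daemon before it counts as 'down'.
HEARTBEAT_TIMEOUT = 60.0


@dataclass
class Host:
    name: str
    slots: int              # job slots configured for this host
    running_jobs: int       # jobs the scheduler has already placed here
    load: float             # load average last reported by the host daemon
    last_heartbeat: float   # timestamp of the last report from the daemon

    def is_up(self, now: float) -> bool:
        # A host whose daemon stopped responding is treated as 'down'.
        return now - self.last_heartbeat <= HEARTBEAT_TIMEOUT


def free_hosts(hosts, now, max_load_per_slot=1.0):
    """Names of hosts that are up, have a spare slot, and are not too busy."""
    chosen = []
    for h in hosts:
        if not h.is_up(now):
            continue  # down: never scheduled
        if h.running_jobs >= h.slots:
            continue  # every slot already taken
        if h.load / h.slots > max_load_per_slot:
            continue  # too busy
        chosen.append(h.name)
    return chosen


def schedule(hosts, now, hosts_needed, licenses_needed=0, licenses_free=0):
    """Pick hosts for a job, or return [] if it cannot start yet.

    The license check stands in for a site-defined custom metric.
    """
    if licenses_needed > licenses_free:
        return []
    candidates = free_hosts(hosts, now)
    if len(candidates) < hosts_needed:
        return []
    return candidates[:hosts_needed]
```

A real scheduler also queues jobs that cannot start and retries them later. The sketch only shows the per-decision filtering.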
The batch scheduler then gives your program a list of hostnames, which you in turn use with the 'mpirun' command that actually launches the MPI processes.
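As an illustration of the hand-off from scheduler to mpirun, the Python sketch below assumes a Torque/PBS-style node file, which lists one hostname per allocated slot, so a host repeats once for each slot. It collapses that list into Open MPI hostfile lines of the form `hostname slots=N` and builds an `mpirun -np N --hostfile FILE program` command line. The function names are hypothetical. If Open MPI was built with support for your scheduler, mpirun can often read the allocation directly, and no hostfile is needed.

```python
from collections import Counter


def nodefile_to_hostfile(lines):
    """Collapse a one-hostname-per-slot list into Open MPI hostfile lines."""
    # Counter keeps first-seen order (dict ordering, Python 3.7+).
    counts = Counter(line.strip() for line in lines if line.strip())
    return [f"{host} slots={n}" for host, n in counts.items()]


def mpirun_command(hostfile_path, total_slots, program, *args):
    """Build the argument list for launching `program` on the allocated slots."""
    return ["mpirun", "-np", str(total_slots), "--hostfile", hostfile_path,
            program, *args]
```

In a job script, you would read the file named by `$PBS_NODEFILE`, write the result of `nodefile_to_hostfile` to a file, and pass that path together with the total line count to `mpirun_command`.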
Thanks, that makes it clearer to me now.