On 20 January 2011 16:50, Olivier SANNIER <Olivier.SANNIER_at_[hidden]> wrote:
> So there is no dynamic discovery of nodes available on the network. Unless,
> of course, if I was to write a tool that would do it before the actual run
> is started.
That is in essence what a batch scheduler does.
OK, to be honest, it has to be set up with a list of the hosts you have
at the beginning.
(Actually - any Condor experts here - you can join a Condor pool
dynamically, can't you?)
Once the batch scheduler knows all the hosts you have available, you
run a batch daemon on each machine,
for example the PBS Mom process or the Gridengine execd.
The batch scheduler machine will keep track of which hosts respond -
any which do not respond are marked as 'down' and you will
not be able to schedule jobs on them.
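
To make the bookkeeping concrete, here is a minimal sketch of how a scheduler might mark hosts 'up' or 'down'. It is not how PBS or Gridengine actually implement it; the function names, the heartbeat timestamps and the 60-second timeout are all assumptions made up for illustration.

```python
def mark_hosts(last_seen, now, timeout=60.0):
    """Classify each host by how long ago its daemon last reported in.

    last_seen maps hostname -> time (in seconds) of the last heartbeat.
    The heartbeat idea and the timeout value are illustrative assumptions.
    """
    return {
        host: ("up" if now - seen <= timeout else "down")
        for host, seen in last_seen.items()
    }


def schedulable(states):
    """Only hosts currently marked 'up' are candidates for new jobs."""
    return sorted(host for host, state in states.items() if state == "up")
```

With heartbeats of `{"node01": 100.0, "node02": 30.0}` and `now=110.0`, node02 has been silent for 80 seconds, so it is marked 'down' and only node01 remains schedulable.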
The batch scheduler will decide which hosts are free to run jobs -
based on how many jobs are already running on a host,
and how busy the host is - indeed you can have your own metrics, such
as the number of licenses free for commercial software.
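
As a rough sketch of that decision, assuming a simple model rather than any real scheduler's policy: a host is free if it has an unused job slot and its load is below a threshold, and the job only starts if a license is available. The `Host` fields, `pick_hosts`, and the 0.8 load threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    slots: int          # how many jobs the host may run at once
    running_jobs: int   # jobs already running there
    load: float         # load per core, 0.0 (idle) to 1.0 (saturated)


def pick_hosts(hosts, needed, free_licenses, max_load=0.8):
    """Return `needed` hostnames, least busy first, or [] if the job must wait."""
    # Custom metric: no free license for the commercial software, no job.
    if free_licenses < 1:
        return []
    free = [h for h in hosts if h.running_jobs < h.slots and h.load <= max_load]
    # Prefer hosts with fewer running jobs, then lower load.
    free.sort(key=lambda h: (h.running_jobs, h.load))
    if len(free) < needed:
        return []
    return [h.name for h in free[:needed]]
```

A real scheduler has many more knobs (queues, priorities, reservations), but the shape is the same: filter the hosts by the metrics, then hand out the best ones.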
The batch scheduler then gives your program a list of hostnames -
which you in turn use with the 'mpirun' command
which actually fires off the MPI processes.
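
That last step can be sketched like this, assuming the scheduler hands you a file with one hostname per line (under PBS, for instance, the path is in the PBS_NODEFILE environment variable). The helper names are made up for illustration, and './my_mpi_app' below is a placeholder program name.

```python
def read_hosts(path):
    """Read the scheduler-provided host list, one hostname per line."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def build_mpirun_command(hostfile, program, args=()):
    """Build an mpirun command line that starts one process per listed entry.

    Assumes each line of the host file stands for one process slot, so a
    hostname repeated N times gets N processes.
    """
    nprocs = len(read_hosts(hostfile))
    return ["mpirun", "-np", str(nprocs), "--hostfile", hostfile, program, *args]
```

Inside a PBS job you would call something like `build_mpirun_command(os.environ["PBS_NODEFILE"], "./my_mpi_app")` and pass the result to `subprocess.run`. Gridengine writes its host list in a different format, so the parsing would need adjusting there.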