> there appear to be some overlaps between the ls_* and lsb_* functions,
> but they seem basically compatible as far as i can tell. almost all
> the functions have a command line version as well, for example:
Like openmpi and orte, there are two layers in LSF. The ls_* APIs
talk to what is/was historically called "LSF Base" and the lsb_* APIs
talk to what is/was historically called "LSF Batch".
The ls_* APIs are essentially "do it now" functionality for
writing distributed applications that do not require batch functionality.
The ls_* functions do not honour any batch allocation or policy.
> lsb_getalloc()/none and lsb_launch()/blaunch are new with LSF 7.0, but
> appear to just be a different (simpler) interface to existing
> functionality in the LSB_* env vars and the ls_rexec()/lsgrun commands
> -- although, as you say, perhaps platform will hook or enhance them
> later. but, the key issue is that lsb_launch() just starts tasks -- it
> does not perform or interact with the queue or job control (much?).
> so, you can't use these functions to get an allocation in the first
> place, and you have to be careful not to use them as a way around the
> queuing system.
The ls_* APIs do not honour a batch allocation, while lsb_launch does.
lsb_launch will only allow you to start tasks on nodes allocated to
your job, and is subject to all the queue/job controls.
ls_rexec/lsgrun are not used to start batch jobs.
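As a rough illustration of that rule, here is a minimal C sketch; it is not LSF code. It assumes the allocation is visible to the launcher in the LSB_HOSTS environment variable as a whitespace-separated list with one host name per allocated slot (lsb_getalloc() in 7.0 is described above as a simpler interface to the same information). The helpers parse_hosts, alloc_from_env and host_in_alloc are hypothetical; host_in_alloc models the check lsb_launch applies and ls_rexec skips.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 256   /* arbitrary cap chosen for this sketch */

/* Split a whitespace-separated host list into hosts[] and return the
 * number of entries. Like strtok, this modifies buf in place. */
static int parse_hosts(char *buf, char *hosts[], int max)
{
    assert(buf != NULL && hosts != NULL);
    int n = 0;
    for (char *tok = strtok(buf, " \t\n"); tok != NULL && n < max;
         tok = strtok(NULL, " \t\n"))
        hosts[n++] = tok;
    return n;
}

/* Copy LSB_HOSTS (format assumed, one entry per slot) into copy[] and
 * parse it. Returns 0 if the variable is unset or does not fit. */
static int alloc_from_env(char *copy, size_t len, char *hosts[], int max)
{
    const char *env = getenv("LSB_HOSTS");
    if (env == NULL || strlen(env) >= len)
        return 0;
    strcpy(copy, env);
    return parse_hosts(copy, hosts, max);
}

/* The lsb_launch-style rule: a task may start only on an allocated host.
 * An ls_rexec-style call would skip this check entirely. */
static int host_in_alloc(const char *host, char *const hosts[], int n)
{
    for (int i = 0; i < n; i++)
        if (strcmp(hosts[i], host) == 0)
            return 1;
    return 0;
}
```

With an allocation of "hostA hostA hostB", parse_hosts yields three slot entries, and host_in_alloc accepts hostB but rejects any host outside the allocation.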
In pre-7.0 releases, the method for starting openmpi is essentially:
  $ bsub -n N -a openmpi mpirun.lsf a.out
Note that you only have the openmpi method and mpirun.lsf if you have
installed the HPC extensions.
> [ as a side note, the function ls_rexecv()/lsgrun is the one i have
> heard admins do not like because it can break queuing/accounting, and
> might try to disable somehow. i don't really buy that, because it's
> not like you can disable it and have the system still work, since (as
> above) || job launching depends on it. i guess if you really don't
> care about || launching maybe you could. but, if used properly after a
> proper allocation i don't think there should (or even can) be a
> problem. ]
Job launching does not depend on it, and admins can explicitly
turn off support for ls_rexec/lsgrun while allowing lsb_launch to
continue to function, thus ensuring that tasks of any form can only
be started on nodes allocated to the job.
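The admin switch described here can be modelled as a small decision function. This is a hypothetical sketch: the names launch_api, site_policy and may_start are made up, and real LSF applies further controls that are not shown. It only captures the point made above: lsb_launch is always checked against the allocation while ls_rexec ignores it, so turning ls_rexec/lsgrun off leaves the allocation as the only way onto a node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which path a task is started through (hypothetical model). */
enum launch_api { VIA_LS_REXEC, VIA_LSB_LAUNCH };

/* Hypothetical site policy holding the admin switch. */
struct site_policy {
    bool rexec_enabled;   /* false: ls_rexec/lsgrun support turned off */
};

/* Returns true if a task may start on a host, in this model only. */
static bool may_start(enum launch_api api, bool host_allocated,
                      const struct site_policy *p)
{
    assert(p != NULL);
    switch (api) {
    case VIA_LSB_LAUNCH:
        return host_allocated;      /* always checked against the job's allocation */
    case VIA_LS_REXEC:
        return p->rexec_enabled;    /* the allocation is never consulted */
    }
    return false;
}
```

With rexec_enabled set to false, the only combination that returns true is lsb_launch onto an allocated host.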
> so, lsb_submit()/bsub is a combination allocate/launch -- you specify
> the allocation size you want, and when it's all ready, it runs the
> 'job' (really the job launcher) only on one (randomly chosen) 'head'
> node from the allocation, with the env vars set so the launcher can
> use ls_rexec/lsgrun functions to start the rest of the job. there are
> of course various script wrappers you can use (mpijob, pvmjob, etc)
> instead of your 'real job'. then, i think lsf *should* try to track
> what processes get started via the wrapper / head process so it knows
> they are part of the same job. i dunno if it really does that -- but,
> my guess is that at the least it assumes the allocation is in use
> until the original process ends. in any case, the wrapper / head
> process examines the environment vars and uses ls_rexec()/lsgrun or
> the like to actually run N copies of the 'real job' executable. in
> 7.0, it can conveniently use lsb_getalloc() and lsb_launch(), but that
> doesn't really change any semantics as far as i know. one could
> imagine that calling lsb_launch() instead of ls_rexec() might be
> preferable from a process tracking point of view, but i don't see why
> Platform couldn't hook ls_rexec() just as well as lsb_launch().
ls_rexec does not honour batch semantics. Prior to LSF 7, there is
an additional parallel application manager that is started when the
-a openmpi option is specified. It handles I/O marshalling, signaling
and task accounting for the complete parallel job across all nodes.
In LSF 7, this functionality has been embedded directly into the RES daemon
and is invoked when lsb_launch is used.
Yes, you could use ls_rexec, but it does not handle the I/O and process
marshalling; you need to handle that yourself if you use ls_rexec.
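To make concrete what "handle that yourself" involves, here is a minimal POSIX C sketch of two of the duties listed above, task accounting and signal forwarding. Local child processes stand in for remote tasks (an assumption: a real launcher would start them with ls_rexec on other hosts), and I/O marshalling is left out entirely. The functions start_tasks, forward_signal and wait_tasks are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_TASKS 64

static pid_t task_pids[MAX_TASKS];  /* 0 means "no live task in this slot" */
static int   ntasks;

/* Start n stand-in tasks; task 'rank' simply exits with exit_codes[rank].
 * Returns the number started, or -1 if fork fails (tasks already started
 * stay in task_pids so wait_tasks can still reap them). */
static int start_tasks(const int exit_codes[], int n)
{
    assert(n >= 0 && n <= MAX_TASKS);
    ntasks = 0;
    for (int rank = 0; rank < n; rank++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(exit_codes[rank]);   /* child: the "task" */
        task_pids[ntasks++] = pid;     /* parent: remember it for accounting */
    }
    return ntasks;
}

/* Signal forwarding: pass sig on to every task not yet reaped. The pid > 0
 * guard matters, because kill(0, sig) would signal our whole process group. */
static void forward_signal(int sig)
{
    for (int i = 0; i < ntasks; i++)
        if (task_pids[i] > 0)
            kill(task_pids[i], sig);
}

/* Task accounting: reap every task and count those that did not exit 0. */
static int wait_tasks(void)
{
    int failures = 0;
    for (int i = 0; i < ntasks; i++) {
        int status;
        if (waitpid(task_pids[i], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
        task_pids[i] = 0;
    }
    return failures;
}
```

Spread this across remote hosts and add stdout/stderr forwarding, and it becomes clear why it is easier to let the parallel application manager (or RES via lsb_launch in 7.0) do it.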
The first node is not random; it is the "best" match within the allocation,
based on the resource requirements for the job.
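A minimal sketch of the "best match" idea, assuming for illustration that the job's only requirement is a minimum amount of free memory and that the best host is the one with the most of it. LSF's actual selection uses the job's full resource requirements, which this does not attempt to reproduce; struct host_info and pick_first_node are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-host view used only for this sketch. */
struct host_info {
    const char *name;
    int free_mem_mb;   /* the single resource assumed for ranking */
};

/* Return the index of the allocated host with the most free memory among
 * those meeting min_mem_mb, or -1 if none qualifies. Ties keep the
 * earlier host. */
static int pick_first_node(const struct host_info *hosts, size_t n,
                           int min_mem_mb)
{
    assert(hosts != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (hosts[i].free_mem_mb < min_mem_mb)
            continue;                  /* does not meet the requirement */
        if (best < 0 || hosts[i].free_mem_mb > hosts[best].free_mem_mb)
            best = (int)i;
    }
    return best;
}
```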
Since you are referring to the mpijob/pvmjob scripts, I would guess
you do not have the HPC extensions installed, as these are fairly
simplistic wrappers that don't make use of the parallel application manager.
> there is also an lsb_runjob() that is similar to lsb_launch(), but for
> an already submitted job. so, if one were to lsb_submit() with an
> option set to never launch it automatically, and then one were to run
> lsb_runjob(), you can avoid the queue and/or force the use of certain
> hosts? i guess this is also not a good function to use, but at least
> the queuing system would be aware of any bad behavior (queue skipping
> via ls_placereq() to get extra hosts, for instance) in this case ...
Not really: lsb_runjob() is essentially an admin function to force
a job to run irrespective of the current policies/priorities/allocations.
Unless you have administrator privileges, it will fail.
As for growing or shrinking the allocation for a job, that is on the
roadmap for the near future. However, as Jeff has previously
mentioned, on a busy system you could end up waiting a long time
to get additional nodes.
Essentially it boils down to making an asynchronous request for additional
resources and registering a callback for when something can be allocated.
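The roadmap item is only described here as an asynchronous request plus a callback. A minimal sketch of that pattern follows; the names grow_request, request_more_slots, on_slots_freed and record_grant are hypothetical and not part of any LSF API, and a single pending request with an all-or-nothing grant are simplifying assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*grow_cb)(int granted, void *ctx);

struct grow_request {
    int wanted;        /* additional slots requested */
    grow_cb cb;        /* called once the slots can be allocated */
    void *ctx;
    int pending;       /* 1 until the callback has fired */
};

/* Register the request and return at once; the job keeps running. */
static void request_more_slots(struct grow_request *r, int wanted,
                               grow_cb cb, void *ctx)
{
    assert(r != NULL && cb != NULL && wanted > 0);
    r->wanted = wanted;
    r->cb = cb;
    r->ctx = ctx;
    r->pending = 1;
}

/* Stand-in for the scheduler side, called whenever slots free up. Fires
 * the callback only once enough slots exist; returns the slots left over. */
static int on_slots_freed(struct grow_request *r, int free_slots)
{
    if (r->pending && free_slots >= r->wanted) {
        r->pending = 0;
        r->cb(r->wanted, r->ctx);
        return free_slots - r->wanted;
    }
    return free_slots;
}

/* Example callback: accumulate the number of slots granted into *ctx. */
static void record_grant(int granted, void *ctx)
{
    *(int *)ctx += granted;
}
```

The job registers the request and carries on; on a busy system the callback may simply fire much later, which matches the waiting caveat above.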
Principal Technical Product Manager