> there appear to be some overlaps between the ls_* and lsb_* functions,
> but they seem basically compatible as far as i can tell. almost all
> the functions have a command line version as well, for example:
> lsb_submit()/bsub

  Like openmpi and orte, there are two layers in LSF.  The ls_* API's
  talk to what is/was historically called "LSF Base" and the lsb_* API's
  talk to what is/was historically called "LSF Batch".

  The ls_* API's are essentially "do it now" type functionality for
  writing distributed applications that do not require batch functionality.
  The ls_* functions do not honour any batch allocation or policy in
  any shape or form.

> lsb_getalloc()/none and lsb_launch()/blaunch are new with LSF 7.0, but
> appear to just be a different (simpler) interface to existing
> functionality in the LSB_* env vars and the ls_rexec()/lsgrun commands
> -- although, as you say, perhaps platform will hook or enhance them
> later. but, the key issue is that lsb_launch() just starts tasks -- it
> does not perform or interact with the queue or job control (much?).
> so, you can't use these functions to get an allocation in the first
> place, and you have to be careful not to use them as a way around the
> queuing system.

  The ls_* API's do not honour a batch allocation, while lsb_launch does.
  lsb_launch will only allow you to start tasks on nodes allocated to
  your jobs, and is subject to all the queue/job controls.
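
  On the LSB_* env vars mentioned in the quoted text: below is a minimal
  sketch of how a launcher might read its allocation from the environment.
  It assumes the usual formats, LSB_MCPU_HOSTS as "hostA 2 hostB 4" and
  LSB_HOSTS as one host name per slot. The function names are made up for
  illustration; this is not LSF code, and in 7.0 you would presumably ask
  lsb_getalloc() instead.

```python
import os


def parse_mcpu_hosts(value):
    """Turn 'hostA 2 hostB 4' into [('hostA', 2), ('hostB', 4)]."""
    fields = value.split()
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating host/slot-count pairs")
    return [(fields[i], int(fields[i + 1])) for i in range(0, len(fields), 2)]


def allocation_from_env(env=None):
    """Return [(host, slots), ...] for the current job (hypothetical helper)."""
    env = os.environ if env is None else env
    if "LSB_MCPU_HOSTS" in env:
        return parse_mcpu_hosts(env["LSB_MCPU_HOSTS"])
    # LSB_HOSTS repeats a host once per slot; count repeats, keep first-seen order.
    counts = {}
    for host in env.get("LSB_HOSTS", "").split():
        counts[host] = counts.get(host, 0) + 1
    return list(counts.items())
```

  Reading the variables only tells you what was allocated; it does nothing
  to enforce it, which is the difference between ls_rexec and lsb_launch.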

  ls_rexec/lsgrun are not used to start batch jobs.

  In pre-7.0, the method for starting openmpi is essentially:

  $ bsub -n N -a openmpi mpirun.lsf a.out

  Note that you only have the openmpi method and mpirun.lsf if you have
  installed the hpc extensions.
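
  If you are generating that submission from a script, a small sketch
  along these lines may help. It only assembles the same command line
  shown above; the helper name is hypothetical, and it assumes bsub and
  mpirun.lsf are on your PATH.

```python
import shlex


def bsub_openmpi_argv(slots, program, program_args=()):
    """Build the argv for: bsub -n <slots> -a openmpi mpirun.lsf <program> ..."""
    if slots < 1:
        raise ValueError("slots must be a positive integer")
    return ["bsub", "-n", str(slots), "-a", "openmpi",
            "mpirun.lsf", program, *program_args]


def bsub_openmpi_command(slots, program, program_args=()):
    """Same as above, rendered as a shell-quoted string for logging."""
    return shlex.join(bsub_openmpi_argv(slots, program, program_args))
```

  Passing the list form to subprocess.run() avoids any shell quoting
  problems with the program's own arguments.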

> [ as a side note, the function ls_rexecv()/lsgrun is the one i have
> heard admins do not like because it can break queuing/accounting, and
> might try to disable somehow. i don't really buy that, because it's
> not like you can disable it and have the system still work, since (as
> above) || job launching depends on it. i guess if you really don't
> care about || launching maybe you could. but, if used properly after a
> proper allocation i don't think there should (or even can) be a
> problem. ]

  Job launching does not depend on it, and admins can explicitly
  turn off support for ls_rexec/lsgrun while allowing lsb_launch to
  continue to function -- thus ensuring that tasks of any form can only
  be started on nodes allocated to the job.

> so, lsb_submit()/bsub is a combination allocate/launch -- you specify
> the allocation size you want, and when it's all ready, it runs the
> 'job' (really the job launcher) only on one (randomly chosen) 'head'
> node from the allocation, with the env vars set so the launcher can
> use ls_rexec/lsgrun functions to start the rest of the job. there are
> of course various script wrappers you can use (mpijob, pvmjob, etc)
> instead of your 'real job'. then, i think lsf *should* try to track
> what processes get started via the wrapper / head process so it knows
> they are part of the same job. i dunno if it really does that -- but,
> my guess is that at the least it assumes the allocation is in use
> until the original process ends. in any case, the wrapper / head
> process examines the environment vars and uses ls_rexec()/lsgrun or
> the like to actually run N copies of the 'real job' executable. in
> 7.0, it can conveniently use lsb_getalloc() and lsb_launch(), but that
> doesn't really change any semantics as far as i know. one could
> imagine that calling lsb_launch() instead of ls_rexec() might be
> preferable from a process tracking point of view, but i don't see why
> Platform couldn't hook ls_rexec() just as well as lsb_launch().

  ls_rexec does not honour batch semantics.  Prior to LSF7 there is
  an additional parallel application manager that is started when the
  -a openmpi option is specified.  It handles I/O marshalling, signaling
  and task accounting for the complete parallel job across all nodes.
  In LSF7, this functionality has been embedded directly into the RES daemon
  and is invoked when lsb_launch is used.

  Yes, you could use ls_rexec, but it does not handle the I/O and process
  marshalling; you need to handle that yourself if you use ls_rexec.

  The first node is not random; it is the "best" match within the allocation
  based on the resource requirements for the job.

  Since you are referring to the mpijob/pvmjob scripts, I would guess
  you do not have the HPC extensions installed, as these are fairly
  simplistic wrappers that don't make use of the parallel application
  manager.

> there is also an lsb_runjob() that is similar to lsb_launch(), but for
> an already submitted job. so, if one were to lsb_submit() with an
> option set to never launch it automatically, and then one were to run
> lsb_runjob(), you can avoid the queue and/or force the use of certain
> hosts? i guess this is also not a good function to use, but at least
> the queuing system would be aware of any bad behavior (queue skipping
> via ls_placereq() to get extra hosts, for instance) in this case ...

  Not really - lsb_runjob() is essentially an admin function to force
  a job to run irrespective of the current policies/priorities/allocations.
  Unless you have administrator privs it will fail.

  As for growing or shrinking the allocation for a job, that is on
  the roadmap for the near future.  However, as Jeff has previously
  mentioned, on a busy system you could end up waiting for a long time
  to get additional nodes.

  Essentially it boils down to making an asynchronous request for additional
  resources and registering a callback for when something can be allocated.
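
  To make the shape of that concrete, here is a toy sketch of the
  request-plus-callback pattern. Nothing in it is an LSF interface: the
  class and method names are invented for illustration, and the real
  API (once it exists) may look quite different.

```python
from collections import deque


class GrowRequests:
    """Toy stand-in for a scheduler: queues grow requests, fires callbacks later."""

    def __init__(self):
        self._pending = deque()  # (nodes_wanted, callback), in arrival order
        self._free = []          # host names currently unallocated

    def request_nodes(self, count, callback):
        """Return immediately; callback(hosts) runs once `count` hosts are free."""
        self._pending.append((count, callback))
        self._dispatch()

    def release_nodes(self, hosts):
        """Called when other jobs finish and their hosts become available."""
        self._free.extend(hosts)
        self._dispatch()

    def _dispatch(self):
        # Serve requests strictly in order; stop at the first one that cannot be met.
        while self._pending and len(self._free) >= self._pending[0][0]:
            count, callback = self._pending.popleft()
            granted, self._free = self._free[:count], self._free[count:]
            callback(granted)
```

  The caller never blocks: on a busy system the callback may simply not
  fire for a long time, so the job has to keep making progress with
  what it already holds.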


Bill McMillan
Principal Technical Product Manager
Platform Computing