Open MPI Development Mailing List Archives

From: Bill McMillan (bmcmillan_at_[hidden])
Date: 2007-07-17 23:01:29


> there appear to be some overlaps between the ls_* and lsb_* functions,
> but they seem basically compatible as far as i can tell. almost all
> the functions have a command line version as well, for example:
> lsb_submit()/bsub

  Like Open MPI and ORTE, there are two layers in LSF. The ls_* APIs
  talk to what is/was historically called "LSF Base" and the lsb_* APIs
  talk to what is/was historically called "LSF Batch".

  The ls_* APIs are essentially "do it now" type functionality for
  writing distributed applications that do not require batch
  functionality. The ls_* functions do not honour any batch allocation
  or policy in any shape.
 
> lsb_getalloc()/none and lsb_launch()/blaunch are new with LSF 7.0, but
> appear to just be a different (simpler) interface to existing
> functionality in the LSB_* env vars and the ls_rexec()/lsgrun commands
> -- although, as you say, perhaps platform will hook or enhance them
> later. but, the key issue is that lsb_launch() just starts tasks -- it
> does not perform or interact with the queue or job control (much?).
> so, you can't use these functions to get an allocation in the first
> place, and you have to be careful not to use them as a way around the
> queuing system.

  The ls_* APIs do not honour a batch allocation, while lsb_launch does.
  lsb_launch will only allow you to start tasks on nodes allocated to
  your job, and is subject to all the queue/job controls.

  ls_rexec/lsgrun are not used to start batch jobs.
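
The distinction above can be pictured with a toy model. This is only a sketch of the behaviour described in this mail, not LSF code: the function names, the exception and the host names are all invented for illustration, and nothing here calls the real ls_* or lsb_* APIs.

```python
# Toy model only: none of these names are LSF API calls.


class AllocationError(Exception):
    """Raised when a task is requested on a host outside the allocation."""


def launch_in_allocation(allocated_hosts, target_hosts, command):
    """Model of an allocation-aware launch (the lsb_launch behaviour).

    Returns the (host, command) pairs that would be started, or raises
    AllocationError if any target host is not allocated to the job.
    """
    allowed = set(allocated_hosts)
    outside = [host for host in target_hosts if host not in allowed]
    if outside:
        raise AllocationError(f"hosts not allocated to this job: {outside}")
    return [(host, command) for host in target_hosts]


def launch_anywhere(target_hosts, command):
    """Model of a "do it now" launch (the ls_* behaviour): no allocation check."""
    return [(host, command) for host in target_hosts]
```

In this model, asking launch_in_allocation() for a host outside the allocation fails, while launch_anywhere() accepts any host, which is the reason an admin might want to disable the latter.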

  In pre-7.0 releases, the method for starting Open MPI is essentially:

  $ bsub -n N -a openmpi mpirun.lsf a.out

  Note that you only have the openmpi method and mpirun.lsf if you have
  installed the HPC extensions.
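
If the submission is driven from a script, the command line above can be assembled as an argument list. This is a minimal sketch: the helper name build_bsub_command is hypothetical, it only builds the argument vector and does not run bsub, and it uses only the -n and -a options shown above.

```python
import shlex


def build_bsub_command(num_slots, executable):
    """Build the pre-7.0 submission command shown above as an argv list.

    Hypothetical helper: it does not submit anything, it only assembles
    "bsub -n N -a openmpi mpirun.lsf <executable>".
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    return ["bsub", "-n", str(num_slots), "-a", "openmpi",
            "mpirun.lsf", executable]


if __name__ == "__main__":
    # Print the command in shell form for inspection.
    print(shlex.join(build_bsub_command(4, "a.out")))
```

Keeping the command as a list avoids shell quoting problems if the executable path contains spaces.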

> [ as a side note, the function ls_rexecv()/lsgrun is the one i have
> heard admins do not like because it can break queuing/accounting, and
> might try to disable somehow. i don't really buy that, because it's
> not you can disable it and have the system still work, since (as
> above) || job launching depends on it. i guess if you really don't
> care about || launching maybe you could. but, if used properly after a
> proper allocation i don't think there should (or even can) be a
> problem. ]

  Job launching does not depend on it; and admins can explicitly
  turn off support for ls_rexec/lsgrun while allowing lsb_launch to
  continue to function -- thus ensuring that tasks of any form can only
  be started on nodes allocated to the job.

> so, lsb_submit()/bsub is a combination allocate/launch -- you specify
> the allocation size you want, and when it's all ready, it runs the
> 'job' (really the job launcher) only on one (randomly chosen) 'head'
> node from the allocation, with the env vars set so the launcher can
> use ls_rexec/lsgrun functions to start the rest of the job. there are
> of course various script wrappers you can use (mpijob, pvmjob, etc)
> instead of your 'real job'. then, i think lsf *should* try to track
> what processes get started via the wrapper / head process so it knows
> they are part of the same job. i dunno if it really does that -- but,
> my guess is that at the least it assumes the allocation is in use
> until the original process ends. in any case, the wrapper / head
> process examines the environment vars and uses ls_rexec()/lsgrun or
> the like to actually run N copies of the 'real job' executable. in
> 7.0, it can conveniently use lsb_getalloc() and lsb_launch(), but that
> doesn't really change any semantics as far as i know. one could
> imagine that calling lsb_launch() instead of ls_rexec() might be
> preferable from a process tracking point of view, but i don't see why
> Platform couldn't hook ls_rexec() just as well as lsb_launch().

  ls_rexec does not honour batch semantics. Prior to LSF7 there is
  an additional parallel application manager that is started when the
  -a openmpi option is specified. It handles I/O marshalling, signaling
  and task accounting for the complete parallel job across all nodes.
  In LSF7, this functionality has been embedded directly into the RES
  daemon and is invoked when lsb_launch is used.

  Yes, you could use ls_rexec, but it does not handle the I/O and process
  marshalling - you need to handle that yourself if you use ls_rexec.
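
To give a rough idea of what "handle that yourself" involves, here is a small sketch of I/O marshalling for several tasks. It is an illustration under stated assumptions only: local Python child processes stand in for remote tasks, the function name is invented, and no LSF call is made.

```python
import subprocess
import sys


def run_tasks_with_tagged_output(num_tasks):
    """Start num_tasks child processes and collect their stdout by rank.

    Local children stand in for remote tasks; a real launcher would also
    have to forward stdin, stderr and signals, and account for each task.
    """
    procs = []
    for rank in range(num_tasks):
        # Each child prints one line identifying itself.
        child_code = f"print('hello from task {rank}')"
        procs.append(subprocess.Popen(
            [sys.executable, "-c", child_code],
            stdout=subprocess.PIPE,
            text=True,
        ))

    lines = []
    exit_codes = []
    for rank, proc in enumerate(procs):
        out, _ = proc.communicate()  # wait for the task and read its output
        exit_codes.append(proc.returncode)
        for line in out.splitlines():
            lines.append(f"[{rank}] {line}")  # tag each line with its rank
    return lines, exit_codes
```

Even this reduced version has to track every process, gather its output and record its exit status, which is the work the parallel application manager (or the RES daemon in LSF7) takes over.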

  The first node is not random; it is the "best" match within the
  allocation based on the resource requirements for the job.

  Since you are referring to the mpijob/pvmjob scripts I would guess
  you do not have the HPC extensions installed, as these are fairly
  simplistic wrappers that don't make use of the parallel application
  manager.

> there is also an lsb_runjob() that is similar to lsb_launch(), but for
> an already submitted job. so, if one were to lsb_submit() with an
> option set to never launch it automatically, and then one were to run
> lsb_runjob(), you can avoid the queue and/or force the use of certain
> hosts? i guess this is also not a good function to use, but at least
> the queuing system would be aware of any bad behavior (queue skipping
> via ls_placereq() to get extra hosts, for instance) in this case ...

  Not really - lsb_runjob() is essentially an admin function to force
  a job to run irrespective of the current
  policies/priorities/allocations. Unless you have administrator privs
  it will fail.

  As for growing or shrinking the allocation for a job, that is on the
  roadmap for the near future. However, as Jeff has previously
  mentioned, on a busy system you could end up waiting for a long time
  to get additional nodes.

  Essentially it boils down to making an asynchronous request for
  additional resources and registering a callback for when something
  can be allocated.
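
The request-plus-callback pattern could look roughly like the sketch below. It is purely hypothetical: no such LSF interface is described in this mail, the function name and node names are invented, and a timer thread stands in for the scheduler eventually granting the request.

```python
import threading


def request_additional_nodes(count, on_granted, delay_seconds=0.1):
    """Hypothetical asynchronous request for `count` more nodes.

    Returns immediately. Later, a timer thread (standing in for the
    scheduler) calls on_granted with a list of invented node names.
    """
    def grant():
        on_granted([f"node{i}" for i in range(count)])

    timer = threading.Timer(delay_seconds, grant)
    timer.daemon = True  # do not keep the process alive for a pending grant
    timer.start()
    return timer


if __name__ == "__main__":
    granted = threading.Event()

    def on_granted(nodes):
        print("granted:", nodes)
        granted.set()

    request_additional_nodes(2, on_granted)
    # The caller keeps running; here it simply waits, with a timeout,
    # because on a busy system the grant may take a long time.
    granted.wait(timeout=5)
```

The caller is never blocked by the request itself, so the job can keep computing on its current nodes until the callback fires.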

  Regards,
  Bill

-------------
Bill McMillan
Principal Technical Product Manager
Platform Computing