On May 3, 2011, at 7:04 PM, Maurice Feskanich wrote:
> My team has been tasked with integrating our grid engine with Open MPI. I'm looking for information that would shed light on how this is done. In essence, I need to integrate the way LSF or SGE is integrated. I've looked at the FAQ, and nosed around in the code, but I don't have a clear idea of which APIs to implement, or where my plugins would be used.
I'd be happy to provide advice. Without knowing anything about your grid engine, it's a tad difficult to say exactly what you need. In the case of SGE, all that was required was to identify a few environment variables and point the rsh launcher to "qrsh". For LSF, it took a little more work.
There are three main frameworks generally involved:
1. ras - determines what nodes are being used for this job. You'll see a "gridengine" plugin there that might serve as a model - it supports SGE.
2. plm - actually launches the ORTE daemons on the remote nodes. You might need your own, or you might be able to piggy-back on rsh the way SGE did - all depends on the specifics of your launcher.
3. ess - contains whatever logic is required by the launched daemons to identify their process name. If you have a launcher like SGE's, then the name is provided on the daemon cmd line, so no plugin is required. If you launch like LSF, which uses a batch launch method, then the daemons typically use something in their environment to determine their name - and a plugin would be required.
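To make the ras and ess roles a bit more concrete, here is a minimal standalone sketch in C. It is not ORTE code: the "<hostname> <slots>" line format, the type node_alloc_t, and the functions parse_alloc_line, parse_daemon_index, and daemon_index_from_env are all hypothetical names made up for illustration, and your scheduler may publish its allocation and per-daemon identity quite differently.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: one line of a scheduler-written host list, "<hostname> <slots>". */
typedef struct {
    char host[256];
    int  slots;
} node_alloc_t;

/* ras-style step: turn one allocation line into a node entry.
 * Returns 0 on success, -1 if the line is malformed or has no usable slots. */
int parse_alloc_line(const char *line, node_alloc_t *out)
{
    if (line == NULL || out == NULL) {
        return -1;
    }
    if (sscanf(line, "%255s %d", out->host, &out->slots) != 2) {
        return -1;
    }
    return (out->slots > 0) ? 0 : -1;
}

/* ess-style step: convert a decimal string into a non-negative daemon index.
 * Returns 0 on success, -1 on empty, non-numeric, or negative input. */
int parse_daemon_index(const char *val, long *index)
{
    char *end = NULL;
    long v;

    if (val == NULL || *val == '\0' || index == NULL) {
        return -1;
    }
    errno = 0;
    v = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || v < 0) {
        return -1;
    }
    *index = v;
    return 0;
}

/* Look the index up in the environment, as a batch-launched daemon might.
 * The variable name is supplied by the caller; none is assumed here. */
int daemon_index_from_env(const char *envname, long *index)
{
    return parse_daemon_index(getenv(envname), index);
}
```

In a real plugin the first function would feed the node list that ras hands back to ORTE, and the second would be one input to building the daemon's process name; the existing plugins show how that is actually done.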
Each framework is in its respective orte/mca/xxx directory, with each plugin appropriately named underneath that directory. You'll also find an xxx.h file in each framework that describes the API that each plugin must support - often, though, it is easier to understand that API by just using one of the existing plugins as an example.
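As a rough picture of what "the API that each plugin must support" tends to look like, here is a generic sketch of a plugin expressed as a table of function pointers. This is an assumption about the general pattern only: example_module_t, its fields, the "mygrid" module, and find_module are hypothetical and do not reproduce the actual ORTE structures, which you should take from the xxx.h file of the framework in question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the API struct a framework header would declare. */
typedef struct {
    const char *name;
    int (*init)(void);
    /* Fills hosts[] with up to max_hosts entries; returns how many were set. */
    int (*allocate)(const char **hosts, int max_hosts);
    int (*finalize)(void);
} example_module_t;

/* A made-up "mygrid" plugin supplying each entry point. */
static int mygrid_init(void)
{
    return 0;
}

static int mygrid_allocate(const char **hosts, int max_hosts)
{
    if (hosts == NULL || max_hosts < 1) {
        return 0;
    }
    hosts[0] = "node01"; /* placeholder; a real plugin would query the scheduler */
    return 1;
}

static int mygrid_finalize(void)
{
    return 0;
}

const example_module_t mygrid_module = {
    "mygrid", mygrid_init, mygrid_allocate, mygrid_finalize
};

/* Pick a module by name, loosely mirroring how a framework selects a plugin. */
const example_module_t *find_module(const example_module_t **mods, size_t n,
                                    const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(mods[i]->name, name) == 0) {
            return mods[i];
        }
    }
    return NULL;
}
```

The point is only that the framework calls through the table and never needs to know which scheduler is behind it, which is why copying an existing plugin and replacing the function bodies is usually the quickest route.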
HTH - feel free to ask questions.
> Any and all pointers will be much appreciated,
> Maury Feskanich
> Oracle Corp.
> devel mailing list