Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] status of LSF integration work?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-02-12 15:22:33


There are two issues:

- You must have a recent enough version of LSF. I'm afraid I don't
remember the LSF version number offhand, but we both (OMPI and LSF)
had to make some changes/fixes to achieve compatibility.

- LSF compatibility in OMPI is scheduled for v1.3 (i.e., it doesn't
exist in the v1.2 series). As Ralph indicated, we're aware that it's
currently broken in the trunk -- it'll be fixed by the v1.3 release,
but I don't know exactly when. To be blunt: I wouldn't count on it in
a production environment until v1.3 is officially released. Betas may
become available before v1.3 goes gold that would be suitable for
testing, though.

Here's the OMPI v1.3 roadmap document -- it's more-or-less continually
updated:

     https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3

On Feb 11, 2008, at 10:36 PM, Ralph Castain wrote:

> Jeff and I chatted about this today, in fact. We know the LSF support
> is borked, but neither of us had time right now to fix it. We plan to
> do so, though, before the 1.3 release - just can't promise when.
>
> Ralph
>
>
>
> On 2/11/08 8:00 AM, "Eric Jones" <ejon_at_[hidden]> wrote:
>
>> Greetings, MPI mavens,
>>
>> Perhaps this belongs on users@, but since it's about development
>> status I thought I'd start here. I've fairly recently gotten involved
>> in getting an MPI environment configured for our institute. We have
>> an existing LSF cluster because most of our work is more
>> High-Throughput than High-Performance, so if I can use LSF to underlie
>> our MPI environment, that'd be administratively easiest.
>>
>> I tried to compile the LSF support in the public SVN repo and noticed
>> it was, er, broken. I'll include the trivial changes we made below.
>> But the behavior is still fairly unpredictable, mostly involving
>> mpirun never spinning up daemons on other nodes.
>>
>> I saw mention that work was being suspended on LSF support pending
>> technical improvements on the LSF side (mentioning that Platform had
>> provided a patch to try).
>>
>> Can I assume, based on the inactivity in the repo, that Platform
>> hasn't resolved the issue?
>>
>> Thanks,
>> Eric
>>
>> ------------------------
>> Here're the diffs to get LSF support to compile. We also made a
>> change so it would report the LSF failure code instead of an
>> uninitialized variable when it fails:
>>
>> Index: pls_lsf_module.c
>> ===================================================================
>> --- pls_lsf_module.c (revision 17234)
>> +++ pls_lsf_module.c (working copy)
>> @@ -304,7 +304,7 @@
>> */
>> if (lsb_launch(nodelist_argv, argv, LSF_DJOB_NOWAIT, env) < 0) {
>> ORTE_ERROR_LOG(ORTE_ERR_FAILED_TO_START);
>> - opal_output(0, "lsb_launch failed: %d", rc);
>> + opal_output(0, "lsb_launch failed: %d", lsberrno);
>> rc = ORTE_ERR_FAILED_TO_START;
>> goto cleanup;
>> }
>> @@ -356,7 +356,7 @@
>>
>> /* check for failed launch - if so, force terminate */
>> if (failed_launch) {
>> - if (ORTE_SUCCESS !=
>> +/* if (ORTE_SUCCESS != */
>> orte_pls_base_daemon_failed(jobid, false, -1, 0,
>> ORTE_JOB_STATE_FAILED_TO_START);
>> }
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems