Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-07-17 07:37:51


On Jul 16, 2007, at 2:28 PM, Matthew Moskewicz wrote:

>> MPI-2 does support the MPI_COMM_JOIN and MPI_COMM_ACCEPT/
>> MPI_COMM_CONNECT models. We do support these in Open MPI, but the
>> ORTE-side restrictions may be too limiting for you.
>
> perhaps i'll experiment -- any clues as to what the orte
> restrictions might be?

The main constraint is that you have to run a "persistent" orted that
spans all of your MPI_COMM_WORLDs. We have only lightly tested this
scenario -- Ralph, can you comment more here?

>> - It also likely doesn't work yet; we started the integration work
>> and ran into a technical issue that required further discussion with
>> Platform. They're currently looking into it; we stopped the LSF work
>> in ORTE until they get back to us.
>
> i see -- i might be trying to work on the 6.x support today. can you
> give me any hints on what the problem was in case i run into the same
> issue?

Something was wrong with the lsb_launch() function; using it caused a
significant slowdown in the job, and it generally wasn't behaving as
expected. Platform sent me a fix yesterday (a one-off, unsupported
binary for development purposes) that I haven't had a chance to test
yet.

>> - That being said, MPI_THREAD_MULTIPLE and MPI_COMM_SPAWN *might*
>> offer a way out here. But I think a) THREAD_MULTIPLE isn't working
>> yet (other OMPI members are working on this), and b) even when
>> THREAD_MULTIPLE works, there will be ORTE issues to deal with
>> (canceling pending resource allocations, etc.). Ralph mentioned that
>> someone else is working on such things on the TM/PBS/Torque side; I
>> haven't followed that effort closely.
>
> it seems that MPI_THREAD_MULTIPLE is to be avoided for now, but there
> are perhaps other workarounds (using threads in other ways, etc.).
> also, i'd love to hear about the existing efforts -- i'm hoping
> someone working on them might be reading this ... ;)

Ralph -- can you chime in on the TM/PBS/Torque efforts?

-- 
Jeff Squyres
Cisco Systems