
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI based HLA/RTI ?
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-04-16 09:51:06


Just curious: I thought ULFM dealt with recovering an MPI job where one or more processes fail. Is this correct?

HLA/RTI consists of processes that start at random times, run to completion, and then exit normally. While a failure could occur, most process terminations are normal and there is no need/intent to revive them. So it's mostly a case of massively exercising MPI's dynamic connect/accept/disconnect functions.
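The connect/accept/disconnect pattern described above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the program name, the server/client argument handling, and the rendezvous-by-port-string scheme are all illustrative assumptions. It requires an MPI installation to build and run.

```c
/* Sketch of MPI's dynamic process pattern: a long-lived "server" accepts
 * connections from peers that start at arbitrary times and later leave
 * normally via MPI_Comm_disconnect. Run e.g. (illustrative names):
 *   mpirun -np 1 ./rendezvous server          -- prints a port string
 *   mpirun -np 1 ./rendezvous client <port>   -- connects to that port
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm inter;                      /* intercommunicator to the peer */
    char port[MPI_MAX_PORT_NAME];
    int is_server = (argc > 1 && strcmp(argv[1], "server") == 0);

    if (is_server) {
        /* Publish a port and wait for a late-arriving peer to connect. */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("port: %s\n", port);      /* hand this string to the client */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    } else {
        /* Client: connect to the server's previously published port. */
        strncpy(port, argv[2], MPI_MAX_PORT_NAME);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }

    /* ... MPI-1 style send/recv over `inter` would go here ... */

    /* A normal, non-failure departure: disconnect instead of dying. */
    MPI_Comm_disconnect(&inter);
    if (is_server)
        MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}
```

An HLA/RTI-style workload would exercise this accept/connect/disconnect cycle many times over, once per federate joining and resigning.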

Do ULFM's structures have some utility for that purpose?

On Apr 16, 2013, at 3:20 AM, George Bosilca <bosilca_at_[hidden]> wrote:

> There is an ongoing effort to address the potential volatility of processes in MPI called ULFM. There is a working version available at http://fault-tolerance.org. It supports TCP, sm and IB (mostly). You will find some examples, and the document explaining the additional constructs needed in MPI to achieve this.
>
> George.
>
> On Apr 15, 2013, at 17:29 , John Chludzinski <john.chludzinski_at_[hidden]> wrote:
>
>> That would seem to preclude its use for an RTI. Unless you have a card up your sleeve?
>>
>> ---John
>>
>>
>> On Mon, Apr 15, 2013 at 11:23 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>> It isn't the fact that there are multiple programs being used - we support that just fine. The problem with HLA/RTI is that it allows programs to come/go at will - i.e., not every program has to start at the same time, nor complete at the same time. MPI requires that all programs be executing at the beginning, and that all call finalize prior to anyone exiting.
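The constraint being described, every program present at startup and every rank reaching finalize before anyone exits, holds even for an MPMD launch. A minimal sketch (executable names in the comment are illustrative, and an MPI installation is assumed):

```c
/* Under MPI-2 MPMD, several different executables form ONE MPI job, e.g.
 *   mpirun -np 1 ./federate_a : -np 4 ./federate_b
 * (illustrative names). All ranks share a single MPI_COMM_WORLD, so every
 * program must call MPI_Init at startup and reach MPI_Finalize before any
 * of them exits -- the constraint described above. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every executable in the MPMD job sees the same communicator;
     * only the rank distinguishes them. */
    printf("rank %d of %d in the shared MPI_COMM_WORLD\n", rank, size);

    MPI_Finalize();   /* all ranks must get here before anyone exits */
    return 0;
}
```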
>>
>>
>> On Apr 15, 2013, at 8:14 AM, John Chludzinski <john.chludzinski_at_[hidden]> wrote:
>>
>>> I just received an e-mail notifying me that MPI-2 supports MPMD. This would seem to be just what the doctor ordered?
>>>
>>> ---John
>>>
>>>
>>> On Mon, Apr 15, 2013 at 11:10 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> FWIW: some of us are working on a variant of MPI that would indeed support what you describe - it would support send/recv (i.e., MPI-1), but not collectives, and so would allow communication between arbitrary programs.
>>>
>>> Not specifically targeting HLA/RTI, though I suppose a wrapper that conformed to that standard could be created.
>>>
>>> On Apr 15, 2013, at 7:50 AM, John Chludzinski <john.chludzinski_at_[hidden]> wrote:
>>>
>>> > This would be a departure from the SPMD paradigm that seems central to
>>> > MPI's design. Each process would be a completely different program
>>> > (piece of code), and I'm not sure how well that would work using
>>> > MPI.
>>> >
>>> > BTW, MPI is commonly used in the parallel discrete event world for
>>> > communication between LPs (federates in HLA). But these LPs are
>>> > usually the same program.
>>> >
>>> > ---John
>>> >
>>> > On Mon, Apr 15, 2013 at 10:22 AM, John Chludzinski
>>> > <john.chludzinski_at_[hidden]> wrote:
>>> >> Is anyone aware of an MPI based HLA/RTI (DoD High Level Architecture
>>> >> (HLA) / Runtime Infrastructure)?
>>> >>
>>> >> ---John
>>> > _______________________________________________
>>> > users mailing list
>>> > users_at_[hidden]
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>
>>
>