Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] PML selection logic
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2008-06-23 15:21:49


The selection code was added because high-speed interconnects frequently
fail to initialize properly due to random stuff happening (yes, that's a
horrible statement, but true). We ran into a situation with some really
flaky machines where most of the processes would choose CM, but a couple
would fail to initialize the MTL and therefore choose OB1. This led to a
hang, which is the worst of the worst.
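
To make that failure concrete, here is a rough Python sketch, not the
actual Open MPI code. The `Pml` record, the priority numbers, and the
`select_pml` and `selections` helpers are all made up for illustration.
The only facts taken from this thread are that CM outranks OB1 by default
and that CM needs a working MTL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pml:
    name: str
    priority: int    # illustrative values, not Open MPI's real defaults
    needs_mtl: bool  # True if the PML can only run on top of an MTL


# Hypothetical candidate list: CM has the higher priority but needs an MTL.
CANDIDATES = (Pml("cm", 30, True), Pml("ob1", 20, False))


def select_pml(mtl_init_ok: bool) -> str:
    """Return the highest-priority PML that is usable on this process."""
    usable = [p for p in CANDIDATES if mtl_init_ok or not p.needs_mtl]
    return max(usable, key=lambda p: p.priority).name


def selections(mtl_ok_per_rank):
    """Run the local selection independently on every rank."""
    return [select_pml(ok) for ok in mtl_ok_per_rank]
```

With four ranks where only rank 2 fails to initialize its MTL,
`selections([True, True, False, True])` gives cm, cm, ob1, cm, which is
the kind of mixed job that hung.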

I think #1 is adequate, although it doesn't handle spawn particularly
well. And spawn is generally used in environments where such network
mismatches are most likely to occur.
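
For reference, option #1 could look roughly like the sketch below. It is
a plain-Python simulation under stated assumptions: the modex is modeled
as a dict, and `PmlMismatch`, `publish_if_rank0`, `check_selection`, and
`run_check` are hypothetical names, not Open MPI functions.

```python
class PmlMismatch(RuntimeError):
    """Raised when a process selected a different PML than rank 0."""


def publish_if_rank0(modex: dict, rank: int, pml_name: str) -> None:
    # Only rank 0 contributes its PML name, so the modex holds one entry
    # instead of one entry per process.
    if rank == 0:
        modex["pml"] = pml_name


def check_selection(modex: dict, rank: int, pml_name: str) -> None:
    # Every process compares its own selection with rank 0's.
    expected = modex["pml"]
    if pml_name != expected:
        raise PmlMismatch(
            f"rank {rank} selected {pml_name!r}, rank 0 selected {expected!r}"
        )


def run_check(selected_by_rank):
    """Simulate the exchange; return the ranks that would report an error."""
    modex = {}
    for rank, name in enumerate(selected_by_rank):
        publish_if_rank0(modex, rank, name)
    mismatched = []
    for rank, name in enumerate(selected_by_rank):
        try:
            check_selection(modex, rank, name)
        except PmlMismatch:
            mismatched.append(rank)
    return mismatched
```

For the flaky-machine case, `run_check(["cm", "cm", "ob1", "cm"])`
returns `[2]`: the mismatch turns into an error on rank 2 instead of a
hang, and the modex only ever carries rank 0's entry.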

Brian

On Mon, 23 Jun 2008, Ralph H Castain wrote:

> Since my goal is to eliminate the modex completely for managed
> installations, could you give me a brief understanding of this eventual PML
> selection logic? It would help to hear an example of how and why different
> procs could get different answers - and why we would want to allow them to
> do so.
>
> Thanks
> Ralph
>
>
>
> On 6/23/08 11:59 AM, "Aurélien Bouteiller" <bouteill_at_[hidden]> wrote:
>
>> The first approach sounds fair enough to me. We should avoid 2 and 3,
>> as the PML selection mechanism used to be more complex before we
>> reduced it to accommodate a major design bug in the BTL selection
>> process. When using the complete PML selection, BTL would be
>> initialized several times, leading to a variety of bugs. Eventually
>> the PML selection should return to its old self, when the BTL bug
>> gets fixed.
>>
>> Aurelien
>>
>> On 23 June 08 at 12:36, Ralph H Castain wrote:
>>
>>> Yo all
>>>
>>> I've been doing further research into the modex and came across
>>> something I don't fully understand. It seems we have each process
>>> insert into the modex the name of the PML module that it selected.
>>> Once the modex has exchanged that info, it then loops across all
>>> procs in the job to check their selection, and aborts if any proc
>>> picked a different PML module.
>>>
>>> All well and good...assuming that procs actually -can- choose
>>> different PML modules and hence create an "abort" scenario. However,
>>> if I look inside the PMLs at their selection logic, I find that a
>>> proc can ONLY pick a module other than ob1 if:
>>>
>>> 1. the user specifies the module to use via -mca pml xyz or by using
>>> a module specific mca param to adjust its priority. In this case,
>>> since the mca param is propagated, ALL procs have no choice but to
>>> pick that same module, so that can't cause us to abort (we will have
>>> already returned an error and aborted if the specified module can't
>>> run).
>>>
>>> 2. the pml/cm module detects that an MTL module was selected, and
>>> that it is other than "psm". In this case, the CM module will be
>>> selected because its default priority is higher than that of OB1.
>>>
>>> In looking deeper into the MTL selection logic, it appears to me
>>> that you either have the required capability or you don't. I can see
>>> that in some environments (e.g., rsh across unmanaged collections of
>>> machines), it might be possible for someone to launch across a set
>>> of machines where some do and some don't have the required support.
>>> However, in all other cases, this will be homogeneous across the
>>> system.
>>>
>>> Given this analysis (and someone more familiar with the PML should
>>> feel free to confirm or correct it), it seems to me that this could
>>> be streamlined via one or more means:
>>>
>>> 1. at the most, we could have rank=0 add the PML module name to the
>>> modex, and other procs simply check it against their own and return
>>> an error if they differ. This accomplishes the identical
>>> functionality to what we have today, but with much less info in the
>>> modex.
>>>
>>> 2. we could eliminate this info from the modex altogether by
>>> requiring the user to specify the PML module if they want something
>>> other than the default OB1. In this case, there can be no confusion
>>> over what each proc is to use. The CM module will attempt to init
>>> the MTL - if it cannot do so, then the job will return the correct
>>> error and tell the user that CM/MTL support is unavailable.
>>>
>>> 3. we could again eliminate the info by not inserting it into the
>>> modex if (a) the default PML module is selected, or (b) the user
>>> specified the PML module to be used. In the first case, each proc
>>> can simply check to see if they picked the default - if not, then we
>>> can insert the info to indicate the difference. Thus, in the
>>> "standard" case, no info will be inserted.
>>>
>>> In the second case, we will already get an error if the specified
>>> PML module could not be used. Hence, the modex check provides no
>>> additional info or value.
>>>
>>> I understand the motivation to support automation. However, in this
>>> case, the automation actually doesn't seem to buy us very much, and
>>> it isn't coming "free". So perhaps some change in how this is done
>>> would be in order?
>>>
>>> Ralph
>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel