Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI bugs] [Open MPI] #1435: Crash on PPC (with SMT off) when using mpi_paffinity alone
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2008-08-07 07:44:20


I would really like to help, but I am not sure how much time I will have in
the very near future (we are expecting a baby girl).

On 8/6/08, Open MPI <bugs_at_[hidden]> wrote:
>
> #1435: Crash on PPC (with SMT off) when using mpi_paffinity alone
>
> ---------------------+---------------------------
>  Reporter:  jnysal   |       Owner:  rhc
>      Type:  defect   |      Status:  new
>  Priority:  major    |   Milestone:  Open MPI 1.3
>   Version:           |  Resolution:
>  Keywords:           |
> ---------------------+---------------------------
> Changes (by rhc):
>
> * owner: jnysal => rhc
> * status: assigned => new
>
>
> Comment:
>
> Several of us have had a telecon on this subject, and have a proposed
> solution:
>
> The real root of the problem here is that we never clearly distinguished
> between physical and logical processors in OMPI. Instead, there was an
> implicit assumption that the two were one and the same. Thus, if a user
> specified a slot_list, we just passed it straight to the paffinity
> subsystem.
>
> Unfortunately, when we use paffinity_alone and automatically map the ranks
> to processors, we again just pass the info to the paffinity subsystem,
> without clearly indicating whether it refers to a physical processor or a
> logical processor.
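
To make the distinction concrete, here is a minimal sketch in C of a
logical/physical translation driven by a flag that says which kind of id the
caller supplied. It is not OMPI code: the table of online physical ids
(0, 2, 4, 6) and the names logical_to_physical, physical_to_logical and
resolve_physical_id are hypothetical, chosen only to show a machine where the
two numberings differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example topology: the online processors carry the
 * physical ids 0, 2, 4, 6, so the physical numbering has gaps. */
static const int online_physical_ids[] = { 0, 2, 4, 6 };
static const size_t num_online =
    sizeof(online_physical_ids) / sizeof(online_physical_ids[0]);

/* Translate a logical id (0 .. num_online-1) into a physical id.
 * Returns -1 if the logical id is out of range. */
int logical_to_physical(int logical_id)
{
    if (logical_id < 0 || (size_t)logical_id >= num_online) {
        return -1;
    }
    return online_physical_ids[logical_id];
}

/* Translate a physical id into its logical index.
 * Returns -1 if that physical processor is not online. */
int physical_to_logical(int physical_id)
{
    for (size_t i = 0; i < num_online; ++i) {
        if (online_physical_ids[i] == physical_id) {
            return (int)i;
        }
    }
    return -1;
}

/* Resolve a caller-supplied id to the physical id to bind to.
 * is_logical == true:  id is a logical index and gets translated.
 * is_logical == false: id is already physical and is only validated. */
int resolve_physical_id(int id, bool is_logical)
{
    if (is_logical) {
        return logical_to_physical(id);
    }
    return physical_to_logical(id) >= 0 ? id : -1;
}
```

With this table, logical id 1 resolves to physical id 2, while physical id 1
is rejected because no such processor is online. Treating the two numberings
as identical hides exactly this kind of mismatch.
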
>
> Our feeling is that we need to cleanly handle both physical and logical
> processor specifications. Accordingly, we propose to do the following:
>
> 1. modify the opal_paffinity_base_get API to add a boolean flag indicating
> whether we want logical (true) or physical (false) processor ids in the
> returned cpumask
>
> 2. modify the opal_paffinity_base_set API to add a boolean flag indicating
> whether we provided logical (true) or physical (false) processor ids in
> the cpumask
>
> 3. modify the opal_paffinity linux and solaris components to do the
> necessary mapping to handle the two cases so that we bind or return data
> according to the new flag
>
> 4. modify ompi_mpi_init so that mpi_paffinity_alone indicates the
> automatic binding is to be done on the basis of logical processor ids
>
> 5. modify the syntax of the slot_list mca param so that it defaults to
> logical processor ids, but allows the user to prepend the specification
> with a "P" or "p" to indicate these are physical processor ids. This will
> also be applied to the parsing of the rank_file mapping file.
>
> 6. modify the places that utilize that param to handle the new syntax,
> including the opal_paffinity_base_slot_list_set and its companion functions
>
> 7. update the documentation to reflect the changed syntax
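
A rough sketch in C of the prefix handling described in item 5, for
illustration only. It assumes a simplified slot list that is just a
comma-separated list of non-negative processor ids (the real slot_list syntax
is richer), and parse_slot_list is a hypothetical helper, not an existing
OMPI function.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a simplified slot list such as "0,1,3" or "P0,2,4".
 * A leading 'P' or 'p' marks the ids as physical; without it they are
 * treated as logical (the proposed default).
 * Stores at most max_ids ids and returns how many were stored
 * (0 for an empty list), or -1 on malformed input or overflow. */
int parse_slot_list(const char *spec, bool *is_physical,
                    int *ids, int max_ids)
{
    int count = 0;

    if (spec == NULL || is_physical == NULL || ids == NULL) {
        return -1;
    }

    *is_physical = false;
    if (*spec == 'P' || *spec == 'p') {
        *is_physical = true;
        ++spec;                       /* skip the prefix */
    }

    while (*spec != '\0') {
        char *end;
        long value = strtol(spec, &end, 10);

        /* Reject missing digits, negative or oversized ids, and
         * lists longer than the caller's buffer. */
        if (end == spec || value < 0 || value > INT_MAX ||
            count >= max_ids) {
            return -1;
        }
        ids[count++] = (int)value;

        if (*end == ',') {
            spec = end + 1;
            if (*spec == '\0') {
                return -1;            /* trailing comma */
            }
        } else if (*end == '\0') {
            spec = end;               /* end of list */
        } else {
            return -1;                /* unexpected character */
        }
    }
    return count;
}
```

Given "P0,2,4" this reports three physical ids; given "0,1,3" it reports three
logical ids, which the caller would still need to translate before binding.
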
>
> Terry has volunteered to modify the paffinity components. Ralph will do
> the ORTE-level stuff and mpi_init, and likely the slot_list stuff too
> (unless Lenny has time and is willing to help there?). This will be done
> on a new Hg branch that Ralph will create; the access info will be posted
> here later today.
>
> Any comments? Please post soon so we don't go too far down this path
> before we hear them!
>
>
> --
> Ticket URL: <https://svn.open-mpi.org/trac/ompi/ticket/1435#comment:18>
>
> Open MPI <http://www.open-mpi.org/>