Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI bugs] [Open MPI] #1435: Crash on PPC (with SMT off) when using mpi_paffinity alone
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-08-07 08:21:13


No problem Lenny, I am looking at this now.

--td

Lenny Verkhovsky wrote:
>
> I really would like to help, but I am not sure how much time I will
> have in the very near future (we are expecting the delivery of a baby girl).
>
>
> On 8/6/08, Open MPI <bugs_at_[hidden]> wrote:
>
> #1435: Crash on PPC (with SMT off) when using mpi_paffinity alone
> -------------------+------------------------------------------------------
>  Reporter: jnysal  |      Owner: rhc
>      Type: defect  |     Status: new
>  Priority: major   |  Milestone: Open MPI 1.3
>   Version:         | Resolution:
>  Keywords:         |
> -------------------+------------------------------------------------------
> Changes (by rhc):
>
> * owner: jnysal => rhc
> * status: assigned => new
>
>
> Comment:
>
> Several of us have had a telecon on this subject, and have a proposed
> solution:
>
> The real root of the problem here is that we never clearly delineated
> between physical and logical processors in OMPI. Instead, there was an
> implicit assumption that the two were one-and-the-same. Thus, if a user
> specified a slot_list, we just directly dumped that into the paffinity
> subsystem.
>
> Unfortunately, when we use paffinity_alone and automatically map the
> ranks to processors, we again just passed the info to the paffinity
> subsystem - without clearly indicating whether this was a physical
> processor or logical processor.
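
To make the physical/logical distinction described above concrete, here is a minimal sketch in Python (chosen for brevity; the real code lives in OPAL and is written in C). Everything in it is an assumption for illustration: the `ONLINE_PHYSICAL_IDS` list, the gaps in its numbering, and both function names are hypothetical and do not come from the ticket or from Open MPI.

```python
# Hypothetical physical ids of the processors that are currently online.
# The gaps are an assumption meant to mimic a machine with SMT turned off.
ONLINE_PHYSICAL_IDS = [0, 2, 4, 6]


def logical_to_physical(logical_id, online=ONLINE_PHYSICAL_IDS):
    """Map a dense logical id (0..n-1) to the physical id the OS knows."""
    ordered = sorted(online)
    if not 0 <= logical_id < len(ordered):
        raise ValueError(f"logical id {logical_id} is out of range")
    return ordered[logical_id]


def physical_to_logical(physical_id, online=ONLINE_PHYSICAL_IDS):
    """Map a physical id back to its position among the online processors."""
    ordered = sorted(online)
    if physical_id not in ordered:
        raise ValueError(f"physical id {physical_id} is not online")
    return ordered.index(physical_id)
```

Under these assumptions, logical processor 1 is physical processor 2, while physical processor 1 does not exist at all. Handing a rank number straight to the binding layer as if it were a physical id would then fail for every odd rank, which is the kind of mismatch the paragraph above describes.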
>
> Our feeling is that we need to cleanly handle both physical and logical
> processor specifications. Accordingly, we propose to do the following:
>
> 1. modify the opal_paffinity_base_get API to add a boolean flag
> indicating we want logical (true) or physical (false) processor ids in
> the returned cpumask
>
> 2. modify the opal_paffinity_base_set API to add a boolean flag
> indicating we provided logical (true) or physical (false) processor ids
> in the cpumask
>
> 3. modify the opal_paffinity linux and solaris components to do the
> necessary mapping to handle the two cases so that we bind or return
> data according to the new flag
>
> 4. modify ompi_mpi_init so that mpi_paffinity_alone indicates the
> automatic binding is to be done on the basis of logical processor ids
>
> 5. modify the syntax of the slot_list mca param so that it defaults to
> logical processor ids, but allows the user to prepend the specification
> with a "P" or "p" to indicate these are physical processor ids. This
> will also be applied to the parsing of the rank_file mapping file.
>
> 6. modify the places that utilize that param to handle the new syntax,
> including the opal_paffinity_base_slot_list_set and its companion
> functions
>
> 7. update the documentation to reflect the changed syntax
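
The prefix rule in item 5 and the logical/physical flag in items 1 and 2 could be sketched roughly as follows. This is Python for brevity, not the C that OPAL uses, and it makes several assumptions: the slot list is reduced to a comma-separated list of ids and `low-high` ranges (the real slot_list grammar is richer and is not reproduced here), and the names `parse_slot_list` and `to_physical` are hypothetical, not Open MPI functions.

```python
def parse_slot_list(spec):
    """Parse a simplified slot list; return (is_logical, ids).

    A leading "P" or "p" marks the ids as physical. Without it the ids
    are treated as logical, which is the default the proposal describes.
    """
    spec = spec.strip()
    is_logical = True
    if spec[:1] in ("P", "p"):
        is_logical = False
        spec = spec[1:]
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty entry in slot list")
        if "-" in part:
            # Inclusive range such as "4-6".
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"bad range {part!r}")
            ids.extend(range(low, high + 1))
        else:
            ids.append(int(part))
    return is_logical, ids


def to_physical(ids, is_logical, online):
    """Resolve ids to physical ids, honouring the logical/physical flag."""
    ordered = sorted(online)
    if is_logical:
        for i in ids:
            if not 0 <= i < len(ordered):
                raise ValueError(f"logical id {i} is out of range")
        return [ordered[i] for i in ids]
    for i in ids:
        if i not in ordered:
            raise ValueError(f"physical id {i} is not online")
    return list(ids)
```

With a hypothetical set of online processors `[0, 2, 4, 6]`, the string `"0,1"` resolves to physical ids `[0, 2]`, while `"P0,2"` names the same two processors directly. Carrying the boolean alongside the ids, rather than guessing later, mirrors the flag that the proposal adds to the get and set calls.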
>
> Terry has volunteered to modify the paffinity components. Ralph will do
> the ORTE-level stuff and mpi_init, and likely the slot_list stuff too
> (unless Lenny has time and is willing to help there?). This will be done
> on a new Hg branch that Ralph will create - will post the access info
> here later today.
>
> Any comments? Please post soon so we don't go too far down the path
> before we hear them!
>
>
> --
> Ticket URL:
> <https://svn.open-mpi.org/trac/ompi/ticket/1435#comment:18>
>
> Open MPI <http://www.open-mpi.org/>
>