I would really like to help, but I am not sure how much time I will have in the very near future (we are expecting a baby girl).


On 8/6/08, Open MPI <bugs@open-mpi.org> wrote:
#1435: Crash on PPC (with SMT off) when using mpi_paffinity alone
-------------------+--------------------------------------------------------

Reporter:  jnysal  |        Owner:  rhc

    Type:  defect  |       Status:  new
Priority:  major   |    Milestone:  Open MPI 1.3

Version:          |   Resolution:
Keywords:          |
-------------------+--------------------------------------------------------
Changes (by rhc):

  * owner:  jnysal => rhc
  * status:  assigned => new


Comment:

  Several of us have had a telecon on this subject, and have a proposed
  solution:

  The real root of the problem here is that we never clearly distinguished
  between physical and logical processors in OMPI. Instead, there was an
  implicit assumption that the two were one and the same. Thus, if a user
  specified a slot_list, we just dumped it directly into the paffinity
  subsystem.
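  To make the distinction concrete, here is a minimal C sketch. It assumes
  that a logical processor ID is simply the index into the ascending list
  of online physical processor IDs. The helper names are hypothetical and
  are not part of the OPAL/OMPI API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: translate a logical processor ID (0..n_online-1)
 * into the OS's physical processor ID, given the ascending list of online
 * physical IDs. Returns -1 if the logical ID is out of range.
 */
static int logical_to_physical(const int *online, size_t n_online, int logical)
{
    assert(online != NULL || n_online == 0);
    if (logical < 0 || (size_t)logical >= n_online)
        return -1;
    return online[logical];
}

/* Inverse mapping: physical ID -> logical ID, or -1 if that CPU is not online. */
static int physical_to_logical(const int *online, size_t n_online, int physical)
{
    assert(online != NULL || n_online == 0);
    for (size_t i = 0; i < n_online; i++)
        if (online[i] == physical)
            return (int)i;
    return -1;
}
```

  For illustration, suppose (hypothetically) that only physical CPUs 0, 2,
  4 and 6 are online, as might happen on a PPC box with SMT off. Logical
  processor 1 is then physical CPU 2, while physical ID 1 names an offline
  CPU. Treating one kind of ID as the other is exactly the ambiguity
  described above.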

  Unfortunately, when we used paffinity_alone and automatically mapped the
  ranks to processors, we again just passed the info to the paffinity
  subsystem, without clearly indicating whether it referred to a physical
  or a logical processor.

  Our feeling is that we need to cleanly handle both physical and logical
  processor specifications. Accordingly, we propose to do the following:

  1. modify the opal_paffinity_base_get API to add a boolean flag indicating
  whether we want logical (true) or physical (false) processor IDs in the
  returned cpumask

  2. modify the opal_paffinity_base_set API to add a boolean flag indicating
  whether we provided logical (true) or physical (false) processor IDs in
  the cpumask

  3. modify the opal_paffinity linux and solaris components to do the
  necessary mapping to handle the two cases so that we bind or return data
  according to the new flag

  4. modify ompi_mpi_init so that mpi_paffinity_alone indicates the
  automatic binding is to be done on the basis of logical processor IDs

  5. modify the syntax of the slot_list mca param so that it defaults to
  logical processor IDs, but allows the user to prefix the specification
  with a "P" or "p" to indicate physical processor IDs. This will also be
  applied to the parsing of the rank_file mapping file.

  6. modify the places that utilize that param to handle the new syntax,
  including the opal_paffinity_base_slot_list_set and its companion
  functions

  7. update the documentation to reflect the changed syntax
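  Items 3 and 5 can be sketched roughly as follows in C. This assumes a
  slot_list entry reduces to a single processor ID (the actual slot_list
  and rank_file syntax is richer than this, which the sketch ignores).
  proc_spec_t, parse_proc_spec and resolve_to_physical are hypothetical
  names, not the real opal_paffinity API; the boolean "logical" parameter
  only mirrors the proposed flag.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parsed form of a single processor ID from a slot_list entry. */
typedef struct {
    bool logical; /* true unless the ID carried a "P"/"p" prefix (item 5) */
    int id;
} proc_spec_t;

/* Parse "3" as logical ID 3 and "P3"/"p3" as physical ID 3.
 * Returns 0 on success, -1 on malformed input. */
static int parse_proc_spec(const char *s, proc_spec_t *out)
{
    assert(s != NULL && out != NULL);
    out->logical = true;
    if (*s == 'P' || *s == 'p') {
        out->logical = false;
        s++;
    }
    if (!isdigit((unsigned char)*s))
        return -1;
    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v > INT_MAX)
        return -1;
    out->id = (int)v;
    return 0;
}

/* Roughly what a paffinity component might do with the proposed flag
 * (items 1-3): map the ID to a physical processor, or return -1 if it is
 * not usable. `online` lists the online physical IDs in ascending order. */
static int resolve_to_physical(int id, bool logical,
                               const int *online, size_t n_online)
{
    assert(online != NULL || n_online == 0);
    if (id < 0)
        return -1;
    if (logical)
        return ((size_t)id < n_online) ? online[id] : -1;
    for (size_t i = 0; i < n_online; i++)
        if (online[i] == id)
            return id;
    return -1; /* physical ID names an offline or nonexistent CPU */
}
```

  With physical CPUs 0, 2, 4 and 6 online, "1" resolves to physical CPU 2,
  "P4" to CPU 4, and "p3" is rejected because physical CPU 3 is offline.
  Under item 4, the paffinity_alone path would in effect resolve each local
  rank as a logical ID, so it always lands on an online processor.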

  Terry has volunteered to modify the paffinity components. Ralph will do
  the ORTE-level stuff and mpi_init, and likely the slot_list stuff too
  (unless Lenny has time and is willing to help there?). This will be done
  on a new Hg branch that Ralph will create; he will post the access info
  here later today.

  Any comments? Please post them soon so we don't go too far down the path
  before we hear them!


--
Ticket URL: <https://svn.open-mpi.org/trac/ompi/ticket/1435#comment:18>

Open MPI <http://www.open-mpi.org/>


_______________________________________________
bugs mailing list
bugs@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/bugs