
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] How to add a schedule algorithm to the pml
From: Rainer Keller (keller_at_[hidden])
Date: 2010-09-27 05:37:34

Please note that the slides from the conference program have been uploaded to

Best regards,

On Wednesday 22 September 2010 17:53:12 Kenneth Lloyd wrote:
> Jeff,
> Is that EuroMPI2010 ob1 paper publicly available? I get involved in various
> NUMA partitioning/architecting studies and it seems there is not a lot of
> discussion in this area.
> Ken Lloyd
> ==================
> Kenneth A. Lloyd
> Watt Systems Technologies Inc.
> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> Behalf Of Jeff Squyres
> Sent: Wednesday, September 22, 2010 6:00 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] How to add a schedule algorithm to the pml
> Sorry for the delay in replying -- I was in Europe for the past two weeks;
> travel always makes me waaaay behind on my INBOX...
> On Sep 14, 2010, at 9:56 PM, 张晶 wrote:
> > I tried to add a scheduling algorithm to a PML component (ob1, etc.).
> > Unfortunately, I can only find a paper named "Open MPI: A Flexible High
> > Performance MPI" and some comments in the source files. From them, I
> > know that ob1 implements round-robin and weighted distribution
> > algorithms. But after tracing MPI_Send(), I can't figure out where
> > these are implemented, let alone how to add a new scheduling
> > algorithm. I have two questions:
> > 1. Where is the scheduling algorithm located?
> It's complicated -- I'd say that the PML is probably among the most
> complicated sections of Open MPI because it is the main "engine" that
> enforces the MPI point-to-point semantics. The algorithm is fairly well
> distributed throughout the PML source code. :-\
> > 2. There are five components in the PML framework: cm, crcpw, csum,
> > ob1, and v. What is the function of each of these components?
> cm: this component drives the MTL point-to-point components. It is mainly
> a thin wrapper for network transports that provide their own MPI-like
> matching semantics. Hence, most of the MPI semantics are effectively done
> in the lower layer (i.e., in the MTL components and their dependent
> libraries). You probably won't be able to do much here, because such
> transports (MX, Portals, etc.) do most of their semantics in the network
> layer -- not in Open MPI. If you have a matching network layer, this is
> the PML that you probably use (MX, Portals, PSM).
> crcpw: this is a fork of the ob1 PML; it adds some failover semantics.
> csum: this is also a fork of the ob1 PML; it adds checksumming semantics
> (so you can tell if the underlying transport had an error).
> v: this PML uses logging and replay to effect some level of fault
> tolerance. It's a distant fork of the ob1 PML, but has quite a few
> significant differences.
> ob1: this is the "main" PML that most users use (TCP, shared memory,
> OpenFabrics, etc.). It gangs together one or more BTLs to send/receive
> messages across individual network transports. Hence, it supports true
> multi-device/multi-rail algorithms. The BML (BTL multiplexing layer) is a
> thin management layer that marshals all the BTLs in the process together
> -- it's mainly array handling, etc. The ob1 PML is the one that decides
> multi-rail/device splitting, etc. The INRIA folks just published a paper
> last week at Euro MPI about adjusting the ob1 scheduling algorithm to also
> take NUMA/NUNA/NUIOA effects into account, not just raw bandwidth
> calculations.
> Hope this helps!
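
As a rough illustration of the two scheduling ideas mentioned in the thread (round-robin and bandwidth-weighted distribution across several transports), here is a minimal Python sketch. It is not taken from the ob1 source and does not reflect how ob1 is actually structured; the names Rail, round_robin, and weighted_split are hypothetical, and the bandwidth figures are made-up relative weights.

```python
from dataclasses import dataclass


@dataclass
class Rail:
    """Hypothetical stand-in for one transport (e.g. one BTL/device)."""
    name: str
    bandwidth: float  # relative bandwidth, any consistent unit


def round_robin(rails, num_fragments):
    """Assign fragment i to rail (i mod number of rails)."""
    return [rails[i % len(rails)].name for i in range(num_fragments)]


def weighted_split(rails, message_bytes):
    """Split a message across rails in proportion to their bandwidth.

    The last rail receives whatever is left after rounding down, so the
    shares always add up to message_bytes exactly.
    """
    total = sum(r.bandwidth for r in rails)
    shares = {}
    remaining = message_bytes
    for r in rails[:-1]:
        share = int(message_bytes * r.bandwidth / total)
        shares[r.name] = share
        remaining -= share
    shares[rails[-1].name] = remaining
    return shares


if __name__ == "__main__":
    rails = [Rail("fast", 3.0), Rail("slow", 1.0)]  # assumed weights
    print(round_robin(rails, 5))
    print(weighted_split(rails, 1000))
```

A NUMA-aware variant along the lines of the paper mentioned above would presumably adjust the weights per rail rather than use raw bandwidth alone; that adjustment is not shown here.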

 Dr.-Ing. Rainer Keller
 HLRS                         Tel: ++49 (0)711-685 6 5858
 Nobelstrasse 19                 Fax: ++49 (0)711-685 6 5832
 70550 Stuttgart                    email: keller_at_[hidden]     
 Germany                             AIM/Skype:rusraink