Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Seeking input for an RFC
From: Joshua Ladd (joshual_at_[hidden])
Date: 2014-04-02 11:20:55


For Sandia and LANL's benefit, I am attaching the patch that implements the proposed changes. These are entirely preliminary, in-house changes and should not be considered a production-grade solution; I just want to give folks a chance to see the basic ideas. Let me know if you guys need more info.

Best,

Josh

From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Joshua Ladd
Sent: Tuesday, April 01, 2014 11:15 AM
To: Open MPI Developers (devel_at_[hidden])
Subject: [OMPI devel] Seeking input for an RFC

Soliciting input from the community:

WHAT: Modify the PML cm component to remove unnecessary initialization, optimizing blocking operations

WHY: Allowing a "direct mode" removes overhead in the fast path, which reduces single-packet latency

HOW: In PML cm, even if the request starts and ends within the scope of the blocking send/recv function,
              a full request, a structure of up to 488 bytes (not including the MTL request appendix size), may be
              initialized. The request includes the ompi_request_t structure (used by the underlying MTL component),
              the converter that corresponds to the datatype, and other parameters, some of which are stored but only
              used if the request is asynchronous. This causes a significant number of writes, especially considering
              that the send buffer could be as small as a few bytes.
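
To make the cost concrete, here is a minimal sketch in C of what a full initialization looks like next to an 8-byte payload. Every name and field size below (toy_send_request_t, toy_full_init, and so on) is a hypothetical stand-in chosen for illustration; none of it is the actual PML cm or ompi_request_t layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; field names and sizes are illustrative only
 * and do NOT mirror the real Open MPI structures. */
typedef struct {
    int   state;
    int   persistent;        /* only meaningful for persistent requests */
    int   status[6];
    void *complete_cb;       /* only meaningful for asynchronous requests */
    void *complete_cb_data;
} toy_base_request_t;

typedef struct {
    const void   *datatype;
    const void   *base_addr;
    size_t        count;
    unsigned char stack[128]; /* stand-in for converter scratch state */
} toy_converter_t;

typedef struct {
    toy_base_request_t base;
    toy_converter_t    converter;
    void              *comm;
    int                dst;
    int                tag;
} toy_send_request_t;

/* Full initialization: every byte of the request is written, no matter
 * how small the payload is. Returns the number of request bytes touched. */
static size_t toy_full_init(toy_send_request_t *req, const void *buf,
                            size_t count, int dst, int tag)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->converter.base_addr = buf;
    req->converter.count     = count;
    req->dst                 = dst;
    req->tag                 = tag;
    return sizeof(*req);
}
```

In this toy layout the request is far larger than an 8-byte send buffer, so the bookkeeping writes outnumber the payload bytes many times over, which is the imbalance described above.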

              The proposed patch introduces a "direct mode" (currently set iff the underlying MTL is "mxm", which is the
              only option I had available for testing). When enabled, it cuts the initialization for blocking send and
              receive operations down to the bare minimum required to function. Aside from initializing only part
              of the request structure (fields like "dst" and "tag" are passed again to the MTL_CALL macro rather than
              read from the request struct anyway), the function uses a single pre-allocated request buffer, which is
              possible because the call is blocking. Our tests show that this increases packet rate by approximately 20%
              with 8-byte buffers. Note that the "redundant" if-conditions for irrelevant functions (e.g. recv_init) are
              removed by the compiler, since the macro expands to "if (0 == 0)".
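
The two ideas in the paragraph above (a single pre-allocated request for blocking calls, and an init macro whose condition becomes a compile-time constant) can be sketched roughly as follows. This is an illustration under assumptions, not the attached patch: TOY_REQ_INIT, toy_transport_send, toy_direct_req and the other toy_* names are hypothetical, and the real code goes through MTL_CALL and the PML cm request macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request kinds; literals so the macro below expands to a
 * constant condition such as `if ((0) == 0)`. */
#define TOY_KIND_BLOCKING   0
#define TOY_KIND_PERSISTENT 1

typedef struct {
    const void *buf;
    size_t      len;
    int         persistent;       /* unused by blocking calls */
    char        bookkeeping[64];  /* stand-in for state a blocking call never reads */
} toy_req_t;

/* Shared init macro. `kind` is a literal at every call site, so the
 * compiler can fold the condition and drop the dead branch. */
#define TOY_REQ_INIT(req, kind, b, l)                                   \
    do {                                                                \
        (req)->buf = (b);                                               \
        (req)->len = (l);                                               \
        if ((kind) == TOY_KIND_BLOCKING) {                              \
            /* direct mode: nothing else is needed */                   \
        } else {                                                        \
            memset((req)->bookkeeping, 0, sizeof((req)->bookkeeping));  \
            (req)->persistent = 1;                                      \
        }                                                               \
    } while (0)

/* Hypothetical transport stub: dst and tag are passed as arguments
 * instead of being read back from the request. */
static int toy_transport_send(toy_req_t *req, int dst, int tag)
{
    assert(req != NULL && req->buf != NULL);
    (void)dst;
    (void)tag;
    return (int)req->len;  /* pretend every byte was sent */
}

/* One pre-allocated request, reused by every blocking send. This is only
 * safe because the call does not return until the request is finished;
 * the sketch ignores multi-threading entirely. */
static toy_req_t toy_direct_req;

static int toy_blocking_send(const void *buf, size_t len, int dst, int tag)
{
    toy_req_t *req = &toy_direct_req;
    TOY_REQ_INIT(req, TOY_KIND_BLOCKING, buf, len);
    return toy_transport_send(req, dst, tag);
}

/* Persistent path: same macro, other literal, full initialization kept. */
static void toy_persistent_init(toy_req_t *req, const void *buf, size_t len)
{
    TOY_REQ_INIT(req, TOY_KIND_PERSISTENT, buf, len);
}
```

Because both call sites pass a literal kind, each function compiles down to only the branch it needs, so sharing one macro between the blocking and persistent paths should not cost anything at run time.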

WHERE: Most of the files in ompi/mca/pml/cm.

WHEN: ?

Joshua S. Ladd, PhD
HPC Algorithms Engineer
Mellanox Technologies

Email: joshual_at_[hidden]
Cell: +1 (865) 258 - 8898