On Jul 19, 2013, at 19:29, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:
> On 7/19/13 10:58 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:
>> 1. The BML endpoint structure (aka. BML proc) is well known and defined
>> in bml.h, so it is not technically opaque.
> It's opaque in that outside of the R2 BML, users were not supposed to poke
> at what's in proc_bml without using the BML interface. Some do, although
> that was easy to accommodate.
>> 2. When allocating an ompi_proc_t structure you will always have to
>> allocate for an array large enough to contain up to the max size detected
>> during configure. There is significant potential for oversized arrays in
>> one of the most space-critical structures.
> It could, if we're not careful with our tag requests. In the prototype I
> wrote up, the sizes of endpoint storage in ompi_proc_t are as follows:
> * Current trunk: 16 bytes
> * Proposed trunk, no dynamic support, no MTLs: 8 bytes
> * Proposed trunk, dynamic support, no MTLs: 16 bytes
> * Proposed trunk, dynamic support, MXM, PSM, or MX: 24 bytes
> * Proposed trunk, Portals, no dynamic support: 16 bytes
> * Proposed trunk, Portals, dynamic support: 24 bytes
> * Proposed trunk, Portals, MX, PSM, or MXM, dynamic support: 32 bytes
> So, yes, you're right. But the situations where you see growth are not
> normal OMPI builds (for example, Portals & MXM support). In the common
> cases, we could actually shrink by 8 bytes by disabling dynamic support.
> It would also (finally) allow us to run the MTLs and BTLs simultaneously,
> which is something we haven't been able to do previously.
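A minimal sketch of the compile-time tag layout being discussed, assuming one pointer-sized slot per reserved tag. The configure macros (`HAVE_DYNAMIC_ENDPOINTS`, `HAVE_PORTALS`, `HAVE_MTL`), the enum, and `proc_sketch_t` are hypothetical names for illustration, not taken from the tmp-public/snl-proc-tags/ prototype. With 8-byte pointers, one slot for the BML plus one per enabled feature is consistent with the proposed-trunk sizes listed above (8, 16, 24 and 32 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configure results; flip these to model other builds. */
#define HAVE_DYNAMIC_ENDPOINTS 1
#define HAVE_PORTALS 0
#define HAVE_MTL 0

/* Each component needing per-process storage reserves a tag at
 * configure time; the tag is simply an index into proc_endpoints[]. */
enum proc_endpoint_tag {
    PROC_TAG_BML = 0,            /* always present */
#if HAVE_MTL
    PROC_TAG_MTL,                /* MXM, PSM or MX */
#endif
#if HAVE_PORTALS
    PROC_TAG_PORTALS,
#endif
#if HAVE_DYNAMIC_ENDPOINTS
    PROC_TAG_DYNAMIC,            /* assumed: pointer to a run-time allocated table */
#endif
    PROC_TAG_MAX                 /* number of slots carried by every proc */
};

typedef struct {
    /* ... other ompi_proc_t-like fields elided ... */
    void *proc_endpoints[PROC_TAG_MAX];
} proc_sketch_t;

/* A single load from the proc structure reaches the endpoint. */
static inline void *proc_get_endpoint(const proc_sketch_t *proc, int tag)
{
    assert(tag >= 0 && tag < PROC_TAG_MAX);
    return proc->proc_endpoints[tag];
}

static inline void proc_set_endpoint(proc_sketch_t *proc, int tag, void *ep)
{
    assert(tag >= 0 && tag < PROC_TAG_MAX);
    proc->proc_endpoints[tag] = ep;
}
```

Because the slot count is fixed when the library is built, every ompi_proc_t pays for every tag that was configured in, whether or not the owning component is loaded at run time.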
Sure, but looking a little ahead, having such a mechanism available might attract interest from other components, leading to a larger number of such tags. This creates the potential for a larger and sparser endpoint array in ompi_proc_t (as not all modules that reserve a tag will be loaded simultaneously).
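The sparsity concern can be put in numbers with a small helper, sketched here under the assumption that all procs share the same slot layout and the same set of loaded components. `unused_slot_bytes` is a hypothetical function, not part of Open MPI; it counts slots reserved at configure time but left empty, and scales that by the number of processes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: bytes spent on per-proc slots whose owning component
 * was never loaded. `slots` is one representative proc's slot array. */
static size_t unused_slot_bytes(void *const *slots, size_t nslots, size_t nprocs)
{
    size_t empty = 0;
    for (size_t i = 0; i < nslots; ++i) {
        if (slots[i] == NULL) {
            ++empty;
        }
    }
    /* Every proc carries the same layout, so the waste scales with nprocs. */
    return empty * sizeof(void *) * nprocs;
}
```

For example, with four reserved slots of which only two are populated, a job with one million processes carries two unused pointers per process, i.e. 16 MB of empty slots on a 64-bit build.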
I would like to propose a simpler solution. Imagine a unique global index in the ompi_proc_t structure, indicating the position of the ompi_proc_t in the global array of processes. This index could be used to access any type of information in arrays shaped like our global ompi process list. Thus, all components that need to share some other information would be able to use the ompi_proc_t index to share the information they agree on, in an array they agree on. This extra array can either be created using their already shared infrastructure (if such infrastructure exists), or we can leverage the new MCA parameter infrastructure to create a hidden/internal parameter that points to the array.
What is the cost of this approach?
- There are several fields in the ompi_proc_t structure that could be used to store the global index. For example, we can take advantage of proc->super.item_free, which is never used in the context of ompi_proc_t (this field is only used for LIFO/FIFO). It is an int32_t, so we are fine in terms of number of processes for a while. Thus, compared with today, storing this global index carries no extra storage cost.
- The cost of accessing the endpoints will be one load from the ompi_proc_t to get the global index, then an indexed load (using this index) into the array of endpoints. That is exactly the same number of loads as the dynamic case, but one more than the "no dynamic support" case in your proposal.
- In terms of memory, this solution never adds overhead: the ompi_proc_t is not changed, and the extra array of endpoints is only created if the components that share it are all loaded and enabled.
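A minimal sketch of the global-index side table, assuming the index lives in an int32_t field modelled on proc->super.item_free. `proc_sketch_t`, `shared_endpoint_table_t` and the `table_*` functions are hypothetical; how components would publish the table pointer (e.g. through a hidden MCA parameter) is not modelled. The two-load count assumes the table's base pointer is already in a register or cache.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for ompi_proc_t: only the reused field is modelled. */
typedef struct {
    struct { int32_t item_free; } super;   /* holds the global proc index */
} proc_sketch_t;

/* Hypothetical side table shaped like the global proc list
 * (one entry per process), created only by components that share it. */
typedef struct {
    void  **endpoints;
    int32_t nprocs;
} shared_endpoint_table_t;

static int table_init(shared_endpoint_table_t *t, int32_t nprocs)
{
    t->endpoints = calloc((size_t)nprocs, sizeof(void *));
    t->nprocs = (t->endpoints != NULL) ? nprocs : 0;
    return (t->endpoints != NULL) ? 0 : -1;
}

static void table_fini(shared_endpoint_table_t *t)
{
    free(t->endpoints);
    t->endpoints = NULL;
    t->nprocs = 0;
}

/* Load 1: the global index from the proc. Load 2: the indexed entry. */
static void *table_lookup(const shared_endpoint_table_t *t, const proc_sketch_t *proc)
{
    int32_t idx = proc->super.item_free;
    assert(idx >= 0 && idx < t->nprocs);
    return t->endpoints[idx];
}

static void table_store(shared_endpoint_table_t *t, const proc_sketch_t *proc, void *ep)
{
    int32_t idx = proc->super.item_free;
    assert(idx >= 0 && idx < t->nprocs);
    t->endpoints[idx] = ep;
}
```

The proc structure itself never grows: components that are not loaded never call `table_init`, so they pay nothing.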
>> 3. I don't know to what extent this really matters, but with this change
>> two Open MPI libraries might become binary incompatible (if the #define
>> is exchanged between nodes).
> The #defines are local process only. ompi_proc_ts aren't global
> structures (the pointer to them is), so there's no binary incompatibility.
> I hacked up a prototype in tmp-public/snl-proc-tags/ last night. It
> currently lacks dynamic support (since we have no users for that), but
> otherwise works.
> Brian W. Barrett
> Scalable System Software Group
> Sandia National Laboratories
> devel mailing list