
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: changes to modex
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-04-03 11:35:39


On Apr 3, 2008, at 11:16 AM, Jeff Squyres wrote:

> The size of the openib modex is explained in btl_openib_component.c in
> the branch. It's a packed message now; we don't just blindly copy an
> entire struct. Here's the comment:
>
> /* The message is packed into multiple parts:
> * 1. a uint8_t indicating the number of modules (ports) in the message
> * 2. for each module:
> * a. the common module data
> * b. a uint8_t indicating how many CPCs follow
> * c. for each CPC:
> * a. a uint8_t indicating the index of the CPC in the all[]
> * array in btl_openib_connect_base.c
> * b. a uint8_t indicating the priority of this CPC
> * c. a uint8_t indicating the length of the blob to follow
> * d. a blob that is only meaningful to that CPC
> */
>
> The common module data is what I sent in the other message.

Gaa.. I forgot to finish explaining the spreadsheet before I sent
this; sorry...

The 4 lines of oob/xoob/ibcm/rdmacm CPC sizes are how many bytes those
CPCs contribute (on a per-port basis) to the modex. "size 1" is what
they currently contribute; "size 2" is what they would contribute if
Jon and I are able to shave off a few more bytes (not entirely sure
that's possible yet).

The "machine 1" and "machine 2" sections show three configurations of
each of two sample machines.

The first block of numbers is how big the openib part of the modex is
when only using the ibcm CPC, when only using the rdmacm CPC, and when
using both the ibcm and rdmacm CPCs (i.e., both are sent in the
modex; one will "win" and be used at run-time). Each number is the
result of plugging the machine parameters (nodes, ppn, num ports) and
the ibcm/rdmacm CPC sizes into the formula at the top of the
spreadsheet.

The second block of numbers is the result of modifying the formula at
the top of the spreadsheet so that the per-port information is sent
only once (this modified formula did not include sending a per-port
bitmap, as came up later in the thread). The green numbers in that
block are the differences between these numbers and the first block.

The third block of numbers is the same as the second block, but using
the "size 2" CPC sizes. The green numbers are the differences between
these numbers and the first block; the blue numbers are the
differences between these numbers and the second block.

-----

Note: based on what came up later in the thread (e.g., not taking
carto and whatnot into account), the 2nd and 3rd blocks of numbers are
not entirely accurate. But they're likely still in the right ballpark.
My point was that the size differences between the 1st block and the
2nd/3rd blocks seemed significant enough to warrant moving ahead with
a "reduce replication in the modex" scheme.

-- 
Jeff Squyres
Cisco Systems