Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: meaning of "btl_XXX_eager_limit"
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2009-07-23 12:06:29

On Thu, 23 Jul 2009, Jeff Squyres wrote:

> There are two solutions I can think of. Which should we do?
> a. Pass the (max?) PML header size down into the BTL during
> initialization such that the btl_XXX_eager_limit can
> represent the max MPI data payload size (i.e., the BTL can size
> its buffers to accommodate its desired max eager payload size,
> its header size, and the PML header size). Thus, the
> eager_limit can truly be the MPI data payload size -- and easy
> to explain to users.

This will not work. Remember, the PML IS NOT THE ONLY USER OF THE BTLS.
I'm really getting sick of saying this, but it's true. There can be no
PML knowledge in the BTL, even if it's something simple like a header
size. And since PML headers change depending on the size and type of
message, this seems like a really stupid parameter to publish to the user.

> b. Stay with the current btl_XXX_eager_limit implementation (which
> OMPI has had for a long, long time) and add the code to check
> for btl_eager_limit less than the pml header size (per this past
> Tuesday's discussion). This is the minimal distance change.

Since there's already code in Terry's hands to do this, I vote for b.
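For illustration, the check described in (b) could look roughly like the sketch below. It assumes, as the thread implies, that the current eager limit counts the PML header plus the MPI payload. The function names and the idea of passing the header size as a plain argument are hypothetical; this is not the code in Terry's hands, nor an Open MPI API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an Open MPI function): true when the configured
 * eager limit leaves room for at least the largest PML header. */
bool eager_limit_fits_pml_header(size_t eager_limit, size_t max_pml_hdr_size)
{
    return eager_limit >= max_pml_hdr_size;
}

/* Hypothetical helper: largest MPI payload that fits in one eager fragment,
 * assuming the eager limit covers PML header + payload.
 * Returns 0 when the limit is smaller than the header. */
size_t max_eager_payload(size_t eager_limit, size_t pml_hdr_size)
{
    return eager_limit >= pml_hdr_size ? eager_limit - pml_hdr_size : 0;
}
```

A caller would reject or warn about the configuration when eager_limit_fits_pml_header() returns false, since no payload could then be sent eagerly.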

> 2. OMPI currently does not publish enough information for a user to
> set eager_limit to be able to do BTL traffic shaping. That is, one
> really needs to know the (max) BTL header length and the (max) PML
> header length values to be able to calculate the correct
> eager_limit to force a specific (max) BTL wire fragment size. Our
> proposed solution is to have ompi_info print out the (max) PML and
> BTL header sizes. Regardless of whether 1a) or 1b) is chosen, with
> these two pieces of information, a determined network administrator
> could calculate the max wire fragment size used by OMPI, and
> therefore be able to do at least some traffic shaping.

Actually, there's no need to know the PML header size to shape traffic.
You only need to know the BTL header size, and I wouldn't be opposed to
changing the behavior so that the BTL eager limit parameter included the
BTL header size (because the PML header is not a factor in determining
the size of individual eager packets). It seems idiotic, but whatever - you
should care more about the data size the user is sending than about the MTU
size. Sending multiple MTUs should have little performance impact on a
network that doesn't suck, and we shouldn't be doing all kinds of hacks to
support networks whose designers can't figure out which way is up.
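As a rough sketch of the arithmetic above: if the eager limit excludes the BTL header (the behavior this thread implies is current), the wire fragment size is the eager limit plus the BTL header size, and the PML header does not appear in the calculation. The function names and the example header sizes are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: size of an eager fragment on the wire, assuming the
 * eager limit does NOT already include the BTL header. */
size_t wire_fragment_size(size_t eager_limit, size_t btl_hdr_size)
{
    return eager_limit + btl_hdr_size;
}

/* Hypothetical helper: eager limit to configure so that no eager fragment
 * exceeds target_wire_size. Returns 0 if the BTL header alone does not fit. */
size_t eager_limit_for_wire_size(size_t target_wire_size, size_t btl_hdr_size)
{
    return target_wire_size > btl_hdr_size ? target_wire_size - btl_hdr_size : 0;
}
```

For example, with a made-up 40-byte BTL header and a 1500-byte target, eager_limit_for_wire_size(1500, 40) gives 1460. If the parameter were changed to include the BTL header, the subtraction would disappear and the limit would equal the wire size directly.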

Again, since there are multiple consumers of the BTLs, allowing network
designers to screw around with defaults to try and get what they want
(even when it isn't what they actually want) seems stupid. But as long as
you don't do 1a, I won't object to uselessness contained in ompi_info.