Jeff Squyres wrote:
>On Nov 9, 2007, at 1:24 PM, Don Kerr wrote:
>>both, I was thinking of listing what I think are multi-rail
>>but wanted to understand what the current state of things is
>I believe the OF portion of the FAQ describes what we do in the v1.2
>series (right Gleb?); I honestly don't remember what we do today on
>the trunk (I'm pretty sure that Gleb has tweaked it recently).
Gleb's response answered this.
>As for what we *should* do, it's a very complicated question. :-\
OK. I knew the "close to NIC" placement was a concern, but was not aware
that an attempt to tackle it had begun. I will look at the "carto" framework.
>This is where all these discussions regarding affinity, NUMA, and NUNA
>(non uniform network architecture) come into play. A "very simple"
>scenario may be something like this:
>- host A is UMA (perhaps even a uniprocessor) with 2 ports that are
>equidistant from the 1 MPI process on that host
>- host B is the same, except it only has 1 active port on the same IB
>subnet as host A's 2 ports
>- the ports on both hosts are all the same speed (e.g., DDR)
>- the ports all share a single, common, non-blocking switch
>But even with this "simple" case, the answer as to what you should do
>is still unclear. If host A is able to drive both of its DDR links at
>full speed, you could cause congestion at the link to host B if the
>MPI process on host A opens two connections. But if host A is only
>able to drive the same effective bandwidth out of its two ports as it
>is through a single port, then the end effect is probably fairly
>negligible -- it might not make much of a difference at all as to
>whether the MPI process A opens 1 or 2 connections to host B.
>But then throw in other effects that I mentioned above (NUMA, NUNA,
>etc.), and the equation becomes much more complex. In some cases, it
>may be good to open 1 connection (e.g., bandwidth load balancing); in
>other cases it may be good to open 2 (e.g., congestion avoidance /
>spreading traffic around the network, particularly in the presence of
>other MPI jobs on the network). :-\
>Such NUNA architectures may sound unusual to some, but both IBM and HP
>sell [many] blade-based HPC solutions with NUNA internal IB networks.
>Specifically: this is a fairly common scenario.
>So this is a difficult question without a great answer. The hope is
>that the new carto framework that Sharon sent requirements around for
>will be able to at least make topology information available from both
>the host and the network so that BTLs can possibly make some
>intelligent decisions about what to do in these kinds of scenarios.