Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307
From: Adrian Knoth (adi_at_[hidden])
Date: 2008-01-30 03:17:03


On Tue, Jan 29, 2008 at 07:37:42PM -0500, George Bosilca wrote:

> The previous code was correct. Each IP address corresponds to a
> specific endpoint, and therefore to a specific BTL. This enables us to
> have multiple TCP BTLs at the same time, and allows the OB1 PML to
> stripe the data over all of them.
>
> Unfortunately, your commit disables multi-rail over TCP. Please
> undo it.

That's exactly what I had in mind when I said "this might break
functionality".

So we need as many endpoints as IP addresses? Then, simply connecting
them leads to oversubscription: two parallel connections on the same
medium. That's where the kernel index enters the scene: we'll have to
make sure not to open two parallel connections to the same remote kernel
index.
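
To make the idea concrete, here is a rough sketch of "at most one
outgoing connection per remote kernel index". It is not the TCP BTL
code: struct peer_addr and select_one_per_kindex() are hypothetical
names, and it assumes the peer publishes a kernel interface index
alongside each address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one address exported by a peer, tagged with the
 * kernel interface index it is configured on (assumed to be published
 * by the peer together with the address). */
struct peer_addr {
    const char *addr;   /* textual IP address */
    int kindex;         /* kernel interface index on the remote side */
};

/* Pick at most one address per remote kernel index.
 * Stores the positions (within addrs) of the selected entries in out,
 * which must have room for n entries, and returns how many were picked.
 * The first address seen for each kernel index wins. */
size_t select_one_per_kindex(const struct peer_addr *addrs, size_t n,
                             size_t *out)
{
    size_t count = 0;

    assert(addrs != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        /* Has an earlier address on the same kernel index been picked? */
        for (size_t j = 0; j < count; j++) {
            if (addrs[out[j]].kindex == addrs[i].kindex) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            out[count++] = i;
        }
    }
    return count;
}
```

With three addresses on kernel index 2 and one on index 3, this would
select two connection targets, one per interface. Which address to
prefer on each interface is a separate policy question that the sketch
ignores.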

I'll revert the patch and come up with another solution, but for the
moment, let me point out that the assumption "One interface, one
address" isn't true. So, the previous code was also wrong.

I hope not to run into model limitations: avoiding oversubscription
means keeping the number of endpoints per peer no higher than the number
of its interfaces, but accepting incoming connections from this peer
means having all its addresses (probably more than #remote_NICs)
available in order to accept them.

As mentioned earlier: it's very common to have multiple addresses per
interface, and it's the kernel that assigns the source address, so
there's nothing one can say about an incoming connection, only that its
source could be any of the exported addresses. Any.
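
On the accepting side, that means checking the source of an incoming
connection against every address the peer exported, not only the ones
chosen for outgoing connections. A minimal sketch, assuming the
exported addresses are available as text (match_incoming() and
parse_addr() are hypothetical helpers, not existing Open MPI
functions):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parse a textual IPv4 or IPv6 address into buf (16 bytes, zero padded).
 * Returns the address family, or 0 if the text is not a valid address. */
static int parse_addr(const char *text, unsigned char buf[16])
{
    memset(buf, 0, 16);
    if (inet_pton(AF_INET, text, buf) == 1) {
        return AF_INET;
    }
    if (inet_pton(AF_INET6, text, buf) == 1) {
        return AF_INET6;
    }
    return 0;
}

/* Return the position of the exported address equal to source,
 * or -1 if the source matches none of them. Comparison is done on the
 * binary form, so different spellings of one address still match. */
int match_incoming(const char *const *exported, size_t n, const char *source)
{
    unsigned char src[16], cand[16];
    int src_af;

    assert(exported != NULL && source != NULL);
    src_af = parse_addr(source, src);
    if (src_af == 0) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        if (parse_addr(exported[i], cand) == src_af &&
            memcmp(src, cand, 16) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

The point is only that the lookup has to cover the full exported list,
since the kernel on the remote side picks the source address.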

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
private: http://adi.thur.de