Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Multi-rail on openib
From: Sylvain Jeaugey (sylvain.jeaugey_at_[hidden])
Date: 2009-06-08 04:38:47


Hi Tom,

Yes, there is a goal in mind, and it is definitely not performance: we are
working on device failover, i.e. when a network adapter or switch fails,
we keep running on the remaining one. We don't intend to improve
performance with multi-rail (which, as you said, will not happen unless
you have a DDR card on PCI Express x8 Gen2, very good routing - and the
money to pay for the doubled network :)).

The goal here is to use port 1 of each card as the primary communication
path over a fat tree, and port 2 as a failover path over a very light
network, just enough to avoid aborting the MPI application, or at least
to reach a checkpoint.
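
Concretely, pinning the primary path to port 1 would look something like
the sketch below (assuming an mlx4_0 HCA, an illustrative ./my_app and
rank count, and a build that has the btl_openib_if_include parameter):

   mpirun --mca btl openib,sm,self \
          --mca btl_openib_if_include mlx4_0:1 \
          -np 16 ./my_app

The automatic fallback to port 2 when that path dies is the part we are
working on, since (as Jeff said) there is no way to force that on the
openib btl at run-time today.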

Don't worry, another team is working on opensm, so that routing stays
optimal.
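
For reference, the resulting routing can be sanity-checked with the
standard infiniband-diags tools, along these lines (<src-lid> and
<dst-lid> being the LIDs of two ports of interest):

   ibstat                          # port state, LID and rate per HCA port
   ibnetdiscover                   # dump the discovered fabric topology
   ibtracert <src-lid> <dst-lid>   # switch hops the SM chose between two LIDs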

Thanks for your warnings though; it's true that a lot of people see these
"dual-port IB cards" as "doubled performance".

Sylvain

On Fri, 5 Jun 2009, Nifty Tom Mitchell wrote:

> On Fri, Jun 05, 2009 at 09:52:39AM -0400, Jeff Squyres wrote:
>>
>> See this FAQ entry for a description:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup
>>
>> Right now, there's no way to force a particular connection pattern on
>> the openib btl at run-time. The startup sequence has gotten
>> sufficiently complicated / muddied over the years that it would be quite
>> difficult to do so. Pasha is in the middle of revamping parts of the
>> openib startup (see http://bitbucket.org/pasha/ompi-ofacm/); it *may* be
>> desirable to fully clean up the full openib btl startup sequence when
>> he's all finished.
>>
>>
>> On Jun 5, 2009, at 9:48 AM, Mouhamed Gueye wrote:
>>
>>> Hi all,
>>>
>>> I am working on multi-rail IB and I was wondering how connections are
>>> established between ports. I have two hosts, each with 2 ports on the
>>> same IB card, connected to the same switch.
>>>
>
> Is there a goal in mind?
>
> In general, multi-rail cards run into bandwidth and congestion issues
> with the host bus. If your card's system-side interface cannot support
> the bandwidth of twin IB links, then it is possible that bandwidth would
> be reduced by the interaction.
>
> If the host bus and memory system are fast enough, then
> work with the vendor.
>
> In addition to system bandwidth, the subnet manager may need to be enhanced
> to be aware of multi-port cards. Since IB fabric routes are static, it is
> possible to route or use pairs of links in an identical enough way that
> there is little bandwidth gain when multiple switches are involved.
>
> Your two-host case may be simple enough to explore and/or to generate
> illuminating (or misleading) results. It is a good place to start.
>
> Start with a look at opensm and the fabric, then watch how Open MPI
> or your applications use the resulting LIDs. If you are using IB directly
> and not MPI, then the list of protocol choices grows dramatically, but it
> still centers on LIDs as assigned by the subnet manager (see opensm).
>
> How many CPU cores (ranks) are you working with?
>
> Do be specific about the IB hardware and associated firmware; there are
> multiple choices out there and the vendor may be able to help.
>
> --
> T o m M i t c h e l l
> Found me a new hat, now what?