Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] RoCE (IBoE) & OpenMPI
From: Michael Shuey (shuey_at_[hidden])
Date: 2011-02-18 13:39:59

It's a little different in RoCE. There's no subnet manager, so (as
near as I can tell) you don't really have a subnet ID. Instead, the
GID = GUID + VLAN tag (more or less). gid[0] has special bits in the
VLAN tag section, to indicate that packets relating to this GID don't
get a VLAN tag. Unfortunately, without a VLAN tag those packets carry
no priority bits, so they can't be matched to a lossless class on
our Cisco switches.
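To make this concrete, here is a minimal Python sketch that builds a 16-byte link-local RoCE GID from a port's MAC address and an optional VLAN ID. The byte layout is an assumption modeled on early Linux IBoE code, not something stated in this thread: the fe80::/64 prefix, an EUI-64-style interface ID taken from the MAC (with the universal/local bit flipped), and the VLAN ID stored in bytes 11-12, where EUI-64 would otherwise hold the ff:fe filler. That filler is what marks an untagged entry such as gid[0]. The function name `roce_gid` is hypothetical.

```python
from typing import Optional


def roce_gid(mac: bytes, vlan_id: Optional[int] = None) -> bytes:
    """Sketch: build a 16-byte link-local RoCE (IBoE) GID.

    Assumed layout (modeled on early Linux IBoE code, not a spec):
      bytes 0-7   fe80:0000:0000:0000 link-local prefix
      bytes 8-10  first three MAC bytes, universal/local bit flipped
      bytes 11-12 VLAN ID, or ff:fe when the entry is untagged
      bytes 13-15 last three MAC bytes
    """
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    if vlan_id is not None and not 0 <= vlan_id < 0x1000:
        raise ValueError("VLAN ID must fit in 12 bits")
    gid = bytearray(16)
    gid[0:2] = b"\xfe\x80"
    gid[8:11] = mac[0:3]
    gid[8] ^= 0x02  # flip the universal/local bit, as EUI-64 does
    if vlan_id is None:
        gid[11:13] = b"\xff\xfe"  # untagged marker (the EUI-64 filler)
    else:
        gid[11] = vlan_id >> 8
        gid[12] = vlan_id & 0xFF
    gid[13:16] = mac[3:6]
    return bytes(gid)
```

With MAC 00:02:c9:aa:bb:cc, `roce_gid(mac).hex()` gives `fe800000000000000202c9fffeaabbcc`, while `roce_gid(mac, 100)` replaces the `fffe` with `0064`: the two entries differ only in the VLAN field.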

RoCE HCAs keep a GID table, like normal HCAs. Every time you bring up
a VLAN interface, another entry gets automatically added to the table.
If I select one of these other GIDs, packets get a VLAN tag, and that
contains the necessary priority bits (well, assuming I selected the
right IB service level, which is mapped to the priority tag in the
VLAN header) for the traffic to match a lossless class of service on
the switch.
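The priority bits in question live in the 802.1Q tag control information (TCI): a 3-bit priority code point (PCP), a 1-bit drop-eligible indicator (DEI), and a 12-bit VLAN ID, preceded on the wire by the TPID 0x8100. The Python sketch below packs such a tag from an IB service level and a VLAN ID. It assumes the HCA copies the low 3 bits of the SL straight into the PCP field; the real SL-to-priority mapping is driver-specific, so `sl_to_pcp` is a hypothetical placeholder.

```python
import struct


def sl_to_pcp(sl: int) -> int:
    """Hypothetical SL -> 802.1Q priority mapping (assumed: low 3 bits of the SL)."""
    if not 0 <= sl <= 15:
        raise ValueError("IB service level must be 0-15")
    return sl & 0x7


def vlan_tag(sl: int, vlan_id: int, dei: int = 0) -> bytes:
    """Build a 4-byte 802.1Q tag: TPID 0x8100 followed by the 16-bit TCI."""
    if not 0 <= vlan_id < 0x1000:
        raise ValueError("VLAN ID must fit in 12 bits")
    # TCI layout: PCP (bits 15-13) | DEI (bit 12) | VLAN ID (bits 11-0)
    tci = (sl_to_pcp(sl) << 13) | ((dei & 0x1) << 12) | vlan_id
    return struct.pack("!HH", 0x8100, tci)
```

`vlan_tag(3, 100).hex()` returns `81006064`: PCP 3 in the top three bits of the TCI and VLAN 100 in the low twelve. A frame sent from an untagged GID carries no TCI at all, which is why the switch has no priority bits to classify it by.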

For this to work, I really need the IB client to select a
non-default GID. A few test programs included in OFED will do this,
but I'm not sure OpenMPI will. Any thoughts?
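Selecting a non-default GID amounts to scanning the port's GID table for the entry that belongs to the right VLAN interface. The Python sketch below does that over an in-memory table. It assumes a RoCE GID layout in which bytes 11-12 hold the VLAN ID, or ff:fe for an untagged entry, and it treats all-zero entries as unused slots; the helper names are hypothetical. In a real verbs program the table would be read with `ibv_query_gid()`, and the chosen index would go into the address-handle attributes (`ah_attr.grh.sgid_index`) when the connection is set up.

```python
from typing import Optional, Sequence

NO_VLAN = b"\xff\xfe"  # assumed "untagged" marker in GID bytes 11-12


def gid_vlan(gid: bytes) -> Optional[int]:
    """Return the VLAN ID encoded in a RoCE GID, or None if untagged (assumed layout)."""
    if len(gid) != 16:
        raise ValueError("GID must be 16 bytes")
    if gid[11:13] == NO_VLAN:
        return None
    return (gid[11] << 8) | gid[12]


def pick_gid_index(gid_table: Sequence[bytes], vlan_id: int) -> int:
    """Return the first GID-table index whose entry carries vlan_id."""
    for index, gid in enumerate(gid_table):
        if not any(gid):
            continue  # all-zero entry: treat as an unused slot
        if gid_vlan(gid) == vlan_id:
            return index
    raise LookupError(f"no GID entry for VLAN {vlan_id}")
```

For a table holding an untagged entry at index 0, an empty slot at index 1, and a VLAN 100 entry at index 2, `pick_gid_index(table, 100)` returns 2.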

Mike Shuey
On Fri, Feb 18, 2011 at 9:30 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> Greetings Mike.  I'll answer today because Fri-Sat is the weekend in Israel (i.e., the MPI team at Mellanox won't see this until Sunday).
> I don't have a lot of experience with RoCE; do you need a different GUID or a different subnet ID?  At least in IB, the GID = GUID + Subnet ID.  The GUID should be your unique port ID and the subnet ID is, well, the subnet ID.  :-)
> Changing either of these in IB is an administrative function, not a user-level function.  Meaning: I'm *guessing* that the same is true for RoCE -- changing the subnet ID (which is what I'm further guessing you need to do) should be somewhere in the root-level setup for RoCE.  Once you set a different subnet ID, Open MPI should just use it.
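Jeff's IB description can be written out directly: a GID is 128 bits, a 64-bit subnet prefix followed by the 64-bit port GUID. The Python sketch below concatenates the two in network byte order. It assumes the default link-local subnet prefix fe80:0000:0000:0000 when no other prefix is given, and `ib_gid` is a hypothetical helper name.

```python
DEFAULT_SUBNET_PREFIX = 0xFE80000000000000  # IB default (link-local) subnet prefix


def ib_gid(port_guid: int, subnet_prefix: int = DEFAULT_SUBNET_PREFIX) -> bytes:
    """Concatenate a 64-bit subnet prefix and a 64-bit port GUID into a 128-bit GID."""
    for name, value in (("subnet_prefix", subnet_prefix), ("port_guid", port_guid)):
        if not 0 <= value < 1 << 64:
            raise ValueError(f"{name} must fit in 64 bits")
    return subnet_prefix.to_bytes(8, "big") + port_guid.to_bytes(8, "big")
```

`ib_gid(0x0002c90300aabbcc).hex()` gives `fe800000000000000002c90300aabbcc`. Changing the subnet prefix changes only the first eight bytes, which is the administrative knob Jeff describes.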
> On Feb 18, 2011, at 8:17 AM, Michael Shuey wrote:
>> I've been looking into OpenMPI's support for RoCE (Mellanox's recent
>> InfiniBand-over-Ethernet) lately.  While it's promising, I've hit a
>> snag: RoCE requires lossless Ethernet, and on my switches the only way
>> to guarantee this is with CoS.  RoCE adapters cannot emit CoS priority
>> tags unless the client program selects an IB service level and uses a
>> non-default GID.
>> There's a command-line option in OpenMPI to pick an IB SL, but I can't
>> find one for picking a different GID.  Does this exist for the openib
>> BTL?  Or am I going about this the wrong way?
>> --
>> Mike Shuey
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
> --
> Jeff Squyres
> jsquyres_at_[hidden]