It's a little different in RoCE. There's no subnet manager, so (as
near as I can tell) you don't really have a subnet ID. Instead, the
GID = GUID + VLAN tag (more or less). The default GID has special bits in the
VLAN tag section to indicate that packets using this GID don't
get a VLAN tag. Unfortunately, without a VLAN tag, those packets lack
priority bits - meaning they can't be matched to a lossless class on
our Cisco switches.
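
A minimal Python sketch of the GID layout described above. It assumes the byte positions used by the Linux kernel's iboe_mac_vlan_to_ll() helper as best recalled (VLAN ID in bytes 11-12, or ff:fe when untagged); the function names and the NO_VLAN sentinel are hypothetical, and none of this has been checked against a particular HCA.

```python
NO_VLAN = 0xFFFF  # hypothetical sentinel meaning "no VLAN tag"


def roce_gid(mac: bytes, vid: int = NO_VLAN) -> bytes:
    """Build a 16-byte link-local GID from a MAC address and a VLAN ID.

    Assumed layout: fe80::/64 prefix, then the MAC split around two
    bytes that hold either the VLAN ID or the ff:fe "untagged" marker.
    """
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    gid = bytearray(16)
    gid[0:2] = b"\xfe\x80"      # link-local prefix
    gid[8:11] = mac[0:3]        # first half of the MAC
    gid[8] ^= 0x02              # flip the universal/local bit
    if vid < 0x1000:
        gid[11] = vid >> 8      # VLAN ID, high byte
        gid[12] = vid & 0xFF    # VLAN ID, low byte
    else:
        gid[11] = 0xFF          # special bits: packets go out untagged
        gid[12] = 0xFE
    gid[13:16] = mac[3:6]       # second half of the MAC
    return bytes(gid)


def gid_vlan(gid: bytes) -> int:
    """Recover the VLAN ID from a GID, or NO_VLAN for the default GID."""
    vid = (gid[11] << 8) | gid[12]
    return vid if vid < 0x1000 else NO_VLAN
```

Under these assumptions, the default GID and a VLAN GID for the same port differ only in bytes 11 and 12.
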
RoCE HCAs keep a GID table, like normal HCAs. Every time you bring up
a VLAN interface, another entry gets automatically added to the table.
If I select one of these other GIDs, packets get a VLAN tag, and that
contains the necessary priority bits (well, assuming I selected the
right IB service level, which is mapped to the priority tag in the
VLAN header) for the traffic to match a lossless class of service on
our switches.
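
To illustrate where the priority bits live, here is a small Python sketch of the 802.1Q tag control field: 3 priority bits, 1 drop-eligible bit, and a 12-bit VLAN ID. The helper names are hypothetical, and mapping the IB service level to the priority by taking its low three bits is an assumption; the real mapping depends on the driver.

```python
def sl_to_priority(sl: int) -> int:
    """Assumed mapping: the low 3 bits of the IB SL become the priority."""
    return sl & 0x7


def vlan_tci(priority: int, vid: int, dei: int = 0) -> int:
    """Pack priority (PCP), drop-eligible bit and VLAN ID into a 16-bit TCI."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    return (priority << 13) | (dei << 12) | vid


def tci_priority(tci: int) -> int:
    """Extract the 3-bit priority a switch would use to pick a traffic class."""
    return (tci >> 13) & 0x7
```

An untagged packet carries no such field at all, which is why the default GID leaves the switch with nothing to match a lossless class on.
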
For this to work, I really need for the IB client to select a
non-default GID. A few test programs included in OFED will do this,
but I'm not sure OpenMPI will. Any thoughts?
On Fri, Feb 18, 2011 at 9:30 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> Greetings Mike. I'll answer today because Fri-Sat is the weekend in Israel (i.e., the MPI team at Mellanox won't see this until Sunday).
> I don't have a lot of experience with RoCE; do you need a different GUID or a different subnet ID? At least in IB, the GID = GUID + Subnet ID. The GUID should be your unique port ID and the subnet ID is, well, the subnet ID. :-)
> Changing either of these in IB is an administrative function, not a user-level function. Meaning: I'm *guessing* that the same is true for RoCE -- changing the subnet ID (which is what I'm further guessing you need to do) should be somewhere in the root-level setup for RoCE. Once you set a different subnet ID, Open MPI should just use it.
> On Feb 18, 2011, at 8:17 AM, Michael Shuey wrote:
>> I've been looking into OpenMPI's support for RoCE (Mellanox's recent
>> Infiniband-over-Ethernet) lately. While it's promising, I've hit a
>> snag: RoCE requires lossless ethernet, and on my switches the only way
>> to guarantee this is with CoS. RoCE adapters cannot emit CoS priority
>> tags unless the client program selects an IB service level and uses a
>> non-default GID.
>> There's a command-line option in OpenMPI to pick an IB SL, but I can't
>> find one for picking a different GID. Does this exist for the openib
>> btl? Or am I going about this the wrong way?
>> Mike Shuey
> Jeff Squyres