
Open MPI User's Mailing List Archives


Subject: [OMPI users] "Re: RoCE (IBoE) & OpenMPI"
From: Eli Cohen (eli_at_[hidden])
Date: 2011-03-22 10:07:25

This discussion was brought to my attention, so I joined this
mailing list to try to help.
Since you have already stated that the SL maps correctly to PCP when
using ibv_rc_pingpong, I assume Open MPI works over rdma_cm. In that
case, please note the following:
1. If you're using OFED-1.5.2 and the rdma_cm socket is bound to a
VLAN net device, all egress traffic will bear a default priority.
2. The default priority is controlled by a module parameter to
rdma_cm.ko named def_prec2sl.
3. You may change the priority on a per-socket basis (overriding the
module parameter) by using setsockopt() to set the option
RDMA_OPTION_ID_TOS to the required TOS value.
4. The TOS is mapped to SL according to the following formula: SL = TOS >> 5

I hope that clears things up.

> Late yesterday I did have a chance to test the patch Jeff provided
> (against 1.4.3 - testing 1.5.x is on the docket for today). While it
> works, in that I can specify a gid_index, it doesn't do everything
> required: my traffic won't match a lossless CoS on the Ethernet
> switch. Specifying a GID is only half of it; I really need to also
> specify a service level.
>
> The bottom 3 bits of the IB SL are mapped to Ethernet's PCP bits in
> the VLAN tag. With a non-default GID, I can select an available VLAN
> (so RoCE's packets will include the PCP bits), but the only way to
> specify a priority is to use an SL. So far, the only RoCE-enabled app
> I've been able to make work correctly (such that traffic matches a
> lossless CoS on the switch) is ibv_rc_pingpong - and then, I need to
> use both a specific GID and a specific SL.
>
> The slides Pavel found seem a little misleading to me. The VLAN isn't
> determined by the bound netdev; all VLAN netdevs map to the same IB
> adapter for RoCE. The VLAN is determined by the GID index. Also, the SL
> isn't determined by a set kernel policy; it's provided via the IB
> interfaces. As near as I can tell from Mellanox's documentation, OFED
> test apps, and the driver source, a RoCE adapter is an InfiniBand card
> in almost all respects (even more so than an iWARP adapter).