
Open MPI User's Mailing List Archives



Subject: [OMPI users] "Re: RoCE (IBoE) & OpenMPI"
From: Eli Cohen (eli_at_[hidden])
Date: 2011-03-22 10:07:25


Hi,
This discussion was brought to my attention, so I joined this
mailing list to try to help.
Since you already stated that the SL maps correctly to PCP when using
ibv_rc_pingpong, I assume OpenMPI works over rdma_cm. In that case,
please note the following:
1. If you're using OFED-1.5.2, then if the rdma_cm socket is bound
to a VLAN net device, all egress traffic will bear a default priority
of 3.
2. The default priority is controlled by a module parameter to
rdma_cm.ko named def_prec2sl.
3. You may change the priority on a per socket basis (overriding the
module parameter) by setting the option RDMA_OPTION_ID_TOS to the
required TOS value, using the setsockopt()-like call for rdma_cm IDs
(rdma_set_option() in librdmacm).
4. The TOS is mapped to SL according to the following formula: SL = TOS >> 5
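To make the formula in point 4 concrete, here is a minimal sketch of the mapping in both directions. The helper names tos_to_sl and sl_to_tos are hypothetical (they are not part of OFED or librdmacm), and the sketch assumes the TOS is an 8-bit value, so that SL = TOS >> 5 selects its top three bits.

```python
def tos_to_sl(tos: int) -> int:
    """Map a TOS value to a service level using SL = TOS >> 5.

    Assumption: TOS is an 8-bit value, so the result is in 0..7.
    """
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in 8 bits")
    return tos >> 5


def sl_to_tos(sl: int) -> int:
    """Return the smallest TOS value that maps to the given SL (0..7)."""
    if not 0 <= sl <= 7:
        raise ValueError("SL reachable through an 8-bit TOS must be 0..7")
    return sl << 5


if __name__ == "__main__":
    # Print the TOS value one could request for each reachable SL.
    for sl in range(8):
        print(f"SL {sl} <- TOS {sl_to_tos(sl)}")
```

Under these assumptions, any TOS from 96 to 127 lands on SL 3, so only the top three bits of the TOS matter for the priority.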

I hope that clears things up.

> Late yesterday I did have a chance to test the patch Jeff provided
> (against 1.4.3 - testing 1.5.x is on the docket for today). While it
> works, in that I can specify a gid_index, it doesn't do everything
> required - my traffic won't match a lossless CoS on the ethernet
> switch. Specifying a GID is only half of it; I really need to also
> specify a service level.
> The bottom 3 bits of the IB SL are mapped to ethernet's PCP bits in
> the VLAN tag. With a non-default gid, I can select an available VLAN
> (so RoCE's packets will include the PCP bits), but the only way to
> specify a priority is to use an SL. So far, the only RoCE-enabled app
> I've been able to make work correctly (such that traffic matches a
> lossless CoS on the switch) is ibv_rc_pingpong - and then, I need to
> use both a specific GID and a specific SL.
> The slides Pavel found seem a little misleading to me. The VLAN isn't
> determined by bound netdev; all VLAN netdevs map to the same IB
> adapter for RoCE. VLAN is determined by gid index. Also, the SL
> isn't determined by a set kernel policy; it's provided via the IB
> interfaces. As near as I can tell from Mellanox's documentation, OFED
> test apps, and the driver source, a RoCE adapter is an Infiniband card
> in almost all respects (even more so than an iWARP adapter).
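The SL-to-PCP relationship described in the quoted message can be sketched as follows. This is only an illustration: sl_to_pcp and vlan_tci are hypothetical helper names, the SL is assumed to be the 4-bit InfiniBand value, and the tag layout is the standard 802.1Q TCI field (3-bit PCP, 1-bit DEI, 12-bit VLAN ID). It does not model how the driver selects the VLAN from the GID index.

```python
def sl_to_pcp(sl: int) -> int:
    """Keep the bottom 3 bits of the IB SL, as the quoted text describes."""
    if not 0 <= sl <= 15:
        raise ValueError("IB SL is a 4-bit value (0..15)")
    return sl & 0x7


def vlan_tci(pcp: int, vlan_id: int, dei: int = 0) -> int:
    """Pack PCP, DEI and VLAN ID into a 16-bit 802.1Q TCI field."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP is a 3-bit value (0..7)")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID is a 12-bit value")
    return (pcp << 13) | (dei << 12) | vlan_id


if __name__ == "__main__":
    # Example values chosen for illustration only: SL 3 on VLAN 100.
    tci = vlan_tci(sl_to_pcp(3), vlan_id=100)
    print(f"TCI = 0x{tci:04x}")
```

A switch that classifies on PCP would see priority 3 for this frame only if the packet actually carries a VLAN tag, which is why both the GID (VLAN) and the SL have to be chosen.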