
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] RoCE (IBoE) & OpenMPI
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2011-02-19 00:22:50

As far as I remember, we don't allow the user to specify the SL for RoCE. RoCE is treated as a kind of Ethernet device, and the RDMACM connection manager is used to set up the connections. This means that in order to select network X or Y, you can use an IP/netmask (btl_openib_ipaddr_include).
Pavel (Pasha) Shamis

Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
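
[Editor's illustration] The ip/netmask selection described above can be pictured with a small Python sketch built on the standard ipaddress module. It only shows how a CIDR value, such as one passed to btl_openib_ipaddr_include, separates network X from network Y. The interface names and addresses are made up, the comma-separated list format is an assumption, and this is not Open MPI's actual matching code.

```python
import ipaddress


def select_interfaces(interfaces, include):
    """Return the names of interfaces whose address lies in an included subnet.

    interfaces: dict mapping interface name -> IPv4/IPv6 address string
    include:    comma-separated CIDR subnets (format assumed for illustration)
    """
    nets = [ipaddress.ip_network(item.strip(), strict=False)
            for item in include.split(",")]
    return [name for name, addr in interfaces.items()
            if any(ipaddress.ip_address(addr) in net for net in nets)]


# Hypothetical node with two VLAN interfaces on one RoCE port.
interfaces = {"eth2.100": "192.168.100.17", "eth2.200": "192.168.200.17"}

# Only the interface on the 192.168.100.0/24 network is selected.
selected = select_interfaces(interfaces, "192.168.100.0/24")
print(selected)
```

With the same subnet configured on every node, one site-wide value would pick the matching interface everywhere, which fits the site-wide configuration file mentioned in the quoted message below.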
On Feb 18, 2011, at 4:14 PM, Michael Shuey wrote:
> Per-node GID & SL settings == bad.  Site-wide GID & SL settings == good.
> If this could be an MCA param (like btl_openib_ib_service_level)
> that'd be great - we already have a global config file of similar
> params.  We'd definitely want the same N everywhere.
> --
> Mike Shuey
> On Fri, Feb 18, 2011 at 3:44 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> On Feb 18, 2011, at 1:39 PM, Michael Shuey wrote:
>>> RoCE HCAs keep a GID table, like normal HCAs.  Every time you bring up
>>> a vlan interface, another entry gets automatically added to the table.
>>> If I select one of these other GIDs, packets get a VLAN tag, and that
>>> contains the necessary priority bits (well, assuming I selected the
>>> right IB service level, which is mapped to the priority tag in the
>>> VLAN header) for the traffic to match a lossless class of service on
>>> the switch.
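
[Editor's illustration] The priority bits mentioned above live in the 16-bit Tag Control Information (TCI) field of the 802.1Q VLAN header: 3 bits of priority (PCP), 1 drop-eligible bit (DEI), and a 12-bit VLAN ID. The sketch below packs that field in Python. The SL-to-priority mapping shown (taking the low three bits of the SL) is an assumption made for illustration; check your HCA and driver documentation for the mapping actually used.

```python
def sl_to_priority(sl):
    """Map a 4-bit IB service level to a 3-bit VLAN priority.

    Assumption for illustration: the low three bits of the SL are used.
    """
    if not 0 <= sl <= 15:
        raise ValueError("service level must be in 0..15")
    return sl & 0x7


def vlan_tci(vlan_id, priority, dei=0):
    """Pack the 802.1Q TCI field: PCP (3 bits) | DEI (1 bit) | VID (12 bits)."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    return (priority << 13) | (dei << 12) | vlan_id


# Hypothetical values: VLAN 100 carrying traffic sent with SL 3.
tci = vlan_tci(vlan_id=100, priority=sl_to_priority(3))
print(hex(tci))
```

A switch configured with a lossless class for priority 3 would match on the top three bits of this field.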
>> Ah -- I see it now (it's been a looong time since I've looked in Open MPI's verbs code!).  We query and simply take the 0th GID from a given IBV device port's GID table.
>>> For this to work, I really need for the IB client to select a
>>> non-default GID.  A few test programs included in OFED will do this,
>>> but I'm not sure OpenMPI will.  Any thoughts?
>> Yes, we can do this.  It's pretty easy to add an MCA parameter to select the Nth GID rather than always taking the 0th.
>> To make this simple, can you make it so that the value of N is the same across all nodes in your cluster?  Then you can set a site-wide MCA param for that value of N and be done with this issue.  If we have to have a per-node setting of N, it could get a little hairy (it's do-able, but... it's a heckuva lot easier if N is the same everywhere).
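
[Editor's illustration] To see which GID a given index N refers to on a node, the GID table can be read from Linux sysfs, where each entry is exposed as a file under /sys/class/infiniband/<device>/ports/<port>/gids/. The Python sketch below reads entry N for one port. The function names are hypothetical, and the all-zero check assumes unused entries read back as zeros, which may vary by kernel version. This is a diagnostic aid, not the MCA parameter discussed above.

```python
from pathlib import Path

ZERO_GID = ":".join(["0000"] * 8)


def read_gid(device, port, index, sysfs_root="/sys/class/infiniband"):
    """Return GID table entry `index` for `device`/`port` as a string."""
    path = Path(sysfs_root) / device / "ports" / str(port) / "gids" / str(index)
    return path.read_text().strip()


def gid_is_populated(gid):
    """True if the entry is not the all-zero placeholder (assumed convention)."""
    return gid != ZERO_GID
```

For example, read_gid("mlx4_0", 1, 1) would return entry 1 on port 1 of a device named mlx4_0 (a hypothetical device name). Running the same check on every node is one way to confirm that a single site-wide N points at the intended VLAN GID everywhere.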
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]