Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] heterogeneous OpenFabrics adapters
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-05-12 20:37:54


Short version:
--------------

I propose that we should disallow multiple different
mca_btl_openib_receive_queues values (or receive_queues values from
the INI file) to be used in a single MPI job for the v1.3 series.

More details:
-------------

The reason I'm looking into this heterogeneity stuff is to help
Chelsio support their iWARP NIC in OMPI. Their NIC needs a specific
value for mca_btl_openib_receive_queues (specifically: it does not
support SRQ and it has the wireup race condition that we discussed
before).

The major problem is that all the BSRQ information is currently stored
on the openib component -- it is *not* maintained on a per-HCA (or
per port) basis. We *could* move all the BSRQ info to live on the
hca_t struct (or even the openib module struct), but it has at least 3
big consequences:

1. It would touch a lot of code. But touching all this code is
relatively low risk; it will be easy to check for correctness because
the changes will either compile or not.

2. There are functions (some of which are static inline) that read the
BSRQ data. These functions would have to take an additional (hca_t*)
(or (btl_openib_module_t*)) parameter.

3. Getting to the BSRQ info will take 1 or 2 additional
dereferences (e.g., module->hca->bsrq_info or module->bsrq_info...).

I'm not too concerned about #1 (it's grunt work), but I am a bit
concerned about #2 and #3 since at least some of these places are in
the critical performance path.

Given these concerns, I propose the following for v1.3:

- Add a "receive_queues" field to the INI file so that the Chelsio
adapter can run out of the box (i.e., "mpirun -np 4 a.out" with hosts
containing Chelsio NICs will get a value for btl_openib_receive_queues
that will work).

- NetEffect NICs will also require overriding
btl_openib_receive_queues, but will likely need a different value than
Chelsio NICs (they don't have the wireup race condition that Chelsio
does).

- Because the BSRQ info is on the component (i.e., global), we should
detect when multiple different receive_queues values are specified and
gracefully abort.

I think it'll be quite uncommon to need two different receive_queues
values in a single job, and that this proposal will be fine for v1.3.

Comments?

On May 12, 2008, at 6:44 PM, Jeff Squyres wrote:

> After looking at the code a bit, I realized that I completely forgot
> that the INI file was invented to solve at least the heterogeneous-
> adapters-in-a-host problem.
>
> So I amended the ticket to reflect that that problem is already
> solved. The other part is not, though -- consider two MPI procs on
> different hosts, each with an iWARP NIC, but one NIC supports SRQs and
> one does not.
>
>
> On May 12, 2008, at 5:36 PM, Jeff Squyres wrote:
>
>> I think that this issue has come up before, but I filed a ticket
>> about it because at least one developer (Jon) has a system with both
>> IB and iWARP adapters:
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/1282
>>
>> My question: do we care about the heterogeneous adapter scenarios?
>> For v1.3? For v1.4? For ...some version in the future?
>>
>> I think the first issue I identified in the ticket is grunt work to
>> fix (annoying and tedious, but not difficult), but the second one
>> will be a little dicey -- it has scalability issues (e.g., sending
>> around all info in the modex, etc.).
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems