On Monday 12 May 2008 07:37:54 pm Jeff Squyres wrote:
> Short version:
> I propose that we should disallow multiple different
> mca_btl_openib_receive_queues values (or receive_queues values from
> the INI file) to be used in a single MPI job for the v1.3 series.
> More details:
> The reason I'm looking into this heterogeneity stuff is to help
> Chelsio support their iWARP NIC in OMPI. Their NIC needs a specific
> value for mca_btl_openib_receive_queues (specifically: it does not
> support SRQ and it has the wireup race condition that we discussed
> The major problem is that all the BSRQ information is currently stored
> on the openib component -- it is *not* maintained on a per-HCA (or
> per port) basis. We *could* move all the BSRQ info to live on the
> hca_t struct (or even the openib module struct), but it has at least 3
> big consequences:
> 1. It would touch a lot of code. But touching all this code is
> relatively low risk; it will be easy to check for correctness because
> the changes will either compile or not.
> 2. There are functions (some of which are static inline) that read the
> BSRQ data. These functions would have to take an additional (hca_t*)
> (or (btl_openib_module_t*)) parameter.
> 3. Getting to the BSRQ info will take at least 1 or 2 more
> dereferences (e.g., module->hca->bsrq_info or module->bsrq_info...).
> I'm not too concerned about #1 (it's grunt work), but I am a bit
> concerned about #2 and #3 since at least some of these places are in
> the critical performance path.
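To make concerns #2 and #3 concrete, here is a minimal sketch of the two access patterns. All of the types and names below (bsrq_info_t, num_qps, and the simplified hca_t, btl_openib_module_t, and component structs) are hypothetical stand-ins for illustration, not the real Open MPI definitions.

```c
/* Hypothetical, simplified types; NOT the real Open MPI structs. */
typedef struct { int num_qps; } bsrq_info_t;
typedef struct { bsrq_info_t bsrq_info; } hca_t;
typedef struct { hca_t *hca; } btl_openib_module_t;
typedef struct { bsrq_info_t bsrq_info; } btl_openib_component_t;

/* Current layout: one global copy of the BSRQ info on the component. */
static btl_openib_component_t component = { { 4 } };

/* Today: no parameter needed, the data is read from the global. */
static inline int num_qps_global(void)
{
    return component.bsrq_info.num_qps;
}

/* Per-HCA layout: every reader needs an extra parameter (concern #2)
 * and follows module -> hca before reaching the data (concern #3). */
static inline int num_qps_per_hca(const btl_openib_module_t *module)
{
    return module->hca->bsrq_info.num_qps;
}
```

Whether the extra pointer chase is measurable in the critical path would have to be benchmarked; the sketch only shows where it comes from.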
> Given these concerns, I propose the following for v1.3:
> - Add a "receive_queues" field to the INI file so that the Chelsio
> adapter can run out of the box (i.e., "mpirun -np 4 a.out" with hosts
> containing Chelsio NICs will get a value for btl_openib_receive_queues
> that will work).
> - NetEffect NICs will also require overriding
> btl_openib_receive_queues, but will likely have a different value than
> Chelsio NICs (they don't have the wireup race condition that Chelsio
> - Because the BSRQ info is on the component (i.e., global), we should
> detect when multiple different receive_queues values are specified and
> gracefully abort.
How would we verify that the remote receive_queues values are the same?
Would we pass the receive_queues values around in the modex (which I
thought we were trying to reduce), or would we pass them around during
CPC setup (for those CPCs that can support this)?
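Whichever transport carries the values, the check itself could be as small as the sketch below. The helper name find_receive_queues_mismatch is hypothetical (it is not an existing Open MPI function), the example strings in the usage note are only illustrative, and comparing the raw strings byte for byte is an assumption: two specifications that mean the same thing but are formatted differently would be reported as a mismatch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Open MPI.
 * Compares the local receive_queues string against the strings received
 * from each peer (via the modex or CPC setup; how they arrive is outside
 * this sketch). Returns the index of the first peer whose value differs,
 * or -1 if every peer matches. */
static int find_receive_queues_mismatch(const char *local,
                                        const char *const *peers,
                                        size_t num_peers)
{
    for (size_t i = 0; i < num_peers; ++i) {
        if (strcmp(local, peers[i]) != 0) {
            return (int) i;   /* caller would print an error and abort */
        }
    }
    return -1;
}
```

For example, with a local value of "P,128,256,192,128:S,2048,256,128,32" and a peer that reports "P,65536,256,192,128", the helper returns that peer's index so the job can abort with a useful message.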
> I think it'll be quite uncommon to have a need for two different
> receive_queues values, and that this proposal will be fine for v1.3
Sounds reasonable to me.
> On May 12, 2008, at 6:44 PM, Jeff Squyres wrote:
> > After looking at the code a bit, I realized that I completely forgot
> > that the INI file was invented to solve at least the heterogeneous-
> > adapters-in-a-host problem.
> > So I amended the ticket to reflect that that problem is already
> > solved. The other part is not, though -- consider two MPI procs on
> > different hosts, each with an iWARP NIC, but one NIC supports SRQs and
> > one does not.
> > On May 12, 2008, at 5:36 PM, Jeff Squyres wrote:
> >> I think that this issue has come up before, but I filed a ticket
> >> about it because at least one developer (Jon) has a system with both
> >> IB and iWARP adapters:
> >> https://svn.open-mpi.org/trac/ompi/ticket/1282
> >> My question: do we care about the heterogeneous adapter scenarios?
> >> For v1.3? For v1.4? For ...some version in the future?
> >> I think the first issue I identified in the ticket is grunt work to
> >> fix (annoying and tedious, but not difficult), but the second one
> >> will be a little dicey -- it has scalability issues (e.g., sending
> >> around all info in the modex, etc.).
> >> --
> >> Jeff Squyres
> >> Cisco Systems