Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] openib btl code review
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-05-15 08:14:29


Pasha tells me he'll be able to review the patch next week, so I'll
wait to commit until then. I added the patch to the ticket, just so
that it doesn't get lost.

Any other reviewers would be welcome... :-)

On May 14, 2008, at 5:39 PM, Jeff Squyres wrote:

> https://svn.open-mpi.org/trac/ompi/ticket/1285 turned out to be more
> complicated than expected (of course). The startup in the openib
> btl mixes resource discovery and initialization (vs. doing
> discovery, deciding which hcas/ports/lids to use, initializing them,
> and then assigning resources to them), which made the whole "putting
> the BSRQ receive_queues value in the INI file" a bit more difficult
> than expected -- the code got a bit hairy. I would appreciate some
> more eyes on this code before I commit it; thanks.
> ---------
>
> The attached patch solves two problems:
>
> - allow receive_queues to be specified in the INI file
> - detect when multiple different receive_queues are specified and
> gracefully abort
>
> However, accomplishing these goals ran into multiple difficulties.
> By putting receive_queues in the INI file:
>
> 1. we may not find the value until we've already traversed multiple
> HCAs
> 2. we may find multiple different receive_queues values
>
> But since the openib btl initializes as it discovers each HCA/port/
> LID (including the BSRQ data), if we find a new receive_queues value
> late in the discovery process, then all the BSRQ data that was
> previously initialized will likely be invalid. So I had to pull all
> the BSRQ initialization out until after the rest of the discovery /
> initialization process.
>
> Additionally, note that if the user specifies the MCA parameter
> btl_openib_receive_queues, it trumps whatever was in the INI file.
> So in this case, there can never be a receive_queues conflict.
>
> The attached patch does the following (Jon wrote part of this, too):
>
> - some random style cleanup
> - fix a few minor memory leaks
> - adapt _ini.c to accept the "receive_queues" field in the file
> - move 90% of _setup_qps() from _ini.c to _component.c
> - move what was left of _setup_qps() into the main
> _register_mca_params() function
> - adapt init_one_hca() to detect conflicting receive_queues values
> from the INI file
> - after the _component.c loop calling init_one_hca():
> - call setup_qps() to parse the final receive_queues string value
> - traverse all resulting btls and initialize their HCAs (if they
> weren't already): setup some lists and call prepare_hca_for_use()
>
> I tested this code on a dual-HCA system where I artificially put in
> differing receive_queues values in the INI file for the two
> different types of HCAs that I have and it all seemed to work. But
> I'd appreciate some more eyes on the code to sanity check.
>
> If I hear nothing back by COB tomorrow, I'll commit. Thanks.
>
> --
> Jeff Squyres
> Cisco Systems
> <receive-queues.patch><mime-attachment.txt><mime-attachment.txt>

-- 
Jeff Squyres
Cisco Systems