Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Merging in the CPC work
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-04-24 14:54:25


I did this in https://svn.open-mpi.org/trac/ompi/changeset/18279; the
message is now gone if IBCM is not installed on the host.

If you care: I actually used open() instead of stat(), because that
way I can also ensure that the current user is able to both read and
write to the device (which is also required).
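For anyone curious what that kind of check looks like, here is a minimal sketch. It is not the code from the changeset; the helper name device_is_usable is made up for illustration, and the only assumption is a POSIX system. The idea is that a single open() with O_RDWR fails both when the device node is missing and when the current user lacks read or write permission, which stat() alone would not tell you.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only (not the actual Open MPI
 * code): returns 1 if `path` can be opened for both reading and writing
 * by the current user, 0 otherwise. */
int device_is_usable(const char *path)
{
    /* O_RDWR makes open() fail if the node does not exist or if the
     * user is missing either read or write permission. */
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return 0;
    }
    close(fd);
    return 1;
}
```

A caller would run this on each /dev/infiniband/ucmX path and skip ibcm when it returns 0, so that ib_cm_open_device is never reached on hosts without a usable device.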

On Apr 24, 2008, at 11:03 AM, Jeff Squyres (jsquyres) wrote:

> ...actually, thinking about this a bit more, it might be easy to try
> to stat /dev/infiniband/ucmX before calling ib_cm_open_device. I'll
> check into it this afternoon.
>
> -jms
> Sent from my PDA. No type good.
>
> -----Original Message-----
> From: Jeff Squyres (jsquyres)
> Sent: Thursday, April 24, 2008 10:56 AM Eastern Standard Time
> To: pasha_at_[hidden]
> Cc: Open MPI Developers
> Subject: Re: [OMPI devel] Merging in the CPC work
>
> It's unavoidable in the current rev of libibcm :( - Sean Hefty tells
> me that he'll remove that message in the next release.
>
> For the time being, maybe the right solution in ompi is to not try
> to use ibcm unless it's specifically requested. :(
>
> -jms
> Sent from my PDA. No type good.
>
> -----Original Message-----
> From: Pavel Shamis (Pasha) [mailto:pasha_at_[hidden]]
> Sent: Thursday, April 24, 2008 10:52 AM Eastern Standard Time
> To: Jeff Squyres (jsquyres)
> Cc: Open MPI Developers
> Subject: Re: [OMPI devel] Merging in the CPC work
>
> The trivial tests pass, and I'm now running the full tests.
> In the NOT_XRC tests I got:
>
> libibcm: unable to open /dev/infiniband/ucm0
> libibcm: couldn't read ABI version
>
> But the tests pass successfully, so as I understand it, it uses OOB.
> Can we prevent the message somehow?
>
> Jeff Squyres wrote:
> > Thanks! That's a result of some [helpful] error messages and handling
> > that I added yesterday when ibcm is not configured on the host.
> >
> > Fixed in r18273.
> >
> >
> > On Apr 24, 2008, at 8:22 AM, Pavel Shamis (Pasha) wrote:
> >
> >> The patch below resolves the segfault:
> >>
> >> --- btl_openib_connect_ibcm.c.orig 2008-04-24 15:14:28.500676000 +0300
> >> +++ btl_openib_connect_ibcm.c 2008-04-24 15:15:08.961168000 +0300
> >> @@ -328,7 +328,7 @@
> >> {
> >> int rc;
> >> modex_msg_t *msg;
> >> - ibcm_module_t *m;
> >> + ibcm_module_t *m = NULL;
> >> opal_list_item_t *item;
> >> ibcm_listen_cm_id_t *cmh;
> >> ibcm_module_list_item_t *imli;
> >>
> >>
> >> Jeff Squyres wrote:
> >>> I had a linker error with the rdmacm library yesterday that I fixed
> >>> later, sorry.
> >>>
> >>> Could you try it again? You'll need to svn up, re-autogen, etc. It
> >>> should be obvious whether I fixed it -- even trivial apps will work
> >>> or not work.
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> On Apr 24, 2008, at 6:24 AM, Gleb Natapov wrote:
> >>>
> >>>> On Thu, Apr 24, 2008 at 11:50:10AM +0300, Pavel Shamis (Pasha) wrote:
> >>>>> Jeff,
> >>>>> All my tests fail.
> >>>>> XRC-disabled tests failed with:
> >>>>> mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so: undefined
> >>>>> symbol: rdma_create_event_channel
> >>>>> XRC-enabled tests failed with a segfault; I will take a look later
> >>>>> today.
> >>>> Well, it is a little bit better for me. I compiled only the OOB
> >>>> connection manager, and ompi passes simple testing.
> >>>>
> >>>>>
> >>>>> Pasha
> >>>>>
> >>>>> Jeff Squyres wrote:
> >>>>>> As we discussed yesterday, I have started the merge from the
> >>>>>> /tmp-public/openib-cpc2 branch. "oob" is currently the default.
> >>>>>>
> >>>>>> Unfortunately, it caused quite a few conflicts when I merged with
> >>>>>> the trunk, so I created a new temp branch and put all the work
> >>>>>> there: /tmp-public/openib-cpc3.
> >>>>>>
> >>>>>> Could all the IB and iWARP vendors and any other interested parties
> >>>>>> please try this branch before we bring it back to the trunk? Please
> >>>>>> test all functionality that you care about -- XRC, etc. I'd like to
> >>>>>> bring it back to the trunk COB Thursday. Please let me know if this
> >>>>>> is too soon.
> >>>>>>
> >>>>>> You can force the selection of a different CPC with the
> >>>>>> btl_openib_cpc_include MCA param:
> >>>>>>
> >>>>>> mpirun --mca btl_openib_cpc_include oob ...
> >>>>>> mpirun --mca btl_openib_cpc_include xoob ...
> >>>>>> mpirun --mca btl_openib_cpc_include rdma_cm ...
> >>>>>> mpirun --mca btl_openib_cpc_include ibcm ...
> >>>>>>
> >>>>>> You might want to concentrate on testing oob and xoob to ensure that
> >>>>>> we didn't cause any regressions. The ibcm and rdma_cm CPCs probably
> >>>>>> still have some rough edges (and the IBCM package in OFED itself may
> >>>>>> not be 100% -- that's one of the things we're evaluating. It's known
> >>>>>> to not install properly on RHEL4U4, for example -- you have to
> >>>>>> manually mknod and chmod a device in /dev/infiniband for every HCA
> >>>>>> in the host).
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Pavel Shamis (Pasha)
> >>>>> Mellanox Technologies
> >>>>>
> >>>>> _______________________________________________
> >>>>> devel mailing list
> >>>>> devel_at_[hidden]
> >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>>
> >>>> --
> >>>> Gleb.
> >>>
> >>>
> >>
> >>
> >> --
> >> Pavel Shamis (Pasha)
> >> Mellanox Technologies
> >>
> >
> >
>
>
> --
> Pavel Shamis (Pasha)
> Mellanox Technologies

-- 
Jeff Squyres
Cisco Systems