Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Merging in the CPC work
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-04-24 14:54:25


I did this in https://svn.open-mpi.org/trac/ompi/changeset/18279; the
message is now gone if IBCM is not installed on the host.

If you care: I actually used open() instead of stat(), because that
way I can also ensure that the current user is able to both read and
write to the device (which is also required).
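
A rough sketch of the idea, for illustration only (this is not the code
from the changeset, and the helper name ucm_device_usable is made up):
stat() would only confirm that the device node exists, whereas open()
with O_RDWR also fails if the current user lacks read or write
permission.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the Open MPI source tree.
 * Returns 1 if `path` can be opened for both reading and writing
 * by the current user, 0 otherwise (missing node, no permission, ...). */
static int ucm_device_usable(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        /* Device is absent or not accessible: skip IBCM quietly. */
        return 0;
    }
    /* We only wanted the access check; release the descriptor. */
    close(fd);
    return 1;
}
```

A caller could run this check on something like /dev/infiniband/ucm0
first and only call into libibcm when it returns 1, which avoids
triggering the library's own warning on hosts without IBCM.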

On Apr 24, 2008, at 11:03 AM, Jeff Squyres (jsquyres) wrote:

> ...actually, thinking about this a bit more, it might be easy to try
> to stat /dev/infiniband/ucmX before calling ib_cm_open_device. I'll
> check into it this afternoon.
>
> -jms
> Sent from my PDA. No type good.
>
> -----Original Message-----
> From: Jeff Squyres (jsquyres)
> Sent: Thursday, April 24, 2008 10:56 AM Eastern Standard Time
> To: pasha_at_[hidden]
> Cc: Open MPI Developers
> Subject: Re: [OMPI devel] Merging in the CPC work
>
> It's unavoidable in the current rev of libibcm :( - Sean Hefty tells
> me that he'll remove that message in the next release.
>
> For the time being, maybe the right solution in ompi is to not try
> to use ibcm unless it's specifically requested. :(
>
> -jms
> Sent from my PDA. No type good.
>
> -----Original Message-----
> From: Pavel Shamis (Pasha) [mailto:pasha_at_[hidden]]
> Sent: Thursday, April 24, 2008 10:52 AM Eastern Standard Time
> To: Jeff Squyres (jsquyres)
> Cc: Open MPI Developers
> Subject: Re: [OMPI devel] Merging in the CPC work
>
> The trivial tests pass and now I'm running full testing.
> In the NOT_XRC tests I got:
>
> libibcm: unable to open /dev/infiniband/ucm0
> libibcm: couldn't read ABI version
>
> But the tests pass successfully, so as I understand it, it uses OOB.
> Can we prevent the message somehow?
>
> Jeff Squyres wrote:
> > Thanks! That's a result of some [helpful] error messages and handling
> > that I added yesterday when ibcm is not configured on the host.
> >
> > Fixed in r18273.
> >
> >
> > On Apr 24, 2008, at 8:22 AM, Pavel Shamis (Pasha) wrote:
> >
> >> The patch below resolves the segfault :
> >>
> >> --- btl_openib_connect_ibcm.c.orig 2008-04-24 15:14:28.500676000 +0300
> >> +++ btl_openib_connect_ibcm.c 2008-04-24 15:15:08.961168000 +0300
> >> @@ -328,7 +328,7 @@
> >> {
> >> int rc;
> >> modex_msg_t *msg;
> >> - ibcm_module_t *m;
> >> + ibcm_module_t *m = NULL;
> >> opal_list_item_t *item;
> >> ibcm_listen_cm_id_t *cmh;
> >> ibcm_module_list_item_t *imli;
> >>
> >>
> >> Jeff Squyres wrote:
> >>> I had a linker error with the rdmacm library yesterday that I fixed
> >>> later, sorry.
> >>>
> >>> Could you try it again? You'll need to svn up, re-autogen, etc. It
> >>> should be obvious whether I fixed it -- even trivial apps will work
> >>> or not work.
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> On Apr 24, 2008, at 6:24 AM, Gleb Natapov wrote:
> >>>
> >>>> On Thu, Apr 24, 2008 at 11:50:10AM +0300, Pavel Shamis (Pasha) wrote:
> >>>>> Jeff,
> >>>>> All my tests fail.
> >>>>> XRC disabled tests failed with:
> >>>>> mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so: undefined
> >>>>> symbol: rdma_create_event_channel
> >>>>> XRC enabled failed with segfault, I will take a look later today.
> >>>> Well it is a little bit better for me. I compiled only OOB connection
> >>>> manager and ompi passes simple testing.
> >>>>
> >>>>>
> >>>>> Pasha
> >>>>>
> >>>>> Jeff Squyres wrote:
> >>>>>> As we discussed yesterday, I have started the merge from the
> >>>>>> /tmp-public/openib-cpc2 branch. "oob" is currently the default.
> >>>>>>
> >>>>>> Unfortunately, it caused quite a few conflicts when I merged with
> >>>>>> the trunk, so I created a new temp branch and put all the work there:
> >>>>>> /tmp-public/openib-cpc3.
> >>>>>>
> >>>>>> Could all the IB and iWARP vendors and any other interested parties
> >>>>>> please try this branch before we bring it back to the trunk? Please
> >>>>>> test all functionality that you care about -- XRC, etc. I'd like to
> >>>>>> bring it back to the trunk COB Thursday. Please let me know if this
> >>>>>> is too soon.
> >>>>>>
> >>>>>> You can force the selection of a different CPC with the
> >>>>>> btl_openib_cpc_include MCA param:
> >>>>>>
> >>>>>> mpirun --mca btl_openib_cpc_include oob ...
> >>>>>> mpirun --mca btl_openib_cpc_include xoob ...
> >>>>>> mpirun --mca btl_openib_cpc_include rdma_cm ...
> >>>>>> mpirun --mca btl_openib_cpc_include ibcm ...
> >>>>>>
> >>>>>> You might want to concentrate on testing oob and xoob to ensure that
> >>>>>> we didn't cause any regressions. The ibcm and rdma_cm CPCs probably
> >>>>>> still have some rough edges (and the IBCM package in OFED itself may
> >>>>>> not be 100% -- that's one of the things we're evaluating. It's known
> >>>>>> to not install properly on RHEL4U4, for example -- you have to
> >>>>>> manually mknod and chmod a device in /dev/infiniband for every HCA in
> >>>>>> the host).
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Pavel Shamis (Pasha)
> >>>>> Mellanox Technologies
> >>>>>
> >>>>> _______________________________________________
> >>>>> devel mailing list
> >>>>> devel_at_[hidden]
> >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>>
> >>>> --
> >>>> Gleb.
> >>>
> >>>
> >>
> >>
> >> --
> >> Pavel Shamis (Pasha)
> >> Mellanox Technologies
> >>
> >
> >
>
>
> --
> Pavel Shamis (Pasha)
> Mellanox Technologies

-- 
Jeff Squyres
Cisco Systems