Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] IBCM error
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-07-13 09:13:15


I think you said opposite things: Lenny's command line did not
specifically ask for ibcm, but it was used anyway. Lenny -- did you
explicitly request it somewhere else (e.g., env var or MCA param file)?

I suspect that you did not; I suspect (without looking at the code
again) that ibcm tried to select itself and failed on the
ibcm_listen() call, so it fell back to oob. This might have to be
another workaround in OMPI, perhaps something like this:

if (ibcm_listen() fails)
    if (ibcm explicitly requested)
        print_warning()
    fail to use ibcm
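
To make the pseudocode above concrete, here is a minimal C sketch of the same decision. It is not the actual OMPI code: every name in it (cpc_status_t, listen_fn_t, ibcm_query_sketch, print_warning, listen_ok, listen_fail) is hypothetical, and the listen call is passed in as a function pointer so the logic can be exercised without the IBCM kernel driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result of asking a connection method whether it can be used. */
typedef enum { CPC_USABLE, CPC_UNAVAILABLE } cpc_status_t;

/* Hypothetical stand-in for the listen call: returns 0 on success, -1 on failure. */
typedef int (*listen_fn_t)(void);

/* Counts warnings so that callers (and tests) can observe them. */
static int warnings_printed = 0;

static void print_warning(const char *msg)
{
    fprintf(stderr, "warning: %s\n", msg);
    warnings_printed++;
}

/*
 * If the listen call fails, ibcm is never used. A warning is printed
 * only when the user explicitly asked for ibcm; otherwise the failure
 * is silent and the caller is free to pick another method (e.g. oob).
 */
static cpc_status_t ibcm_query_sketch(listen_fn_t listen_fn,
                                      bool explicitly_requested)
{
    assert(listen_fn != NULL);

    if (listen_fn() != 0) {
        if (explicitly_requested) {
            print_warning("ibcm was requested but the listen call failed; "
                          "not using ibcm");
        }
        return CPC_UNAVAILABLE;
    }
    return CPC_USABLE;
}

/* Hypothetical listen implementations used to exercise both paths. */
static int listen_ok(void)   { return 0; }
static int listen_fail(void) { return -1; }
```

With these stand-ins, ibcm_query_sketch(listen_fail, false) returns CPC_UNAVAILABLE without printing anything, while ibcm_query_sketch(listen_fail, true) returns the same status and prints one warning. How "explicitly requested" would be detected in OMPI is left open here.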

Has this been filed as a bug at openfabrics.org? I don't think that I
filed it when Brad and I were testing on RoadRunner -- it would
probably be good if someone filed it.

On Jul 13, 2008, at 8:56 AM, Lenny Verkhovsky wrote:

> Pasha is right, I didn't disable it.
>
> On 7/13/08, Pavel Shamis (Pasha) <pasha_at_[hidden]> wrote:
> Jeff Squyres wrote:
> Brad and I did some scale testing of IBCM and saw this error
> sometimes. It seemed to happen with higher frequency when you
> increased the number of processes on a single node.
>
> I talked to Sean Hefty about it, but we never figured out a
> definitive cause or solution. My best guess is that there is
> something wonky about multiple processes simultaneously interacting
> with the IBCM kernel driver from userspace; but I don't know jack
> about kernel stuff, so that's a total SWAG.
>
> Thanks for reminding me of this issue; I admit that I had forgotten
> about it. :-( Pasha -- should IBCM not be the default?
> It is not the default. I guess Lenny configured it explicitly, didn't he?
>
> Pasha.
>
>
>
>
>
> On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:
>
> Hi,
>
> I am getting this error sometimes.
>
> /home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile /home/USERS/lenny/TESTS/COMPILERS/hostfile /home/USERS/lenny/TESTS/COMPILERS/hello
> [witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] failed to ib_cm_listen 10 times: rc=-1, errno=22
> Hello world! I'm 0 of 100 on witch2
>
>
> Best Regards
>
> Lenny.
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>

-- 
Jeff Squyres
Cisco Systems