Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] IBCM error
From: Pavel Shamis (Pasha) (pasha_at_[hidden])
Date: 2008-07-13 12:43:35


Fixed in https://svn.open-mpi.org/trac/ompi/changeset/18897

Are there any other known IBCM issues?

Regards,
Pasha

Jeff Squyres wrote:
> I think you said opposite things: Lenny's command line did not
> specifically ask for ibcm, but it was used anyway. Lenny -- did you
> explicitly request it somewhere else (e.g., env var or MCA param file)?
>
> I suspect that you did not; I suspect (without looking at the code
> again) that ibcm tried to select itself and failed on the
> ibcm_listen() call, so it fell back to oob. This might have to be
> another workaround in OMPI, perhaps something like this:
>
> if (ibcm_listen() fails)
>     if (ibcm explicitly requested)
>         print_warning()
>     fail to use ibcm
>
> Has this been filed as a bug at openfabrics.org? I don't think that I
> filed it when Brad and I were testing on RoadRunner -- it would
> probably be good if someone filed it.
>
>
>
> On Jul 13, 2008, at 8:56 AM, Lenny Verkhovsky wrote:
>
>> Pasha is right, I didn't disable it.
>>
>> On 7/13/08, Pavel Shamis (Pasha) <pasha_at_[hidden]> wrote:
>> Jeff Squyres wrote:
>> Brad and I did some scale testing of IBCM and saw this error
>> sometimes. It seemed to happen with higher frequency when you
>> increased the number of processes on a single node.
>>
>> I talked to Sean Hefty about it, but we never figured out a
>> definitive cause or solution. My best guess is that there is
>> something wonky about multiple processes simultaneously interacting
>> with the IBCM kernel driver from userspace; but I don't know jack
>> about kernel stuff, so that's a total SWAG.
>>
>> Thanks for reminding me of this issue; I admit that I had forgotten
>> about it. :-( Pasha -- should IBCM not be the default?
>> It is not the default. I guess Lenny configured it explicitly, didn't he?
>>
>> Pasha.
>>
>>
>>
>>
>>
>> On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:
>>
>> Hi,
>>
>> I am getting this error sometimes.
>>
>> /home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile
>> /home/USERS/lenny/TESTS/COMPILERS/hostfile
>> /home/USERS/lenny/TESTS/COMPILERS/hello
>> [witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query]
>> failed to ib_cm_listen 10 times: rc=-1, errno=22
>> Hello world! I'm 0 of 100 on witch2
>>
>>
>> Best Regards
>>
>> Lenny.
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
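
The fallback Jeff describes in pseudocode above could look roughly like the C sketch below. It is an illustration only, not the code from the changeset: the names cpc_t, listen_fn, select_cpc, stub_listen_ok and stub_listen_fail are all hypothetical, and the listen attempt is passed in as a function pointer so the logic can be exercised without the IBCM kernel driver. It also assumes that a failed listen always falls back to oob and that the warning is printed only when ibcm was explicitly requested; the thread does not settle either point.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical connection-method identifiers (not Open MPI's real names). */
typedef enum { CPC_OOB, CPC_IBCM } cpc_t;

/* Hypothetical stand-in for the listen attempt: returns 0 on success. */
typedef int (*listen_fn)(void);

/* Stubs that simulate a successful and a failing listen attempt. */
int stub_listen_ok(void)   { return 0; }
int stub_listen_fail(void) { return -1; }

/*
 * Pick a connection method.
 * - If the listen attempt succeeds, use ibcm.
 * - If it fails, warn only when the user explicitly asked for ibcm,
 *   then decline ibcm and fall back to oob (assumed behaviour).
 */
cpc_t select_cpc(listen_fn try_listen, bool ibcm_requested)
{
    if (try_listen() == 0) {
        return CPC_IBCM;
    }
    if (ibcm_requested) {
        fprintf(stderr,
                "warning: ibcm was requested but listen failed; "
                "falling back to oob\n");
    }
    return CPC_OOB;
}
```

For example, select_cpc(stub_listen_fail, false) quietly returns CPC_OOB, which matches the case Jeff suspects Lenny hit, while select_cpc(stub_listen_fail, true) returns the same value but prints the warning first.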