Seems to be fixed.

On 7/14/08, Lenny Verkhovsky <lenny.verkhovsky@gmail.com> wrote:

../configure --with-memory-manager=ptmalloc2 --with-openib 

I guess not. I always use the same configure line, and I only recently started to see this error. 


On 7/13/08, Jeff Squyres <jsquyres@cisco.com> wrote:
I think you said opposite things: Lenny's command line did not specifically ask for ibcm, but it was used anyway.  Lenny -- did you explicitly request it somewhere else (e.g., env var or MCA param file)?

I suspect that you did not; I suspect (without looking at the code again) that ibcm tried to select itself and failed on the ib_cm_listen() call, so it fell back to oob.  This might have to be another workaround in OMPI, perhaps something like this:

if (ib_cm_listen() fails)
  if (ibcm explicitly requested)
      print_warning()
  fail to use ibcm
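For illustration, the workaround above might look roughly like the following C sketch. Every name in it (`listen_stub`, `ibcm_query`, the `explicitly_requested` flag, the `CPC_*` codes) is hypothetical and is not the actual OMPI openib CPC API; the stub simply simulates ib_cm_listen() failing with EINVAL, as in the log further down this thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result codes; the real OMPI constants differ. */
enum { CPC_OK = 0, CPC_NOT_AVAILABLE = 1 };

/* Hypothetical stand-in for ib_cm_listen(): 0 on success, -1 with errno
 * set on failure.  Here it always fails with EINVAL, mimicking the log. */
static int listen_stub(void)
{
    errno = EINVAL;
    return -1;
}

/* Hypothetical query hook for the ibcm CPC.  If listening fails, warn
 * only when the user explicitly asked for ibcm; either way report "not
 * available" so that selection can fall back to another CPC (e.g. oob). */
static int ibcm_query(bool explicitly_requested)
{
    if (listen_stub() != 0) {
        if (explicitly_requested) {
            int saved = errno;
            fprintf(stderr,
                    "warning: ibcm was requested, but listening failed (errno=%d)\n",
                    saved);
        }
        return CPC_NOT_AVAILABLE;
    }
    return CPC_OK;
}
```

Calling `ibcm_query(false)` returns `CPC_NOT_AVAILABLE` silently, while `ibcm_query(true)` also prints the warning to stderr. Keeping the warning conditional means users who never asked for ibcm would not see an error message when the fallback to oob works fine.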

Has this been filed as a bug at openfabrics.org?  I don't think that I filed it when Brad and I were testing on RoadRunner -- it would probably be good if someone filed it.




On Jul 13, 2008, at 8:56 AM, Lenny Verkhovsky wrote:

Pasha is right; I didn't disable it.

On 7/13/08, Pavel Shamis (Pasha) <pasha@dev.mellanox.co.il> wrote:
Jeff Squyres wrote:
Brad and I did some scale testing of IBCM and saw this error sometimes.  It seemed to happen with higher frequency when you increased the number of processes on a single node.

I talked to Sean Hefty about it, but we never figured out a definitive cause or solution.  My best guess is that there is something wonky about multiple processes simultaneously interacting with the IBCM kernel driver from userspace; but I don't know jack about kernel stuff, so that's a total SWAG.

Thanks for reminding me of this issue; I admit that I had forgotten about it.  :-(  Pasha -- should IBCM not be the default?
It is not the default. I guess Lenny configured it explicitly, didn't he?

Pasha.





On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:

Hi,

I am getting this error sometimes.

/home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile /home/USERS/lenny/TESTS/COMPILERS/hostfile /home/USERS/lenny/TESTS/COMPILERS/hello
[witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] failed to ib_cm_listen 10 times: rc=-1, errno=22
Hello world! I'm 0 of 100 on witch2
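The log line reports `errno=22`. On Linux that value is `EINVAL`, whose standard `strerror()` text is "Invalid argument". The message also implies a bounded retry ("10 times") before giving up. The sketch below illustrates both ideas under those assumptions; `retry_listen`, `listen_fn`, `report_failure`, and the stub functions are hypothetical, and this is not the code at btl_openib_connect_ibcm.c:769.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: returns 0 on success, -1 with errno set. */
typedef int (*listen_fn)(void);

/* Call fn up to max_tries times.  Returns 0 on the first success, or -1
 * after the final failure; when max_tries >= 1, errno is left as set by
 * the last attempt. */
static int retry_listen(listen_fn fn, int max_tries)
{
    for (int i = 0; i < max_tries; ++i) {
        if (fn() == 0) {
            return 0;
        }
    }
    return -1;
}

/* Hypothetical stand-in that always fails the way the log shows. */
static int always_einval(void)
{
    errno = EINVAL;
    return -1;
}

/* Hypothetical stand-in that fails twice, then succeeds. */
static int calls = 0;
static int ok_on_third(void)
{
    if (++calls < 3) {
        errno = EAGAIN;
        return -1;
    }
    return 0;
}

/* Print a diagnostic in the spirit of the log line, decoding errno. */
static void report_failure(int tries)
{
    int saved = errno;  /* save before another library call can change it */
    fprintf(stderr, "failed to listen %d times: errno=%d (%s)\n",
            tries, saved, strerror(saved));
}
```

With `always_einval`, `retry_listen(always_einval, 10)` returns -1 with `errno == EINVAL`, and `report_failure(10)` prints a line ending in "(Invalid argument)" on glibc. A transient failure such as `ok_on_third` succeeds on the third call. A bounded retry like this helps only with transient errors; a persistent `EINVAL` points at a bad argument or a kernel-side problem rather than a timing race.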


Best Regards

Lenny.


_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






--
Jeff Squyres
Cisco Systems

