On Apr 25, 2009, at 11:59 AM, Anton Starikov wrote:
> I can confirm that I have exactly the same problem, also on a Dell
> system, even with the latest Open MPI.
> Our system is:
> Dell M905
> OpenSUSE 11.1
> kernel: 220.127.116.11-0.1-default
> ofed-1.4-21.12 from SUSE repositories.
> What I can also add is that this does not only affect Open MPI. If these
> messages are triggered after mpirun:
> [node032][[9340,1],11][btl_openib_component.c:3002:poll_device] error
> polling HP CQ with -2 errno says Success
> Then the IB stack hangs. You cannot even reload it; you have to reboot the node.
Open MPI should not be able to cause something that severe.
Specifically: Open MPI should not be able to hang the OFED stack.
Have you run layer 0 diagnostics to confirm that your fabric is clean?
You might want to contact your IB vendor to find out how to do that.
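
Before going to the vendor, it may help to know which nodes actually report
the CQ polling error. Below is a rough sketch, not an Open MPI tool: it
assumes the mpirun output was saved to a text file, and the pattern is
inferred only from the single error line quoted above, so other versions may
format the message differently. The function name nodes_with_cq_errors is
made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the one error line quoted in this thread (assumption):
# [node][[job,vpid],rank][btl_openib_component.c:LINE:poll_device] error polling XX CQ
CQ_ERROR = re.compile(
    r"\[(?P<node>[^\]\[]+)\]"
    r"\[\[(?P<job>\d+),(?P<vpid>\d+)\],(?P<rank>\d+)\]"
    r"\[btl_openib_component\.c:\d+:poll_device\]"
    r"\s+error\s+polling\s+\w+\s+CQ"
)


def nodes_with_cq_errors(text):
    """Return a Counter mapping node name to number of CQ polling errors."""
    # \s+ in the pattern also matches newlines, so wrapped lines still match.
    return Counter(m.group("node") for m in CQ_ERROR.finditer(text))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python cq_errors.py mpirun_output.txt
    with open(sys.argv[1]) as f:
        for node, count in nodes_with_cq_errors(f.read()).most_common():
            print(node, count)
```

If the errors cluster on a few nodes, those are the hosts (and their links)
to look at first when checking the fabric.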