Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] users Digest, Vol 1217, Issue 2, Message3
From: jan (jan_at_[hidden])
Date: 2009-04-30 05:33:07


Thank you, Jeff Squyres. Could you suggest a method for running layer 0
diagnostics to check whether the fabric is clean? I have contacted Dell's
local office (Taiwan), but I don't think they are familiar with Open MPI,
or even with the InfiniBand module. Has anyone else seen the IB stack hang
with a Mellanox ConnectX product?

Thank you again.

Best Regards,

Gloria Jan
Wavelink Technology Inc

>> I can confirm that I have exactly the same problem, also on a Dell
>> system, even with the latest Open MPI.
>>
>> Our system is:
>>
>> Dell M905
>> OpenSUSE 11.1
>> kernel: 2.6.27.21-0.1-default
>> ofed-1.4-21.12 from SUSE repositories.
>> OpenMPI-1.3.2
>>
>>
>> I can also add that this does not only affect Open MPI. If this message
>> is triggered after mpirun:
>> [node032][[9340,1],11][btl_openib_component.c:3002:poll_device] error
>> polling HP CQ with -2 errno says Success
>>
>> then the IB stack hangs. You cannot even reload it; you have to reboot
>> the node.
>>
>
>
> Something that severe should not be able to be caused by Open MPI.
> Specifically: Open MPI should not be able to hang the OFED stack.
> Have you run layer 0 diagnostics to know that your fabric is clean?
> You might want to contact your IB vendor to find out how to do that.
>
> --
> Jeff Squyres
> Cisco Systems