Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] jobs are hanging with btl_openib_component error
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2013-06-17 14:03:09


You can use a tool such as ibdiagnet (http://linux.die.net/man/1/ibdiagnet)
to debug your IB network problems. Most likely, you have a bad cable or connector somewhere in the network.
The tool should be able to pinpoint the problem.
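
Before running fabric diagnostics, it can also help to see which host pairs show up in the errors, since a bad cable or connector tends to affect the same links repeatedly. Below is a minimal Python sketch that tallies source/destination pairs from job output. It assumes the error lines look like the single one quoted in the original report; the regular expression and the function name `tally_failing_links` are illustrative only, not part of Open MPI, and other Open MPI versions may format the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the one error line quoted in this thread (an assumption);
# adjust it if your Open MPI version prints the message differently.
ERROR_RE = re.compile(
    r"from (?P<src>\S+) to: (?P<dst>\S+) error polling \S+ CQ "
    r"with status (?P<status>.+?) status number (?P<num>\d+)"
)


def tally_failing_links(lines):
    """Count (source host, destination host, status) triples in error lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match["src"], match["dst"], match["status"])] += 1
    return counts


# The error line from the original report, used here as sample input.
sample = (
    "[[61410,1],65][btl_openib_component.c:3238:handle_wc] "
    "from bng1aviationdc22 to: bng1aviationdc26 error polling LP CQ "
    "with status RETRY EXCEEDED ERROR status number 12 "
    "for wr_id 774739584 opcode 1  vendor error 129 qp_idx 0"
)

if __name__ == "__main__":
    for (src, dst, status), n in tally_failing_links([sample]).most_common():
        print(f"{src} -> {dst}: {status} x{n}")
```

Hosts that appear in many pairs are reasonable first candidates to check when reviewing the ibdiagnet results.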

Pavel (Pasha) Shamis

---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Jun 17, 2013, at 9:41 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
That sounds like there's a problem with your InfiniBand fabric.
You should run a complete level-0 diagnostic on your IB network.

On Jun 17, 2013, at 5:23 AM, "Singh, Bharati (GE Global Research, consultant)" <Bharati.Singh_at_[hidden]> wrote:
Hi Team,
Our users' jobs are hanging, and we are seeing the errors below.
[[61410,1],65][btl_openib_component.c:3238:handle_wc] from bng1aviationdc22 to: bng1aviationdc26 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 774739584 opcode 1  vendor error 129 qp_idx 0
Please find the attached file for more information.
Thanks,
Bharati Singh
<output.14807.zip>
_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/