Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] jobs are hanging with btl_openib_component error
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2013-06-17 14:03:09

You may use tools like this to debug your IB network problems. Most likely, you have a bad cable or connector somewhere in the network.
The tool should be able to pinpoint the problem.
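
Before running fabric diagnostics, it can help to see which host pairs show up in the errors, since a bad cable or connector tends to affect the same link repeatedly. The following is only a minimal sketch, assuming every error line has the same layout as the `handle_wc` message quoted further down in this thread; the function name `tally_link_errors` and the regular expression are illustrative and not part of Open MPI. For reference, status number 12 corresponds to `IBV_WC_RETRY_EXC_ERR` in the libibverbs `ibv_wc_status` enum.

```python
import re
from collections import Counter

# Assumed layout, taken from the single error line quoted in this thread:
#   "... from <src> to: <dst> error polling LP CQ with status <text>
#    status number <n> for wr_id ..."
PATTERN = re.compile(
    r"from (?P<src>\S+) to: (?P<dst>\S+) error polling \w+ CQ "
    r"with status (?P<status>.+?) status number (?P<code>\d+)"
)


def tally_link_errors(lines):
    """Count errors per (source host, destination host, status number)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:  # lines that do not look like openib errors are skipped
            key = (match["src"], match["dst"], int(match["code"]))
            counts[key] += 1
    return counts
```

To use it, pass the open job output file (or `sys.stdin`) to `tally_link_errors` and print `counts.most_common()`. Host pairs that dominate the list are reasonable places to start checking cables and connectors.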

Pavel (Pasha) Shamis

Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Jun 17, 2013, at 9:41 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
That sounds like there's a problem with your InfiniBand fabric.
You should run a complete level-0 diagnostic on your IB network.
On Jun 17, 2013, at 5:23 AM, "Singh, Bharati (GE Global Research, consultant)" <Bharati.Singh_at_[hidden]> wrote:
Hi Team,
Our users' jobs are hanging, and we are seeing the errors below.
[[61410,1],65][btl_openib_component.c:3238:handle_wc] from bng1aviationdc22 to: bng1aviationdc26 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 774739584 opcode 1  vendor error 129 qp_idx 0
Please find the attached file for more information.
Bharati Singh
users mailing list
Jeff Squyres