Open MPI User's Mailing List Archives

Subject: [OMPI users] IBV_EVENT_QP_ACCESS_ERR
From: Brock Palen (brockp_at_[hidden])
Date: 2013-01-23 15:17:20


I have a user whose code reliably dies at scale with the errors below (on different hosts each time):

We have been using the following option for this code:
-mca btl_openib_receive_queues X,4096,128:X,12288,128:X,65536,12

Without that option, it reliably dies with an out-of-memory message.
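
For reference, here is a rough sketch of what that receive_queues value asks for. It assumes each colon-separated entry starts with the fields type, buffer size in bytes, and number of buffers (the usual layout for the openib BTL, where X marks an XRC queue), and it ignores any optional trailing fields. The function name and the byte total are illustrative only: the total is just size times count for each queue, not a measurement of the memory Open MPI actually registers.

```python
def parse_receive_queues(spec):
    """Split a btl_openib_receive_queues value into one dict per queue.

    Assumption: the first three comma-separated fields of each entry are
    <type>,<buffer size in bytes>,<number of buffers>. Extra fields are ignored.
    """
    queues = []
    for entry in spec.split(":"):
        fields = entry.split(",")
        size = int(fields[1])
        count = int(fields[2])
        queues.append({
            "type": fields[0],
            "buffer_size": size,
            "num_buffers": count,
            "bytes": size * count,  # nominal buffer space for this queue
        })
    return queues


spec = "X,4096,128:X,12288,128:X,65536,12"
queues = parse_receive_queues(spec)
total_bytes = sum(q["bytes"] for q in queues)

for q in queues:
    print(q["type"], q["buffer_size"], q["num_buffers"], q["bytes"])
print("nominal total bytes:", total_bytes)
```

With the value above this gives three XRC queues and a nominal total of about 2.75 MiB of receive buffers per set of queues.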

Note that this code runs fine at the same scale on Pleiades (NASA SGI box) using MPT.

Are we running out of QPs? Is that possible?

--------------------------------------------------------------------------
The OpenFabrics stack has reported a network error event. Open MPI
will try to continue, but your job may end up failing.

  Local host: nyx5608.engin.umich.edu
  MPI process PID: 42036
  Error number: 3 (IBV_EVENT_QP_ACCESS_ERR)

This error may indicate connectivity problems within the fabric;
please contact your system administrator.
--------------------------------------------------------------------------
[[9462,1],3][../../../../../openmpi-1.6/ompi/mca/btl/openib/btl_openib_component.c:3394:handle_wc] from nyx5608.engin.umich.edu to: nyx5022 error polling LP CQ with status INVALID REQUEST ERROR status number 9 for wr_id 14d6d00 opcode 0 vendor error 138 qp_idx 0
--------------------------------------------------------------------------
The OpenFabrics stack has reported a network error event. Open MPI
will try to continue, but your job may end up failing.

  Local host: (null)
  MPI process PID: 42038
  Error number: 3 (IBV_EVENT_QP_ACCESS_ERR)

This error may indicate connectivity problems within the fabric;
please contact your system administrator.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
brockp_at_[hidden]
(734)936-1985