
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] IBV_EVENT_QP_ACCESS_ERR
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2013-01-23 15:52:00


> We have a user whose code at scale dies reliably with the errors below (new hosts each time):
>
> We have been using for this code:
> -mca btl_openib_receive_queues X,4096,128:X,12288,128:X,65536,12
>
> Without that option it dies with an out of memory message reliably.
>
> Note this code runs fine at the same scale on Pleiades (NASA SGI box) using MPT.
>
> Are we running out of QP? Is that possible?

I don't think this is a running-out-of-QP error.

The initiator gets a NACK on the request, which essentially says that the request is invalid. The passive side reports a QP access error.
Do you observe this error on small-scale runs, let's say 8-16 nodes?

Did you try replacing all the "X" entries with "S" to see what happens? Do you know which OFED version is installed on your system?
The last time I tested XRC ("X") was with OFED 1.5.1. I'm wondering whether newer OFED versions changed XRC behavior.
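For concreteness, the suggested experiment might look like the following on the command line. This is only a sketch: `./your_app` is a placeholder for the user's application, and the queue sizes/depths are copied verbatim from the original flag; whether SRQ ("S") avoids the error on this particular fabric is exactly what the test would determine.

```shell
# Sketch of the suggested experiment: replace the XRC ("X") queue
# specifications with shared receive queues ("S"), keeping the same
# sizes and depths as in the original -mca setting.
# "./your_app" is a hypothetical placeholder for the real binary.
mpirun -mca btl_openib_receive_queues \
    S,4096,128:S,12288,128:S,65536,12 ./your_app
```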
 

Regards,
Pasha
 

>
> --------------------------------------------------------------------------
> The OpenFabrics stack has reported a network error event. Open MPI
> will try to continue, but your job may end up failing.
>
> Local host: nyx5608.engin.umich.edu
> MPI process PID: 42036
> Error number: 3 (IBV_EVENT_QP_ACCESS_ERR)
>
> This error may indicate connectivity problems within the fabric;
> please contact your system administrator.
> --------------------------------------------------------------------------
> [[9462,1],3][../../../../../openmpi-1.6/ompi/mca/btl/openib/btl_openib_component.c:3394:handle_wc] from nyx5608.engin.umich.edu to: nyx5022 error polling LP CQ with status INVALID REQUEST ERROR status number 9 for wr_id 14d6d00 opcode 0 vendor error 138 qp_idx 0
> --------------------------------------------------------------------------
> The OpenFabrics stack has reported a network error event. Open MPI
> will try to continue, but your job may end up failing.
>
> Local host: (null)
> MPI process PID: 42038
> Error number: 3 (IBV_EVENT_QP_ACCESS_ERR)
>
> This error may indicate connectivity problems within the fabric;
> please contact your system administrator.
>
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
>
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users