
Subject: [OMPI users] Big job, InfiniBand, MPI_Alltoallv and ibv_create_qp failed
From: Paul Kapinos (kapinos_at_[hidden])
Date: 2013-07-30 13:42:28


Dear Open MPI experts,

A user at our cluster has a problem running a rather big job:
- a job using 3024 processes (12 per node, 252 nodes) runs fine
- a job using 4032 processes (12 per node, 336 nodes) produces the error
attached below.

Well, http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages is a
well-known issue; both recommended tweakables (user limits and registered
memory size) are already at their maximum, and nevertheless a queue pair
could not be created.

Our blind guess is that the number of completion queues is exhausted.
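
A rough back-of-envelope, under the assumption (not verified on our side)
that the openib BTL opens per-peer RC connections with about 4 queue pairs
each, and that MPI_Alltoallv forces full connectivity between all ranks:

  QPs per HCA ~ procs_per_node * (total_procs - procs_per_node) * QPs_per_conn
  3024 ranks:  12 * 3012 * 4  ~  144,576
  4032 ranks:  12 * 4020 * 4  ~  192,960

So the step from 3024 to 4032 ranks would add roughly 50,000 QPs per HCA,
which could plausibly cross a driver limit such as mlx4_core's log_num_qp.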

What happens when this value is raised from the default to the maximum?
What is the largest Open MPI job that has been seen at all?
What is the largest Open MPI job *using MPI_Alltoallv* that has been seen?
Is there a way to manage the size/number of queue pairs? (XRC is not
available; a sketch of what we have in mind follows below.)
Is there a way to tell MPI_Alltoallv to use fewer queue pairs, even if this
could lead to a slowdown?
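
One knob we are eyeing, if we read the openib BTL right (an assumption, not
verified): every queue listed in the btl_openib_receive_queues MCA parameter
seems to cost one RC queue pair per connected peer, so collapsing the default
list to a single shared receive queue (SRQ) should cut the per-HCA QP count
by the same factor. A sketch - the queue sizes are placeholders and would
need checking against btl_openib_eager_limit:

$ ompi_info --param btl openib | grep receive_queues   # inspect the default list
$ mpiexec --mca btl_openib_receive_queues S,12288,256,128,32 \
     -np 4032 ./a.out       # one queue spec => one QP per connected peer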

There is a suspicious parameter in the mlx4_core module:
$ modinfo mlx4_core | grep log_num_cq
parm: log_num_cq:log maximum number of CQs per HCA (int)

Is this the tweakable parameter we are looking for?
What are its default and maximum values?
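
For the record, the live value can be read from sysfs; changing it should
(our assumption - untested, and we do not know the accepted maximum) work
via a modprobe option plus a reload of the mlx4 stack:

$ cat /sys/module/mlx4_core/parameters/log_num_cq      # current value
$ echo "options mlx4_core log_num_cq=18" | \
     sudo tee /etc/modprobe.d/mlx4_core.conf           # 18 is just an example
$ # ...then reload mlx4_core (in practice: reboot the node)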

Any help would be welcome...

Best,

Paul Kapinos

P.S. There should be no connection problem between the nodes; a test job
with one process on each node ran successfully just before starting the
actual job, which also ran fine for a while - until it called MPI_Alltoallv.

--------------------------------------------------------------------------
A process failed to create a queue pair. This usually means either
the device has run out of queue pairs (too many connections) or
there are insufficient resources available to allocate a queue pair
(out of memory). The latter can happen if either 1) insufficient
memory is available, or 2) no more physical memory can be registered
with the device.

For more information on memory registration see the Open MPI FAQs at:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host: linuxbmc1156.rz.RWTH-Aachen.DE
Local device: mlx4_0
Queue pair type: Reliable connected (RC)
--------------------------------------------------------------------------
[linuxbmc1156.rz.RWTH-Aachen.DE][[3703,1],4021][connect/btl_openib_connect_oob.c:867:rml_recv_cb]
error in endpoint reply start connect
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** An error occurred in MPI_Alltoallv
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** on communicator MPI_COMM_WORLD
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** MPI_ERR_OTHER: known error not in list
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** MPI_ERRORS_ARE_FATAL: your MPI job
will now abort
[linuxbmc1156.rz.RWTH-Aachen.DE][[3703,1],4024][connect/btl_openib_connect_oob.c:867:rml_recv_cb]
error in endpoint reply start connect
[linuxbmc1156.rz.RWTH-Aachen.DE][[3703,1],4027][connect/btl_openib_connect_oob.c:867:rml_recv_cb]
error in endpoint reply start connect
[linuxbmc0840.rz.RWTH-Aachen.DE][[3703,1],10][connect/btl_openib_connect_oob.c:867:rml_recv_cb]
error in endpoint reply start connect
[linuxbmc0840.rz.RWTH-Aachen.DE][[3703,1],1][connect/btl_openib_connect_oob.c:867:rml_recv_cb]
error in endpoint reply start connect
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] [[3703,0],0]-[[3703,1],10]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] [[3703,0],0]-[[3703,1],8]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] [[3703,0],0]-[[3703,1],9]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] [[3703,0],0]-[[3703,1],1]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] 9 more processes have sent help message
help-mpi-btl-openib-cpc-base.txt / ibv_create_qp failed
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] Set MCA parameter
"orte_base_help_aggregate" to 0 to see all help / error messages
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] 3 more processes have sent help message
help-mpi-errors.txt / mpi_errors_are_fatal

-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915