i=2 ; while [ $i -lt 82 ] ; do echo "************ NP=$i************* " ; /hpc/home/USERS/lenny/OMPI_ORTE_TRUNK_RDMACM/bin/mpirun -np $i -hostfile /hpc/home/USERS/lenny/TESTS/TRUNK/hostfile1 -mca btl_openib_cpc_include rdmacm -mca btl_openib_cpc_exclude oob /hpc/home/USERS/lenny/TESTS/TRUNK/mpi_p1_4_RDMACM -t lt ; sleep 1 ; let i=i+2; done
************ NP=2*************
LT (2) (size min max avg) 1 0.584006 0.584006 0.584006
************ NP=4*************
LT (4) (size min max avg) 1 0.602007 0.652432 0.627220
************ NP=6*************
LT (6) (size min max avg) 1 1.181483 1.307964 1.256982
************ NP=8*************
LT (8) (size min max avg) 1 0.832558 1.245499 1.035035
************ NP=10*************
LT (10) (size min max avg) 1 0.798106 1.157045 0.900745
************ NP=12*************
LT (12) (size min max avg) 1 0.819445 1.237512 1.094878
************ NP=14*************
LT (14) (size min max avg) 1 0.844002 1.285434 1.123854
************ NP=16*************
LT (16) (size min max avg) 1 1.190543 1.667500 1.284570
************ NP=18*************
--------------------------------------------------------------------------
The RDMA CM returned an event error while attempting to make a
connection. This type of error usually indicates a network
configuration error.
  Local host: witch5
  Local device: mthca0
  Error name: RDMA_CM_EVENT_ROUTE_ERROR
  Peer: witch3
Your MPI job will now abort, sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 13 with PID 15978 on
node witch5 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
************ NP=20*************
LT (20) (size min max avg) 1 1.206994 1.291037 1.235116
************ NP=22*************
LT (22) (size min max avg) 1 1.145005 1.383901 1.243873
************ NP=24*************
LT (24) (size min max avg) 1 0.845075 1.280546 1.183709
************ NP=26*************
LT (26) (size min max avg) 1 0.841498 1.284003 1.120705
************ NP=28*************
LT (28) (size min max avg) 1 0.801086 1.294971 1.126783
************ NP=30*************
LT (30) (size min max avg) 1 0.819445 1.295924 1.176206
************ NP=32*************
LT (32) (size min max avg) 1 0.629067 1.634479 1.032442
************ NP=34*************
LT (34) (size min max avg) 1 0.828505 1.322985 1.108247
************ NP=36*************
LT (36) (size min max avg) 1 0.633955 1.318455 1.034319
************ NP=38*************
LT (38) (size min max avg) 1 0.663042 1.345992 1.160201
************ NP=40*************
LT (40) (size min max avg) 1 0.824571 1.335979 1.099044
************ NP=42*************
LT (42) (size min max avg) 1 0.647068 1.253963 0.966288
************ NP=44*************
LT (44) (size min max avg) 1 0.701427 1.294494 1.039743
************ NP=46*************
LT (46) (size min max avg) 1 0.817895 1.593471 1.129596
************ NP=48*************
[witch8][[25541,1],27][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:407:rdmacm_setup_qp] Failed to create qp with 9390800
[witch8][[25541,1],27][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:829:handle_connect_request] rdmacm_setup_qp error -1
[[25541,1],28][../../../../../ompi/mca/btl/openib/btl_openib_component.c:2825:handle_wc] from witch9 to: witch8 error polling HP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 12586344 opcode 0 qp_idx 0
--------------------------------------------------------------------------
mpirun has exited due to process rank 27 with PID 24585 on
node witch8 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
************ NP=50*************
LT (50) (size min max avg) 1 0.845909 1.265526 1.096773
************ NP=52*************
--------------------------------------------------------------------------
The RDMA CM returned an event error while attempting to make a
connection. This type of error usually indicates a network
configuration error.
  Local host: witch15
  Local device: mlx4_0
  Error name: RDMA_CM_EVENT_ROUTE_ERROR
  Peer: witch6
Your MPI job will now abort, sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 51 with PID 7906 on
node witch15 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
************ NP=54*************
LT (54) (size min max avg) 1 0.645518 1.283407 1.001539
************ NP=56*************
LT (56) (size min max avg) 1 0.651002 1.306057 1.055053
************ NP=58*************
LT (58) (size min max avg) 1 0.656962 1.314521 1.019219
************ NP=60*************
LT (60) (size min max avg) 1 0.805974 1.660466 1.101641
************ NP=62*************
LT (62) (size min max avg) 1 0.693917 1.608014 1.045942
************ NP=64*************
LT (64) (size min max avg) 1 0.628471 1.295567 0.956681
************ NP=66*************
LT (66) (size min max avg) 1 0.640035 1.539946 0.999606
************ NP=68*************
LT (68) (size min max avg) 1 0.648499 1.273990 0.907807
************ NP=70*************
[witch13][[24821,1],40][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:407:rdmacm_setup_qp] Failed to create qp with 9396256
[witch13][[24821,1],40][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:1296:finish_connect] rdmacm_setup_qp error -1
--------------------------------------------------------------------------
mpirun has exited due to process rank 40 with PID 8677 on
node witch13 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
************ NP=72*************
LT (72) (size min max avg) 1 0.635982 1.258969 0.985656
************ NP=74*************
LT (74) (size min max avg) 1 0.664473 1.313448 1.022242
************ NP=76*************
LT (76) (size min max avg) 1 0.675559 1.300573 0.997245
************ NP=78*************
[witch7][[24751,1],20][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:407:rdmacm_setup_qp] Failed to create qp with 9404384
--------------------------------------------------------------------------
mpirun has exited due to process rank 20 with PID 6335 on
node witch7 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[witch7][[24751,1],20][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:1296:finish_connect] rdmacm_setup_qp error -1
************ NP=80*************
--------------------------------------------------------------------------
The RDMA CM returned an event error while attempting to make a
connection. This type of error usually indicates a network
configuration error.
  Local host: witch9
  Local device: mthca0
  Error name: RDMA_CM_EVENT_UNREACHABLE
  Peer: witch18
Your MPI job will now abort, sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 30 with PID 10925 on
node witch9 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[witch1:16113] 2 more processes have sent help message help-mpi-btl-openib-cpc-rdmacm.txt / rdma cm event error
[witch1:16113] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
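
Since the failures in the run above are scattered (NP=18, 48, 52, 70, 78, 80), it can help to have the loop itself record which process counts failed. The following is only a sketch, not part of the original test: the function name sweep and the wrapper run_lt are hypothetical, and it assumes mpirun returns a non-zero exit status when the job aborts.

```shell
#!/bin/bash
# Sketch: run a command for NP=2,4,... (strictly below the given limit)
# and print the NP values for which the command returned non-zero.
# "sweep" is a hypothetical helper name, not something from Open MPI.
sweep() {
    local max=$1; shift          # upper bound, exclusive (82 in the loop above)
    local i=2 failed=""
    while [ "$i" -lt "$max" ]; do
        echo "************ NP=$i *************"
        # "$@" is the command to run; the current NP is appended as its last argument.
        if ! "$@" "$i"; then
            failed="$failed $i"
        fi
        i=$((i + 2))
    done
    echo "Failed NP values:${failed:- none}"
}

# Hypothetical usage, wrapping the mpirun command line from the loop above
# (assumes a non-zero exit status from mpirun on abort):
#   run_lt() {
#       /hpc/home/USERS/lenny/OMPI_ORTE_TRUNK_RDMACM/bin/mpirun -np "$1" \
#           -hostfile /hpc/home/USERS/lenny/TESTS/TRUNK/hostfile1 \
#           -mca btl_openib_cpc_include rdmacm -mca btl_openib_cpc_exclude oob \
#           /hpc/home/USERS/lenny/TESTS/TRUNK/mpi_p1_4_RDMACM -t lt
#       local rc=$?; sleep 1; return $rc
#   }
#   sweep 82 run_lt
```

The last line of output then lists the failing process counts in one place, which makes it easier to compare repeated runs and see whether the same NP values fail each time.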