-------------------------------------------------------------------------- WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory. This can cause MPI jobs to run with erratic performance, hang, and/or crash. This may be caused by your OpenFabrics vendor limiting the amount of physical memory that can be registered. You should investigate the relevant Linux kernel module parameters that control how much physical memory can be registered, and increase them to allow registering all physical memory on your machine. See this Open MPI FAQ item for more information on these Linux kernel module parameters: http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages Local host: linuxbmc0221.rz.RWTH-Aachen.DE Registerable memory: 32768 MiB Total memory: 98293 MiB -------------------------------------------------------------------------- process 5 starts test process 2 starts test process 3 starts test process 1 starts test process 0 starts test Epsilon = 0.0000000010 process 4 starts test [cluster.rz.RWTH-Aachen.DE:28988] 5 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low [cluster.rz.RWTH-Aachen.DE:28988] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages ####### process 5 yields partial sum: 108000000.4 ####### process 1 yields partial sum: 108000000.4 size, block: 1080000000 1080000001 MPI_Reduce in einem! ####### process 2 yields partial sum: 108000000.4 size, block: 1080000000 1080000001 MPI_Reduce in einem! ####### process 3 yields partial sum: 108000000.4 ####### process 0 yields partial sum: 108000000.4 size, block: 1080000000 1080000001 MPI_Reduce in einem! size, block: 1080000000 1080000001 MPI_Reduce in einem! ####### process 4 yields partial sum: 108000000.4 size, block: 1080000000 1080000001 MPI_Reduce in einem! size, block: 1080000000 1080000001 MPI_Reduce in einem! -------------------------------------------------------------------------- A process failed to create a queue pair. This usually means either the device has run out of queue pairs (too many connections) or there are insufficient resources available to allocate a queue pair (out of memory). The latter can happen if either 1) insufficient memory is available, or 2) no more physical memory can be registered with the device. For more information on memory registration see the Open MPI FAQs at: http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages Local host: linuxbmc0191.rz.RWTH-Aachen.DE Local device: mlx4_0 Queue pair type: Reliable connected (RC) -------------------------------------------------------------------------- [linuxbmc0191.rz.RWTH-Aachen.DE:21884] *** An error occurred in MPI_Barrier [linuxbmc0191.rz.RWTH-Aachen.DE:21884] *** on communicator MPI_COMM_WORLD [linuxbmc0191.rz.RWTH-Aachen.DE:21884] *** MPI_ERR_OTHER: known error not in list [linuxbmc0191.rz.RWTH-Aachen.DE:21884] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort [cluster.rz.RWTH-Aachen.DE:28988] 1 more process has sent help message help-mpi-btl-openib-cpc-base.txt / ibv_create_qp failed [cluster.rz.RWTH-Aachen.DE:28988] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal -------------------------------------------------------------------------- WARNING: A process refused to die! Host: linuxbmc0055.rz.RWTH-Aachen.DE PID: 32263 This process may still be running and/or consuming resources. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec has exited due to process rank 2 with PID 13153 on node linuxbmc0105 exiting improperly. There are two reasons this could occur: 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination. 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination" This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here). -------------------------------------------------------------------------- [cluster.rz.RWTH-Aachen.DE:28988] 3 more processes have sent help message help-odls-default.txt / odls-default:could-not-kill Failure executing command /opt/MPI/openmpi-1.6.1rc2/linux/intel/bin/mpiexec -x LD_LIBRARY_PATH -x PATH -x OMP_NUM_THREADS -x MPI_NAME -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 --hostfile /tmp/pk224850/cluster_52665/hostfile-28863 -np 6 a.out 1080000000 1080000001