-------------------------------------------------------------------------- WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory. This can cause MPI jobs to run with erratic performance, hang, and/or crash. This may be caused by your OpenFabrics vendor limiting the amount of physical memory that can be registered. You should investigate the relevant Linux kernel module parameters that control how much physical memory can be registered, and increase them to allow registering all physical memory on your machine. See this Open MPI FAQ item for more information on these Linux kernel module parameters: http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages Local host: linuxbmc0219.rz.RWTH-Aachen.DE Registerable memory: 32768 MiB Total memory: 98293 MiB -------------------------------------------------------------------------- process 1 starts test process 3 starts test process 2 starts test process 0 starts test Epsilon = 0.0000000010 process 5 starts test process 4 starts test [cluster.rz.RWTH-Aachen.DE:40669] 5 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low [cluster.rz.RWTH-Aachen.DE:40669] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages ####### process 0 yields partial sum: 108000000.4 ####### process 5 yields partial sum: 108000000.4 ####### process 2 yields partial sum: 108000000.4 size, block: 1080000000 1080000001 MPI_Reduce in einem! ####### process 1 yields partial sum: 108000000.4 size, block: 1080000000 1080000001 MPI_Reduce in einem! ####### process 3 yields partial sum: 108000000.4 ####### process 4 yields partial sum: 108000000.4 size, block: 1080000000 1080000001 MPI_Reduce in einem! size, block: 1080000000 1080000001 MPI_Reduce in einem! size, block: 1080000000 1080000001 MPI_Reduce in einem! size, block: 1080000000 1080000001 MPI_Reduce in einem! difference occured: 2.597129 [SOLL | IST] = ( 648000000.00 | 648000002.60) Elapsed time: 81.982594 -------------------------------------------------------------------------- A process failed to create a queue pair. This usually means either the device has run out of queue pairs (too many connections) or there are insufficient resources available to allocate a queue pair (out of memory). The latter can happen if either 1) insufficient memory is available, or 2) no more physical memory can be registered with the device. For more information on memory registration see the Open MPI FAQs at: http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages Local host: linuxbmc0219.rz.RWTH-Aachen.DE Local device: mlx4_0 Queue pair type: Reliable connected (RC) -------------------------------------------------------------------------- [linuxbmc0219.rz.RWTH-Aachen.DE:20388] *** An error occurred in MPI_Barrier [linuxbmc0219.rz.RWTH-Aachen.DE:20388] *** on communicator MPI_COMM_WORLD [linuxbmc0219.rz.RWTH-Aachen.DE:20388] *** MPI_ERR_OTHER: known error not in list [linuxbmc0219.rz.RWTH-Aachen.DE:20388] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort rank: 3 VmPeak: 25775824 kB Master's Summe: 647999996.699953 rank: 0 VmPeak: 17338196 kB rank: 4 VmPeak: 25775712 kB rank: 2 VmPeak: 25775668 kB rank: 5 VmPeak: 17104192 kB -------------------------------------------------------------------------- mpiexec has exited due to process rank 3 with PID 20385 on node linuxbmc0219 exiting improperly. There are two reasons this could occur: 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination. 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination" This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here). -------------------------------------------------------------------------- rank: 1 VmPeak: 25775812 kB Failure executing command /opt/MPI/openmpi-1.6.1rc2/linux/intel/bin/mpiexec -x LD_LIBRARY_PATH -x PATH -x OMP_NUM_THREADS -x MPI_NAME --hostfile /tmp/pk224850/cluster_2867/hostfile-40538 -np 6 memusage a.out 1080000000 1080000001