Are there any tuning parameters that I can use to reduce the
amount of memory used by OpenMPI? I would very much like to use OpenMPI
instead of MVAPICH, but I’m on a cluster where memory usage is the most
important consideration. Here are three results that capture the problem:

With the “leave_pinned” behavior turned on, I
get good performance (19.528, lower is better):

mpirun --prefix /usr/mpi/intel/openmpi-1.2.8 \
    --machinefile /var/spool/torque/aux/7972.fwnaeglingio -np 28 \
    --mca btl ^tcp --mca mpi_leave_pinned 1 --mca mpool_base_use_mem_hooks 1 \
    -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 \
    /tmp/7972.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri restart.0 \
    -ro /tmp/7972.fwnaeglingio/restart.0

Compute rate (processor-microseconds/cell/cycle): 19.528
Total memory usage: 38155.3477 MB (38.1553 GB)

With the leave_pinned behavior turned off, I get considerably
slower performance (28.788), and the memory usage is essentially unchanged (still 38 GB):

mpirun --prefix /usr/mpi/intel/openmpi-1.2.8 \
    --machinefile /var/spool/torque/aux/7972.fwnaeglingio -np 28 \
    -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 \
    /tmp/7972.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri restart.0 \
    -ro /tmp/7972.fwnaeglingio/restart.0

Compute rate (processor-microseconds/cell/cycle): 28.788
Total memory usage: 38335.7656 MB (38.3358 GB)

Using MVAPICH, the performance is in the middle (23.608), but
the memory usage drops by roughly 5.5 GB, from about 38 GB to 32.8 GB, which
is a significant decrease to me:

/usr/mpi/intel/mvapich-1.1.0/bin/mpirun_rsh -ssh -np 28 \
    -hostfile /var/spool/torque/aux/7972.fwnaeglingio \
    LD_LIBRARY_PATH="/usr/mpi/intel/mvapich-1.1.0/lib/shared:/usr/mpi/intel/openmpi-1.2.8/lib64:/appserv/intel/fce/10.1.008/lib:/appserv/intel/cce/10.1.008/lib" \
    MPI_ENVIRONMENT=1 \
    /tmp/7972.fwnaeglingio/falconv4_ibm_mvapich -cycles 100 -ri restart.0 \
    -ro /tmp/7972.fwnaeglingio/restart.0

Compute rate (processor-microseconds/cell/cycle): 23.608
Total memory usage: 32753.0586 MB (32.7531 GB)
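
To put numbers on that gap, here is a quick sketch of the arithmetic. It assumes the reported totals are summed over all 28 processes; the per-process figure is just an even split of the difference, not something I measured, and the dictionary keys are labels I made up for the three runs.

```python
# Totals reported by the three runs above, in MB.
NP = 28  # number of MPI processes (-np 28)

usage_mb = {
    "openmpi_leave_pinned": 38155.3477,
    "openmpi_default": 38335.7656,
    "mvapich": 32753.0586,
}


def extra_vs_mvapich(name):
    """Return (total extra MB, extra MB per process) relative to the MVAPICH run.

    The per-process value assumes the extra memory is spread evenly
    across all NP processes, which is only an estimate.
    """
    extra = usage_mb[name] - usage_mb["mvapich"]
    return extra, extra / NP


for name in ("openmpi_leave_pinned", "openmpi_default"):
    extra, per_proc = extra_vs_mvapich(name)
    print(f"{name}: {extra:.1f} MB extra in total, "
          f"about {per_proc:.1f} MB per process")
```

So under that assumption the difference works out to a bit under 200 MB per process in both OpenMPI runs.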

I didn’t see anything in the FAQ that discusses memory
usage other than the impact of the “leave_pinned” option, which
apparently does not affect the memory usage in my case. But I figure
there must be a reason why OpenMPI uses about 6 GB more than MVAPICH on
the same case.

Thanks for any insights. Also attached is the output
of ompi_info -a.