I'm using OpenMPI 1.4.3 and have been running a particular case on 120, 240, 480, and 960 processes. My time-per-work metric reports 60, 30, 15, and 15, respectively. The same runs with MVAPICH 1.2 give 60, 30, 15, and 8. So with OpenMPI 1.4.3, something stops scaling as the process count goes from 480 up to 960.
This case has also been really troublesome at 960 processes, reliability-wise. Initially, the OpenMPI runs would reach a certain point in the application with some weird communication patterns, and they would die with messages like the following:
[c4n01][[14679,1],5][connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
I then added this parameter:
'--mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32'
and the job now runs to completion... but, as I said above, it runs 2x slower than MVAPICH at 960 processes. All of this is very repeatable.
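For reference, here is roughly how I have been checking the openib BTL settings and re-running with the override. This is just a sketch: `./my_app` is a placeholder for the real binary, and the exact parameter names should be verified against your own 1.4.3 install with `ompi_info`:

```shell
# Show the openib BTL's default receive_queues setting so it can be
# compared against the override below (ompi_info from the same
# OpenMPI 1.4.3 install):
ompi_info --param btl openib | grep receive_queues

# Re-run with the XRC receive-queue override and verbose BTL output,
# to see which queues and transports are actually selected at scale:
mpirun -np 960 \
    --mca btl_openib_receive_queues \
        X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 \
    --mca btl_base_verbose 50 \
    ./my_app
```

The verbose output is what I would expect to help narrow down whether the slowdown comes from the queue configuration itself or from connection setup.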
How can I determine the source of the problem here?
Thanks for any advice,