On Apr 14, 2006, at 9:33 AM, Lee D. Peterson wrote:
> This problem went away yesterday. There was no intervening reboot of
> my cluster or a recompile of the code. So all I can surmise is
> something got cleaned up in a cron script. Weird.
Very strange. Could there have been a networking issue (switch
restart or something)?
> Anyways, now I've benchmarked the HPL using OpenMPI vs LAM-MPI. The
> OpenMPI runs about 13% to sometimes 50% slower than the LAM-MPI. I'm
> running over TCP and using SSH.
Our TCP performance in Open MPI is not as good as LAM/MPI's, so some
slowdown is not totally surprising. 50%, however, is much more than we
would expect. There are some pathologically bad cases that can occur
with multi-NIC setups (especially with our unoptimized multi-NIC
support). It
would be interesting to see what the performance would be if you only
use one NIC. You can specify the NIC to use with the
btl_tcp_if_include MCA parameter:
mpirun -np X -mca btl_tcp_if_include en0 <app>
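For example, assuming your nodes' TCP-capable interface is named en0
(interface names vary; check with ifconfig), a single-NIC run might
look like the following. The process count, host file name, and the
"xhpl" binary here are just placeholders for your own setup:

```shell
# List the network interfaces on a node to find the right name
# (e.g., en0 on Mac OS X, eth0/eth1 on most Linux distributions).
ifconfig -a

# Restrict Open MPI's TCP BTL to a single interface.
mpirun -np 16 --hostfile myhosts \
       -mca btl tcp,self \
       -mca btl_tcp_if_include en0 \
       ./xhpl
```

Comparing this against your multi-NIC timings should tell us whether
the multi-NIC path is the source of the slowdown.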
Hope this helps,
Open MPI developer