Thanks for your help. The hanging problem came back a day ago.
However, I can now run only if I use either "-mca btl_tcp_if_include
en0" or "-mca btl_tcp_if_include en1". Using btl_tcp_if_exclude on
either en0 or en1 doesn't work.
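For reference, the invocations look roughly like the following (the
process count and the xhpl executable name are just placeholders for
my actual HPL setup):

   mpirun -np 8 -mca btl_tcp_if_include en0 ./xhpl    # runs
   mpirun -np 8 -mca btl_tcp_if_include en1 ./xhpl    # runs
   mpirun -np 8 -mca btl_tcp_if_exclude en0 ./xhpl    # doesn't work
   mpirun -np 8 -mca btl_tcp_if_exclude en1 ./xhpl    # doesn't work
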
Regarding TCP performance, I ran the HPL benchmark again and
typically see 85% to 90% of the LAM-MPI speed, provided the problem
size isn't too small.
On Apr 16, 2006, at 12:21 PM, Brian Barrett wrote:
> On Apr 14, 2006, at 9:33 AM, Lee D. Peterson wrote:
>> This problem went away yesterday. There was no intervening reboot of
>> my cluster or a recompile of the code. So all I can surmise is
>> something got cleaned up in a cron script. Weird.
> Very strange. Could there have been a networking issue (switch
> restart or something)?
>> Anyways, now I've benchmarked the HPL using OpenMPI vs LAM-MPI. The
>> OpenMPI runs about 13% to sometimes 50% slower than the LAM-MPI. I'm
>> running over TCP and using SSH.
> Our TCP performance on Open MPI is not as good as it is in LAM/MPI,
> so it's not totally surprising. 50% is, however, much more than we
> expected. There are some pathologically bad cases that can occur
> with multi-NIC (especially our unoptimized multi-NIC support). It
> would be interesting to see what the performance would be if you only
> use one NIC. You can specify the NIC to use with the
> btl_tcp_if_include MCA parameter:
> mpirun -np X -mca btl_tcp_if_include en0 <app>
> Hope this helps,
> Brian Barrett
> Open MPI developer