Our cluster has servers with either a single-port or a dual-port Myrinet card. In the case of a dual-port card, only one port is connected to the Myrinet switch. The OpenMPI library is configured with the "--with-mx=..." option, and it works fine when I submit jobs to single-port servers only. However, when I try to include a server with a dual-port card, I get a bunch of errors like the following:
[compute-08:17788] mx_connect fail for unknown 60dd464f9d nic_id with key aaaaffff (error Destination NIC not found in network table)
60dd464f9d is the wrong MAC address: it belongs to port 1, which is not connected. Port 0, which is connected to the switch, has MAC 60dd464f9c.
This is how I (try to) run the job:
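To see at a glance which NIC IDs the failing connections are aimed at, the error output can be filtered with a small script. This is only a minimal sketch: the regular expression is modelled on the single error line quoted above (other MX messages may look different), and `CONNECTED_NICS` and `unreachable_nics` are hypothetical names, with the set filled in by hand from the MACs of the ports that are actually cabled.

```python
import re

# Pattern modelled on the one error line quoted above; an assumption,
# not a documented MX/OpenMPI log format.
ERROR_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\] mx_connect fail for \S+ "
    r"(?P<nic_id>[0-9a-fA-F]+) nic_id"
)

# Hypothetical, hand-maintained set of MACs of ports cabled to the switch.
CONNECTED_NICS = {"60dd464f9c"}


def unreachable_nics(log_text, connected=CONNECTED_NICS):
    """Return (reporting_host, nic_id) pairs whose nic_id is not a cabled port.

    The host is the one that printed the error, which is not necessarily
    the host that owns the NIC.
    """
    found = set()
    for match in ERROR_RE.finditer(log_text):
        nic = match.group("nic_id").lower()
        if nic not in connected:
            found.add((match.group("host"), nic))
    return found


sample = (
    "[compute-08:17788] mx_connect fail for unknown 60dd464f9d nic_id "
    "with key aaaaffff (error Destination NIC not found in network table)"
)
print(unreachable_nics(sample))
```

Run against the captured stderr of the job, this lists every target NIC ID that does not correspond to a connected port, which confirms whether all failures point at port 1 of the dual-port cards.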
1. mpiexec -np 32 -host compute-08,compute-17,compute-18,compute-16 -mca mtl mx --mca pml cm ./wrf.exe
2. Using a similar command but via Sun Grid Engine.
The OS is CentOS 6.4, 64-bit. OpenMPI 1.6.5 was compiled from the official source RPM with gcc 4.4.7, and the MX library 1.2.16 was compiled manually. Again, this configuration works fine when only single-port servers are used.
Is there a way to tell OpenMPI to stick to the one port that is connected? I haven't found any relevant options through ompi_info or via Google. Any help will be greatly appreciated.