Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [EXTERNAL] OpenMPI with dual port Myrinet cards
From: George Bosilca (bosilca_at_[hidden])
Date: 2014-01-13 20:23:14


If the MX library supports a mapper, Open MPI takes advantage of it to provide selective hardware activation. Look at the MCA parameters supported by the MX devices to get more info (ompi_info --mca btl mx). The one of interest in this particular case is mx_if_include, which lets MX-based jobs use only the Myrinet card whose mapper matches the provided key. You should set this MCA parameter to the last 6 digits of your mapper MAC (--mca btl_mx_if_include abcdef).
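
As a rough illustration of the step above, here is a minimal Python sketch that extracts the last 6 hex digits from a mapper MAC and assembles the matching mpirun arguments. The MAC value, the helper names (mapper_key, mpirun_args), and the fully qualified parameter name btl_mx_if_include are assumptions for illustration; check the parameter list reported by ompi_info on your own installation before relying on it.

```python
import re


def mapper_key(mac: str) -> str:
    """Return the last 6 hex digits of a MAC address, case preserved.

    Accepts ':' or '-' separated MACs as well as bare 12-digit strings.
    """
    digits = mac.replace(":", "").replace("-", "").strip()
    if not re.fullmatch(r"[0-9a-fA-F]{12}", digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits[-6:]


def mpirun_args(mac: str, nprocs: int, program: str) -> list[str]:
    """Build an mpirun argument list that passes the mapper key.

    The parameter name btl_mx_if_include is an assumption based on the
    usual framework_component_name convention for MCA parameters.
    """
    return [
        "mpirun", "-np", str(nprocs),
        "--mca", "btl_mx_if_include", mapper_key(mac),
        program,
    ]


if __name__ == "__main__":
    # Hypothetical mapper MAC; substitute the one reported on your fabric.
    print(" ".join(mpirun_args("00:60:dd:ab:cd:ef", 32, "./wrf.exe")))
```

The argument list can be handed to subprocess.run unchanged, which avoids quoting problems when the job is launched from a script.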


On Jan 14, 2014, at 00:36, Barrett, Brian W <bwbarre_at_[hidden]> wrote:

> Victor -
> I don't think our multi-port support with MX is particularly well tested (I know I don't test that path).
> It looks like you might be able to work around the problem by setting -mca mtl_mx_endpoint_num 1 on the mpirun command line, which will only use the first port found. But I could be wrong.
> Brian
> On 1/9/14 5:02 PM, "Victor Prosolin" <Victor.Prosolin_at_[hidden]> wrote:
>> Hi,
>> Our cluster has servers with either a single-port or a dual-port Myrinet card. On the dual-port cards, only one port is connected to the Myrinet switch. The OpenMPI library is configured with the “--with-mx=…” option, and it works fine when I submit jobs to single-port servers only. However, when I try to include a server with a dual-port card, I get a bunch of errors like the following:
>> [compute-08:17788] mx_connect fail for unknown 60dd464f9d nic_id with key aaaaffff (error Destination NIC not found in network table)
>> 60dd464f9d is the wrong MAC address corresponding to port 1 (not connected) when port 0 is connected to the switch and has MAC 60dd464f9c.
>> This is how I (try to) run the job:
>> 1. mpiexec -np 32 -host compute-08,compute-17,compute-18,compute-16 -mca mtl mx --mca pml cm ./wrf.exe
>> or
>> 2. Using a similar command but via Sun Grid Engine.
>> The OS is CentOS 6.4, 64-bit. OpenMPI 1.6.5 was compiled from the official source RPM with gcc 4.4.7, and the MX library 1.2.16 was compiled manually. Again, this configuration works fine when only single-port servers are used.
>> Is there a way to tell OpenMPI to stick to the one port that is connected? I haven’t found any options through ompi_info or via google… Any help will be greatly appreciated.
>> Sincerely,
>> Victor.
> --
> Brian W. Barrett
> Scalable System Software Group
> Sandia National Laboratories