Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mixed myrinet/non-myrinet nodes
From: 8mj6tc902_at_[hidden]
Date: 2008-01-16 02:20:50

> Subject: Re: [OMPI users] mixed myrinet/non-myrinet nodes
> From: M D Jones (jonesm_at_[hidden])
> Date: 2008-01-15 14:07:19
> Hmm, that is the way that I expected it to work as well -
> we see the warnings also, but closely followed by the
> errors (I've been trying both 1.2.5 and a recent 1.3
> snapshot with the same behavior). You don't have the
> mx driver loaded on the nodes that do not have a myrinet
> card, do you?

Well, the driver isn't "loaded" (i.e., the kernel module isn't loaded),
but the mx library is available. If that library isn't
available, OpenMPI will probably fail when it tries to call the mx
functions (even if only to find there's no myrinet card available).

> Our mx is a touch behind yours (1.2.3),
> but I agree that it appears to be something in the process
> startup that is at fault, so it doesn't seem likely that
> the mx version is to blame (perhaps just the fact that it
> is not installed on those nodes?).
> Matt
> On Wed, 16 Jan 2008, 8mj6tc902_at_[hidden] wrote:
>> We also have a mixed myrinet/ip cluster, and maybe I'm missing some
>> nuance of your configuration, but openmpi seems to work fine for me "as
>> is" with no --mca options across mixed nodes (there's a bunch of
>> warnings at the beginning where the non-mx nodes realize they don't have
>> myrinet cards and the mx nodes realize they can't talk mx to the non-mx
>> nodes, but everything completes fine, so I assumed OpenMPI was working
>> out the transport details on its own (and was quite pleased
>> about that)).
>> I just did a quick test to confirm that it is in fact still using mx in
>> that situation, and it is. I'm running OpenMPI 1.2.4 and MX 1.2.3.
>> It sounds to me based on those "PML add procs failed" messages that
>> OpenMPI is dying on start up on the non-mx nodes unless you explicitly
>> disable mx at runtime (perhaps because they're expecting the mx library
>> to be there, but it's not?)
>> |openmpi-users/Allow| wrote:
>>> Date: Tue, 15 Jan 2008 10:25:00 -0500 (EST)
>>> From: M D Jones <jonesm_at_[hidden]>
>>> Subject: Re: [OMPI users] mixed myrinet/non-myrinet nodes
>>> To: Open MPI Users <users_at_[hidden]>
>>> Message-ID: <Pine.LNX.4.64.0801151018430.18528_at_[hidden]>
>>> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>>> Hmm, that combination seems to hang on me - but
>>> '--mca pml ob1 --mca btl ^mx' does indeed do the trick.
>>> Many thanks!
>>> Matt
>>> On Tue, 15 Jan 2008, George Bosilca wrote:
>>>> This case actually works. We ran into it a few days ago, when we discovered
>>>> that one of the compute nodes in a cluster didn't get its Myrinet card
>>>> installed properly ... The performance was horrible but the application ran
>>>> to completion.
>>>> You will have to use the following flags: --mca pml ob1 --mca btl mx,tcp,self
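
For reference, the two flag combinations mentioned in this thread can be written out as full command lines, roughly as sketched below. The program name, host file, and process count are placeholders, not anything from the thread; only the --mca flags come from the messages above. A leading "^" in the btl list excludes the named component rather than selecting it.

```shell
#!/bin/sh
# Placeholders (assumptions, not from the thread): adjust to your own setup.
APP=./my_mpi_app
HOSTFILE=./hosts
NP=4

# Flags George suggested: ob1 PML with the mx, tcp and self BTLs enabled.
MIXED_FLAGS="--mca pml ob1 --mca btl mx,tcp,self"

# Flags Matt reported as working for him: ob1 PML, every BTL except mx.
NO_MX_FLAGS="--mca pml ob1 --mca btl ^mx"

# Print the command lines only; remove "echo" to actually launch the job.
# The flag variables are left unquoted on purpose so they split into words.
echo mpirun -np "$NP" --hostfile "$HOSTFILE" $MIXED_FLAGS "$APP"
echo mpirun -np "$NP" --hostfile "$HOSTFILE" $NO_MX_FLAGS "$APP"
```

Which of the two behaves better seems to depend on the cluster: the first hung for Matt, while the second gives up mx entirely, including between nodes that do have Myrinet cards.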

[A dream that comes true can't really be called a dream.]