We also have a mixed Myrinet/IP cluster, and maybe I'm missing some
nuance of your configuration, but Open MPI seems to work fine for me "as
is" with no --mca options across mixed nodes. There are a bunch of
warnings at the beginning where the non-MX nodes realize they don't have
Myrinet cards and the MX nodes realize they can't talk MX to the non-MX
nodes, but everything completes fine, so I assumed Open MPI was working
out the transport details on its own (and was quite pleased).
I just did a quick test to confirm that it is in fact still using MX in
that situation, and it is. I'm running Open MPI 1.2.4 and MX 1.2.3.
It sounds to me, based on those "PML add procs failed" messages, that
Open MPI is dying on startup on the non-MX nodes unless you explicitly
disable MX at runtime (perhaps because it expects the MX library to be
there, but it isn't?).
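For reference, a quick sketch of the launch variants being discussed in this thread. The host names and the application name are made-up placeholders; the --mca flags are the ones quoted below.

```shell
# Default launch: Open MPI negotiates transports per peer pair, so on a
# mixed cluster MX-capable pairs use MX and everything else falls back
# to TCP (after some startup warnings on the non-MX nodes).
mpirun -np 8 -host mxnode1,mxnode2,ipnode1,ipnode2 ./my_app

# Workaround reported to hang in some setups avoided: force the ob1 PML
# and exclude the MX BTL entirely, so non-MX nodes never touch the MX
# library at all.
mpirun --mca pml ob1 --mca btl ^mx -np 8 ./my_app

# George's suggestion: ob1 PML with an explicit BTL list. MX is used
# where both peers have it; TCP carries the rest; "self" handles
# loopback within a process.
mpirun --mca pml ob1 --mca btl mx,tcp,self -np 8 ./my_app
```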
M D Jones wrote:
> Date: Tue, 15 Jan 2008 10:25:00 -0500 (EST)
> From: M D Jones <jonesm_at_[hidden]>
> Subject: Re: [OMPI users] mixed myrinet/non-myrinet nodes
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <Pine.LNX.4.64.0801151018430.18528_at_[hidden]>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> Hmm, that combination seems to hang on me - but
> '--mca pml ob1 --mca btl ^mx' does indeed do the trick.
> Many thanks!
> On Tue, 15 Jan 2008, George Bosilca wrote:
>> This case actually works. We ran into it a few days ago, when we discovered
>> that one of the compute nodes in a cluster didn't get its Myrinet card
>> installed properly ... The performance was horrible, but the application ran
>> to completion.
>> You will have to use the following flags: --mca pml ob1 --mca btl mx,tcp,self
[A dream that comes true can't really be called a dream.]