Hmm, that combination seems to hang for me - but
'--mca pml ob1 --mca btl ^mx' does indeed do the trick.
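
For the record, the full mpirun invocation that works for me looks roughly
like the following (the hostfile and executable names here are just
placeholders for whatever the batch system hands out):

  mpirun --mca pml ob1 --mca btl ^mx \
         -np 16 -hostfile hosts.txt ./my_app

or, with George's explicit transport list:

  mpirun --mca pml ob1 --mca btl mx,tcp,self \
         -np 16 -hostfile hosts.txt ./my_app
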
On Tue, 15 Jan 2008, George Bosilca wrote:
> This case actually works. We ran into it a few days ago, when we discovered
> that one of the compute nodes in a cluster didn't get its Myrinet card
> installed properly ... The performance was horrible, but the application ran
> to completion.
> You will have to use the following flags: --mca pml ob1 --mca btl mx,tcp,self
> On Jan 15, 2008, at 8:49 AM, M Jones wrote:
>> We have a mixed environment in which roughly 2/3 of the nodes
>> in our cluster have myrinet (mx 1.2.1), while the full cluster has
>> gigE. Running open-mpi exclusively on myrinet nodes or exclusively
>> on non-myrinet nodes is fine, but mixing the two node types
>> results in a runtime error (PML add procs failed), no matter what --mca
>> flags I try to use to push the traffic onto tcp (note that
>> --mca mtl ^mx --mca btl ^mx does appear to use tcp, as long as all
>> of the nodes have myrinet cards, but not in the mixed case).
>> I thought that we would be able to use a single open-mpi build to
>> support both networks (and users would be able to request mx nodes if
>> they need them using the batch queuing system, which they are
>> already accustomed to). Am I missing something (or just doing
>> something dumb)? Compiling MPI implementations for each compiler suite
>> is bad enough; add in separate builds for each network and it just gets
>> worse ...