Using --mca btl ^mx completely prevents any use of the mx interface, so
every process uses tcp (even on mx-capable nodes). If you want a mixed
configuration, you have to force the ob1 PML but leave the mx BTL
available where it is suitable (it will be disabled at runtime if it
can't run). Your problem is not solved yet.
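
To make the difference concrete, here is a rough sketch of the two command lines as I understand them. The process count, hostfile name and application path are hypothetical placeholders, not values from this thread. The mixed variant simply forces ob1 and leaves BTL selection to the runtime, which is an assumption about what should work, not something verified on your cluster. The script only prints the commands so you can inspect them before running anything.

```shell
#!/bin/sh
# Hypothetical placeholders (not from the thread): adjust to your site.
NP=4
HOSTFILE=hosts.txt
APP=./my_app

# TCP everywhere: force the ob1 PML and exclude the mx BTL.
# This is the combination reported to work in this thread.
TCP_ONLY="mpirun -np $NP --hostfile $HOSTFILE --mca pml ob1 --mca btl ^mx $APP"

# Mixed (assumption): force ob1 only and do not exclude any BTL, so the
# mx BTL can still be picked on nodes where it initializes properly.
MIXED="mpirun -np $NP --hostfile $HOSTFILE --mca pml ob1 $APP"

# Print the command lines instead of running them.
echo "$TCP_ONLY"
echo "$MIXED"
```

The first form gives up Myrinet entirely in exchange for a job that runs on any mix of nodes. The second is the one to test if you want mx used wherever it is present.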
On 15 Jan 2008, at 10:25, M D Jones wrote:
> Hmm, that combination seems to hang on me - but
> '--mca pml ob1 --mca btl ^mx' does indeed do the trick.
> Many thanks!
> On Tue, 15 Jan 2008, George Bosilca wrote:
>> This case actually works. We ran into it a few days ago, when we found
>> that one of the compute nodes in a cluster didn't have its Myrinet
>> installed properly ... The performance was horrible, but the
>> application ran to completion.
>> You will have to use the following flags: --mca pml ob1 --mca btl
>> On Jan 15, 2008, at 8:49 AM, M Jones wrote:
>>> We have a mixed environment in which roughly 2/3 of the nodes
>>> in our cluster have myrinet (mx 1.2.1), while the full cluster has
>>> gigE. Running open-mpi exclusively on myrinet nodes or exclusively
>>> on non-myrinet nodes is fine, but mixing the two node types
>>> results in a runtime error (PML add procs failed), no matter what
>>> flags I try to use to push the traffic onto tcp (note that
>>> --mca mtl ^mx --mca btl ^mx does appear to use tcp, as long as all
>>> of the nodes have myrinet cards, but not in the mixed case).
>>> I thought that we would be able to use a single open-mpi build to
>>> support both networks (and users would be able to request mx nodes
>>> if they need them using the batch queuing system, which they are
>>> already accustomed to). Am I missing something (or just doing
>>> something dumb)? Compiling mpi implementations for each compiler
>>> is bad enough, add in separate builds for networks and it just gets
>>> worse ...
>>> users mailing list