Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Need help running jobs across different IB vendors
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-10-15 16:16:54


Short version:
--------------

What you really want is:

    mpirun --mca pml ob1 ...

The "--mca mtl ^psm" way will get the same result, but forcing pml=ob1 is really a slightly Better solution (from a semantic perspective)

More detail:
------------

Similarly, there are actually three different PMLs (PML = point-to-point messaging layer -- the layer that implements MPI point-to-point semantics and drives an underlying transport layer). Here's a section from the README:

- There are three MPI network models available: "ob1", "csum", and
  "cm". "ob1" and "csum" use BTL ("Byte Transfer Layer") components
  for each supported network. "cm" uses MTL ("Matching Transport
  Layer") components for each supported network.

  - "ob1" supports a variety of networks that can be used in
    combination with each other (per OS constraints; e.g., there are
    reports that the GM and OpenFabrics kernel drivers do not operate
    well together):

    - OpenFabrics: InfiniBand, iWARP, and RoCE
    - Loopback (send-to-self)
    - Myrinet MX and Open-MX
    - Portals
    - Quadrics Elan
    - Shared memory
    - TCP
    - SCTP
    - uDAPL
    - Windows Verbs

  - "csum" is exactly the same as "ob1", except that it performs
    additional data integrity checks to ensure that the received data
    is intact (vs. trusting the underlying network to deliver the data
    correctly). csum supports all the same networks as ob1, but there
    is a performance penalty for the additional integrity checks.

  - "cm" supports a smaller number of networks (and they cannot be
    used together), but may provide better overall MPI
    performance:

    - Myrinet MX and Open-MX
    - InfiniPath PSM
    - Mellanox MXM
    - Portals

    Open MPI will, by default, choose to use "cm" when the InfiniPath
    PSM or the Mellanox MXM MTL can be used. Otherwise, "ob1" will be
    used and the corresponding BTLs will be selected. "csum" will never
    be selected by default. Users can force the use of ob1 or cm if
    desired by setting the "pml" MCA parameter at run-time:

      shell$ mpirun --mca pml ob1 ...
      or
      shell$ mpirun --mca pml csum ...
      or
      shell$ mpirun --mca pml cm ...

This means: if you force ob1 (or csum), then only BTLs will be used. If you force cm, then only MTLs will be used. If you don't specify which PML to use, OMPI will prefer cm/MTLs (if it finds any available MTLs) over ob1/BTLs.
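
(A quick way to see which PML/MTL/BTL components your build actually contains is to grep the ompi_info output -- the patterns below assume the usual "MCA pml:" style of ompi_info listing:)

    shell$ ompi_info | grep "MCA pml"
    shell$ ompi_info | grep "MCA mtl"
    shell$ ompi_info | grep "MCA btl"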

On Oct 15, 2013, at 12:38 PM, Kevin M. Hildebrand <kevin_at_[hidden]> wrote:

> Ahhh, that's the piece I was missing. I've been trying to debug everything I could think of related to 'btl', and was completely unaware that 'mtl' was also a transport.
>
> If I run a job using --mca mtl ^psm, it does indeed run properly across all of my nodes. (Whether or not that's the 'right' thing to do is yet to be determined.)
>
> Thanks for your help!
>
> Kevin
>
>
> -----Original Message-----
> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Dave Love
> Sent: Tuesday, October 15, 2013 10:16 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Need help running jobs across different IB vendors
>
> "Kevin M. Hildebrand" <kevin_at_[hidden]> writes:
>
>> Hi, I'm trying to run an OpenMPI 1.6.5 job across a set of nodes, some
>> with Mellanox cards and some with Qlogic cards.
>
> Maybe you shouldn't... (I'm blessed in one cluster with three somewhat
> incompatible types of QLogic card and a set of Mellanox ones, but
> they're in separate islands, apart from the two different SDR ones.)
>
>> I'm getting errors indicating "At least one pair of MPI processes are unable to reach each other for MPI communications". As far as I can tell all of the nodes are properly configured and able to reach each other, via IP and non-IP connections.
>> I've also discovered that even if I turn off the IB transport via "--mca btl tcp,self" I'm still getting the same issue.
>> The test works fine if I run it confined to hosts with identical IB cards.
>> I'd appreciate some assistance in figuring out what I'm doing wrong.
>
> I assume the QLogic cards are using PSM. You'd need to force them to
> use openib with something like --mca mtl ^psm and make sure they have
> the ipathverbs library available. You probably won't like the resulting
> performance -- users here noticed when one set fell back to openib from
> psm recently.
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/