Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] About the Open-MPI point-to-point messaging layers
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-07-02 17:21:25


On Jun 30, 2012, at 9:46 PM, Sébastien Boisvert wrote:

> I really like Open-MPI and its Modular Component Architecture.
> The --mca parameters are so useful for learning and testing things!

Good!

> So here are my questions.
>
> I know that the default point-to-point messaging layer is ob1
> (the Obi-Wan Kenobi PML). I know that there is also the PML
> cm (the Connor MacLeod PML).
>
> From what I understand, the force is strong with Obi-Wan Kenobi, so he
> can use various byte transfer layers (BTLs).
> And there can be only one highlander (probably Connor MacLeod) so
> when I use the MTL psm, I cannot use any of the BTLs because Connor
> MacLeod can only be alone at the end.

Exactly.
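
To make the two pairings concrete, here is a minimal sketch that builds the corresponding mpirun command lines: ob1 with a list of BTLs, or cm with a single MTL. The helper name mpirun_command, the application path ./my_app, and the process count are hypothetical placeholders; which BTL and MTL components are actually available depends on how your Open MPI was built.

```python
# Sketch: build mpirun command lines for the two PML setups discussed above.
# "./my_app" and the process count are hypothetical placeholders.

def mpirun_command(pml, transports, app="./my_app", nprocs=4):
    """Return an argv list selecting a PML and its transport components."""
    # ob1 drives BTLs; cm drives exactly one MTL.
    framework = {"ob1": "btl", "cm": "mtl"}[pml]
    if pml == "cm" and len(transports) != 1:
        raise ValueError("the cm PML uses a single MTL")
    return [
        "mpirun", "-np", str(nprocs),
        "--mca", "pml", pml,
        "--mca", framework, ",".join(transports),
        app,
    ]


if __name__ == "__main__":
    # ob1 can combine several BTLs (loopback, shared memory, TCP).
    print(" ".join(mpirun_command("ob1", ["self", "sm", "tcp"])))
    # cm is paired with one MTL only; no BTLs are listed.
    print(" ".join(mpirun_command("cm", ["psm"])))
```

The printed lines can be pasted into a shell as-is once the application name is replaced with a real MPI program.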

> But what about the PML csum?
>
> What exactly is the PML csum (checksum) doing?

csum is a clone of ob1 that adds checksumming as a data check -- it is helpful in environments where you're not entirely sure whether your underlying "reliable" transport may actually be silently corrupting data under the covers.
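
The general idea of an end-to-end data check can be sketched as follows. This is only an illustration, not csum's actual code: the choice of CRC-32, the 4-byte header layout, and the function names pack and unpack are all assumptions made for the example.

```python
import zlib


def pack(payload: bytes) -> bytes:
    """Sender side: prepend a 4-byte CRC-32 of the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unpack(frame: bytes) -> bytes:
    """Receiver side: verify the CRC-32, then return the payload."""
    expected = int.from_bytes(frame[:4], "big")
    payload = frame[4:]
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: payload was corrupted in transit")
    return payload


if __name__ == "__main__":
    frame = pack(b"hello, rank 1")
    print(unpack(frame))  # an intact frame round-trips

    # Flip one bit, as a faulty link might do without reporting an error.
    corrupted = frame[:-1] + bytes([frame[-1] ^ 0x01])
    try:
        unpack(corrupted)
    except ValueError as err:
        print(err)
```

The cost is an extra pass over every message on both sides, which is why a check like this is normally something you opt into rather than a default.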

That being said, I'm not sure how much csum (hahah! Apple Mail keeps autocorrecting that to "scum" :-) ) has kept up with all the recent ob1 advances. So it may actually be lagging a bit. As I understand it, csum will likely not be included in v1.7.

> Which code does the PML csum use to actually transfer data between
> MPI ranks? BTLs, MTLs, something else, or nothing?

BTLs.

> I have searched the web a little but have not found much about it.

It was created by a vendor for a very specific purpose on a very specific network. It hasn't seen much use since then.

> If I use the MTL psm, can the PML csum be used to detect message
> corruption? I guess the answer is no because csum is not Connor MacLeod.
>
> I have read that when the MTL psm is used, all the Open-MPI BTL objects are
> disabled.

Correct.

> What code would the PML dr use to move bytes around if it
> were stable and production-ready?

dr was never finished. It was meant to be a fault-tolerant version of ob1. Sadly, like csum, it didn't keep up with all the changes in ob1 over the years.

> And my final question:
>
> When a company designs a new interconnect, why choose the MTL architecture
> (and thus the PML cm) instead of the BTL architecture (with the ob1 PML)?

BTLs are relatively easy to write. They work for any old byte-pushing network.

MTLs require a bit more MPI co-design with the network. MTLs are for networks that can either natively perform MPI-style message matching on the network or emulate it well enough (e.g., PSM does it all in software, as does MXM).
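
For readers unfamiliar with the term, "MPI-style message matching" means pairing each incoming message with a posted receive by source rank and tag (within one communicator), honoring wildcards and posting order. Below is a deliberately simplified sketch of that logic. It is not Open MPI, PSM, or MXM code; the class MatchQueue and the wildcard values are stand-ins chosen for the example (real MPI_ANY_SOURCE and MPI_ANY_TAG values are implementation-defined).

```python
ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE
ANY_TAG = -1     # stand-in for MPI_ANY_TAG


class MatchQueue:
    """Matching state for a single communicator (simplified)."""

    def __init__(self):
        self.posted = []      # receives waiting for a message: (source, tag, label)
        self.unexpected = []  # messages that arrived before a receive: (source, tag, data)

    @staticmethod
    def _matches(want_src, want_tag, src, tag):
        return want_src in (ANY_SOURCE, src) and want_tag in (ANY_TAG, tag)

    def post_recv(self, source, tag, label):
        """Post a receive; complete it at once if a message is already waiting."""
        for i, (src, t, data) in enumerate(self.unexpected):
            if self._matches(source, tag, src, t):
                del self.unexpected[i]
                return (label, data)
        self.posted.append((source, tag, label))
        return None

    def arrive(self, source, tag, data):
        """Deliver an incoming message to the oldest matching posted receive."""
        for i, (want_src, want_tag, label) in enumerate(self.posted):
            if self._matches(want_src, want_tag, source, tag):
                del self.posted[i]
                return (label, data)
        self.unexpected.append((source, tag, data))
        return None


if __name__ == "__main__":
    q = MatchQueue()
    q.post_recv(ANY_SOURCE, 7, "recv-A")
    q.post_recv(2, ANY_TAG, "recv-B")
    print(q.arrive(2, 7, b"x"))  # both receives match; the older one wins
    print(q.arrive(2, 9, b"y"))
```

With the ob1 PML this kind of bookkeeping lives in Open MPI itself and the BTL just moves bytes; with cm, the MTL hands it to the network or to the vendor library that emulates it.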

> It seems to me that ob1 and BTLs are mature and that BTLs self and sm are quite
> useful and bug-free as far as I know. New code should only need to handle the case
> where the two MPI processes are on different nodes, right?

Correct.

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/