
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Amateur Guidance
From: Timothy Hayes (hayesti_at_[hidden])
Date: 2008-11-07 11:41:33


Hi everyone,

Thank you all for your replies. I've now read those additional papers and
gone through the slides of the Open MPI workshop. I'm still a bit hazy on
the architecture of Open MPI (especially the parts relevant to my project), so
I've written down what I think I understand about process-to-process
communication. I have a few specific questions, but perhaps you could also point
me in the right direction where I'm way off, or expand on areas where I'm a
little vague.

http://macneill.cs.tcd.ie/~hayesti/ompi.jpg

N.B. The XEN component in the BTL layer is the one I'm trying to build.

When mpirun is invoked, the following takes place:

1. An out-of-band TCP channel is established between each process and
every other process. This lives in ORTE (Open Run-Time Environment)
-> MCA (Modular Component Architecture) -> OOB (Out of Band) -> TCP,
i.e. orte/mca/oob/tcp in the source tree.

2. A PML (Point-to-Point Messaging Layer) is created, defaulting to
'ob1', which can handle multiple communication interfaces simultaneously.
This lives in OMPI (Open MPI) -> MCA (Modular Component Architecture)
-> PML (Point-to-Point Messaging Layer) -> ob1, i.e. ompi/mca/pml/ob1.

3. 'ob1' attempts to set up one or more BTL (Byte Transfer Layer)
components. These components establish a point of contact with another
process for data transfer. Examples include self (loopback) for sending to
the process itself, sm (shared memory) for processes on the same machine,
and TCP/IP for processes on separate machines. There are also specialist
components, e.g. for InfiniBand, that are used when the hardware and
infrastructure are available.

4. Each component is self-contained and is responsible for discovering
whether the resources it needs are available. Each component returns zero,
one or many module instances depending on circumstances.

5. The out-of-band TCP channel is then used to advertise each process's
instantiated modules to every other process.

Questions about the above:

Is the OOB channel kept open for the whole lifetime of the mpirun job?

I've read in places that the functions modex_send() and modex_recv() are used
to communicate on the OOB channel, but I see mca_oob_tcp_send and
mca_oob_tcp_recv declared in the header file. Is modex something else?

What exactly is queried and returned when a BTL component creates modules?
For example, if I run 4 MPI processes on the same machine, will the sm
component return a single sm module used to communicate with all 3 other
processes, or 3 sm modules, one for each peer?

Once again, those 5 points are really sparse, and they're sparse because I
don't know the details myself. If anyone could shed some light on the process,
I would be really grateful.

Kind regards

Tim Hayes

2008/11/3 Jeff Squyres <jsquyres_at_[hidden]>

> On Nov 3, 2008, at 10:39 AM, Eugene Loh wrote:
>
>> Main answer: no great docs to look at. I think I've asked some OMPI
>> experts and that was basically the answer they gave me.
>>
>
> This is unfortunately the current state of the art -- no one has had time
> to write up good docs.
>
> Galen pointed to the new papers -- our main PML these days is "ob1" (teg
> died a long time ago).
>
> PML = Point to point messaging layer; it's basically the layer that is
> right behind MPI_SEND and friends.
>
> The ob1 PML uses BTL modules underneath. BTL = Byte transfer layer;
> individual modules that send bytes back and forth over individual transports
> (e.g., shared memory, TCP, openfabrics, etc.). There's a BTL for each of
> the major transports that we support. The protocols that ob1 uses are
> described nicely in the papers that Galen sent, but the specific function
> interfaces are only best described in ompi/mca/btl/btl.h.
>
> Alternatively, we have a "cm" PML which uses MTL modules underneath. MTL =
> Matching transport layer; it's basically for transports that expose very
> MPI-like interfaces (e.g., elan, tports, PSM, portals, MX). This cm
> component is extremely thin; it basically provides a shim between Open MPI
> and the underlying transport.
>
> The big difference between cm and ob1 is that ob1 is a progress engine that
> tracks multiple transport interfaces (e.g., shared memory, tcp, openfabrics,
> ...etc. -- and therefore potentially multiple BTL module instances) and cm
> is a thin shim that simply translates between OMPI and the back-end
> interface -- cm will only use *ONE* MTL module instance. Specifically: it
> is expected that the one MTL module will do all the progression, striping,
> ...or whatever it wants to do to move bytes from A to B by itself (very
> little/no help at all from OMPI's infrastructure).
>
> Does that help some?
>
> --
> Jeff Squyres
> Cisco Systems
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>