Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-11-02 11:53:35

Adding Craig Rasmussen from LANL into the CC list...

On Oct 31, 2006, at 10:26 AM, Michael Kluskens wrote:

> OpenMPI tickets 39 & 55 deal with problems with the Fortran 90
> large interface with regards to:
> ticket/39>
> #55: MPI_GATHER with arrays of different dimensions <https://
> Attached is a patch to deal with these two issues as applied
> against OpenMPI-1.3a1r12364.

Thanks for the patch! Before committing this, though, I think more
needs to be done, and I want to understand it before doing so (part of
this is me thinking it out while I write this e-mail...). Also, be
aware that SC is 1.5 weeks away, so I may not be able to address
this issue before then (SC tends to be all-consuming).

1. The "same type" heuristic for the "large" F90 module was not
intended to cover all possible scenarios. You're absolutely right
that assuming the same type makes no sense for some of the
interfaces. The problem is that the obvious alternative (all
possible scenarios) creates an exponential number of interfaces (in
the millions). So "large" was an attempt to provide *some* of the
interfaces -- but [your] experience has shown that this can do more
harm than good (i.e., make some legal MPI applications uncompilable
because we provide *some* interfaces to MPI_GATHER, but not all).

1a. It gets worse because of MPI's semantics for MPI_GATHER. You
pointed out one scenario -- it doesn't make sense to supply "integer"
for both the sendbuf and recvbuf because the root will need an
integer array to receive all the values (similar logic applies to
MPI_SCATTER and other collectives -- so what you did for MPI_GATHER
would need to be applied to several others as well).

1b. But even worse than that is the fact that, for MPI_GATHER, the
receive buffer is not relevant on non-root processes. So it's valid
for *any* type to be passed for non-root processes (leading to the
exponential interface explosion described above).

So having *some* interfaces for MPI_GATHER can be a problem for both
1a and 1b -- perfectly valid/legal MPI apps will fail to compile.

I'm not sure what the right balance is here -- how do we allow for
both 1a and 1b without creating millions of interfaces? Your patch
created MPI_GATHER interfaces for all the same types, but allowing
any dimension mix. With the default max dimension level of 4 in
OMPI's interfaces, this created 90 new interfaces for MPI_GATHER,
calculated (and verified with some grep/wc'ing):

For src buffer of dimension:     0   1   2   3   4
Create this many recvbuf types:  4 + 4 + 3 + 2 + 1 = 14

For each src/recvbuf combination, create this many interfaces:

(char + logical + (integer * 4) + (real * 2) + (complex * 2)) = 10

Where 4, 2, and 2 are the number of integer, real, and complex types
supported by the compiler on my machines (e.g., gfortran on OSX/intel
and Linux/EM64T).

So this created 14 * 10 = 140 interfaces, as opposed to the 50 that
were there before the patch (5 dimensions of src/recvbuf * 10 types =
50), resulting in 90 new interfaces.
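
The counts above can be re-derived with a short script. This is only a
sketch of the arithmetic, not part of OMPI's interface-generation
scripts: the names are made up for illustration, and the rule for which
recvbuf dimensions are paired with each src dimension (recvbuf is an
array of at least the src dimension) is an assumption inferred from the
4 + 4 + 3 + 2 + 1 pattern.

```python
# Number of kinds per intrinsic type, as quoted in the e-mail for gfortran.
KINDS = {"character": 1, "logical": 1, "integer": 4, "real": 2, "complex": 2}


def rank_pairs(max_dim):
    """(src dimension, recvbuf dimension) pairs.

    Assumed rule: recvbuf is always an array (dimension >= 1) and has at
    least as many dimensions as the src buffer.
    """
    return [(s, r)
            for s in range(max_dim + 1)
            for r in range(max(s, 1), max_dim + 1)]


def gather_interface_counts(max_dim=4):
    types = sum(KINDS.values())                  # 1 + 1 + 4 + 2 + 2 = 10
    original = (max_dim + 1) * types             # same dimension on both sides
    patched = len(rank_pairs(max_dim)) * types   # any allowed dimension mix
    return original, patched, patched - original


print(gather_interface_counts())  # (50, 140, 90)
```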

This effort will need to be duplicated by several other collectives:

- allgather, allgatherv
- alltoall, alltoallv, alltoallw
- gather, gatherv
- scatter, scatterv

So an increase of 9 * 90 = 810 new interfaces. Not too bad,
considering the alternative (exponential). But consider that the
"large" interface only has (by my count via egrep/wc) 4013
interfaces. This would be increasing its size by about 20%. This is
certainly not a show-stopper, but something to consider.

Note that if you go higher than OMPI's default 4 dimensions, the
number of new interfaces gets considerably larger (e.g., for 7
dimensions you get 35 send/recv type combinations instead of 14, so
35 * 10 * 9 = 3150 total interfaces just for the collectives, if
I did my math right).
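
For anyone who wants to check the scaling, here is a rough sketch of the
same arithmetic for an arbitrary maximum dimension. The function names
are hypothetical, the pairing rule is an assumption inferred from the
14 and 35 figures, and it assumes each of the nine collectives starts
from the same "same dimension on both sides" baseline as MPI_GATHER.

```python
COLLECTIVES = ["allgather", "allgatherv", "alltoall", "alltoallv",
               "alltoallw", "gather", "gatherv", "scatter", "scatterv"]
TYPES = 10                      # char + logical + 4 integer + 2 real + 2 complex
LARGE_MODULE_INTERFACES = 4013  # count quoted in the e-mail


def rank_combinations(max_dim):
    # Assumed rule: a src buffer of dimension s pairs with every recvbuf
    # dimension from max(s, 1) up to max_dim.
    return sum(max_dim - max(s, 1) + 1 for s in range(max_dim + 1))


def new_interfaces(max_dim):
    # Interfaces added beyond the same-dimension baseline, over all collectives.
    added_per_collective = (rank_combinations(max_dim) - (max_dim + 1)) * TYPES
    return added_per_collective * len(COLLECTIVES)


def total_interfaces(max_dim):
    return rank_combinations(max_dim) * TYPES * len(COLLECTIVES)


print(new_interfaces(4))                                         # 810
print(round(100 * new_interfaces(4) / LARGE_MODULE_INTERFACES))  # 20
print(rank_combinations(7), total_interfaces(7))                 # 35 3150
```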

2. You also identified another scenario that needs to be fixed --
support for MPI_IN_PLACE in certain collectives (MPI_REDUCE is not
the only collective that supports it). It doesn't seem to be a Good
Idea to allow the INTEGER type to be mixed with any other type for
send/recvbuf combinations, just to allow MPI_IN_PLACE. This
potentially adds in send/recvbuf signatures that we want to disallow
(even though they are potentially valid MPI applications!) -- e.g.,
INTEGER and REAL. What if a user accidentally supplied an INTEGER
for the sendbuf that wasn't MPI_IN_PLACE? That's what the type
system is supposed to be preventing.

I don't know enough about the type system of F90, but it strikes me
that we should be able to create a unique type for MPI_IN_PLACE
(don't know why I didn't think of this before for some of the MPI
sentinel values... :-\ ) and therefore have a safe mechanism for this
sentinel value.

This would add 10 interfaces for every function that supports
MPI_IN_PLACE; a pretty small increase.
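
As a loose analogy only (this is Python, not F90, and not how OMPI
implements anything), the sketch below shows why a dedicated sentinel
type is safer than reusing INTEGER. Every name in it is invented for
illustration; int, float, and complex merely stand in for INTEGER, REAL,
and COMPLEX buffers.

```python
class InPlaceType:
    """Hypothetical dedicated type whose only instance is the sentinel."""


IN_PLACE = InPlaceType()
BUFFER_TYPES = (int, float, complex)  # stand-ins for INTEGER, REAL, COMPLEX

# Accepted (sendbuf, recvbuf) signatures: the same-type pairs, plus exactly
# one extra signature per buffer type for the sentinel.
SAME_TYPE = {(t, t) for t in BUFFER_TYPES}
WITH_SENTINEL = {(InPlaceType, t) for t in BUFFER_TYPES}
SIGNATURES = SAME_TYPE | WITH_SENTINEL


def accepts(sendbuf, recvbuf):
    """True if the argument types match one of the declared signatures."""
    return (type(sendbuf), type(recvbuf)) in SIGNATURES


print(accepts(IN_PLACE, 1.0))  # True: sentinel paired with a REAL-like buffer
print(accepts(7, 1.0))         # False: a stray integer is still rejected
print(len(WITH_SENTINEL))      # 3: one added signature per buffer type
```

With the sentinel as its own type, the number of added signatures grows
with the number of buffer types rather than with the number of type
pairs, and no INTEGER/REAL mix has to be admitted.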

This same technique should probably be applied to some of the other
sentinel values, such as MPI_ARGVS_NULL and MPI_STATUSES_IGNORE.


All that being said, what does it mean?

I think #2 is easily enough fixed (it just requires the time to do so),
and has minimal impact on the number of interfaces. Implementing MPI
sentinel values with unique types also makes user apps that much
safer (i.e., they won't accidentally pass in an incorrect type that
would be mistaken -- by the interface -- for a valid signature).

#1 is still a problem. No matter how we slice it, we're going to
leave out valid combinations of send/recv buffers that will prevent
potentially legal MPI applications from compiling. This is as
opposed to not having F90 interfaces for the 2-choice-buffer
functions at all, which would mean that F90 apps using MPI_GATHER
(for example) would simply fall back to the F77 interfaces where no
type checking is done. End result: all MPI F90 apps can compile.

Simply put, with the trivial, small, and medium module sizes, all
valid MPI F90 applications can compile and run. With the large size,
unless we do the exponential interface explosion, we will be
potentially excluding some legal MPI F90 applications -- they *will
not be able to compile* (without workarounds). This is what I meant
by ticket 55's title "F90 "large" interface may not entirely make

So there are multiple options here:

1. Keep chasing a "good" definition of "large" such that most/all
current MPI F90 apps can compile. The problem is that this target
can change over time, and will keep requiring maintenance.

2. Stop pursuing "large" because of the problems mentioned above.
This has the potential problem of not providing type safety to F90
MPI apps for the MPI collective interfaces, but at least all apps can
compile, and there's only a small number of 2-choice-buffer functions
that do not get the type safety from F90 (i.e., several MPI
collective functions).

3. Start implementing the proposed F03 MPI interfaces that don't have
the same problems as the F90 MPI interfaces.

I have to admit that I'm leaning more towards #2 (and I wish that
someone who has the time would do #3!) and discarding #1...


Jeff Squyres
Server Virtualization Business Unit
Cisco Systems