
From: Michael Kluskens (mklus_at_[hidden])
Date: 2006-11-02 15:18:28


On Nov 2, 2006, at 11:53 AM, Jeff Squyres wrote:

> Adding Craig Rasmussen from LANL into the CC list...
>
> On Oct 31, 2006, at 10:26 AM, Michael Kluskens wrote:
>
>> OpenMPI tickets 39 & 55 deal with problems with the Fortran 90
>> large interface with regards to:
>>
>> #39: MPI_IN_PLACE in MPI_REDUCE <https://svn.open-mpi.org/trac/ompi/ticket/39>
>> #55: MPI_GATHER with arrays of different dimensions <https://svn.open-mpi.org/trac/ompi/ticket/55>
>>
>> Attached is a patch to deal with these two issues as applied
>> against OpenMPI-1.3a1r12364.
>
> Thanks for the patch! Before committing this, though, I think more
> needs to be done and I want to understand it before doing so (part of
> this is me thinking it out while I write this e-mail...). Also, be
> aware that SC is 1.5 weeks away, so I may not be able to get to
> address this issue before then (SC tends to be all-consuming).

Understood, just didn't wish to see this die or get worse.

> 1. The "same type" heuristic for the "large" F90 module was not
> intended to cover all possible scenarios. You're absolutely right
> that assuming the same dimension (sp) makes no sense for some of the
> interfaces. The problem is that the obvious alternative (all
> possible scenarios) creates an exponential number of interfaces (in
> the millions).

I think it can be limited by including reasonable scenarios. As it
stands it's not very useful, but at least it can be patched by the
end-builder.
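
For anyone following along who has not read the generated module:
every type/rank combination of the two choice buffers needs its own
specific procedure behind the generic name, which is why the count
multiplies so quickly. A rough sketch of the pattern (the module and
specific names here are mine, not what the OpenMPI scripts actually
generate):

   module mpi_gather_sketch
     implicit none
     interface MPI_Gather
       ! one specific per (type, send rank, recv rank) combination the
       ! compiler should accept; shown here for a rank-1 integer send
       ! buffer gathered into a rank-2 integer receive buffer
       subroutine MPI_Gather_int_1d_2d(sendbuf, sendcount, sendtype, &
                                       recvbuf, recvcount, recvtype, &
                                       root, comm, ierr)
         integer, intent(in)    :: sendbuf(:)
         integer, intent(in)    :: sendcount, sendtype
         integer, intent(inout) :: recvbuf(:,:)
         integer, intent(in)    :: recvcount, recvtype, root, comm
         integer, intent(out)   :: ierr
       end subroutine MPI_Gather_int_1d_2d
     end interface MPI_Gather
   end module mpi_gather_sketch

Multiply that by every intrinsic type and every allowed rank pair and
the totals discussed below fall out.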

> So "large" was an attempt to provide *some* of the
> interfaces -- but [your] experience has shown that this can do more
> harm than good (i.e., make some legal MPI applications uncompilable
> because we provide *some* interfaces to MPI_GATHER, but not all).

This is a serious issue in my opinion. I suspect that virtually
every use of MPI_GATHER and the others would fail with the large
interfaces as they stand, thereby ensuring that no one can use the
large interfaces on a multiuser system.

> 1a. It gets worse because of MPI's semantics for MPI_GATHER. You
> pointed out one scenario -- it doesn't make sense to supply "integer"
> for both the sendbuf and recvbuf because the root will need an
> integer array to receive all the values (similar logic applies to
> MPI_SCATTER and other collectives -- so what you did for MPI_GATHER
> would need to be applied to several others as well).

Agreed. I limited my patch to what I could test with working code
and could justify time-wise.
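
To make 1a concrete, this is the sort of perfectly legal program I
mean, and it will not compile against the unpatched large interface
because the send buffer is a scalar while the receive buffer is rank
1 (a minimal example written for illustration, not taken from my
actual code):

   program gather_ranks
     use mpi
     implicit none
     integer :: myval, ierr, rank, nprocs
     integer, allocatable :: allvals(:)
     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
     allocate(allvals(nprocs))
     myval = rank + 1
     ! scalar send buffer, rank-1 receive buffer: valid MPI, but no
     ! matching specific exists when both buffers must have the same rank
     call MPI_Gather(myval, 1, MPI_INTEGER, allvals, 1, MPI_INTEGER, &
                     0, MPI_COMM_WORLD, ierr)
     if (rank == 0) print *, 'gathered:', allvals
     call MPI_Finalize(ierr)
   end program gather_ranks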

> 1b. But even worse than that is the fact that, for MPI_GATHER, the
> receive buffer is not relevant on non-root processes. So it's valid
> for *any* type to be passed for non-root processes (leading to the
> exponential interface explosion described above).

I would consider this to be very bad programming practice and not a
good idea to support in the large interface regardless of the cost.

One issue is that derived datatypes will never (?) work with the
large interfaces; for that matter, I would guess that derived
datatypes probably don't work with the medium and possibly the small
interfaces either. I don't know if there is a way around that issue
at all in F90/F95; some sites may have to do two installations. I
don't think giving up on all interfaces that conflict with derived
types is a good solution.
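
To illustrate the derived-type problem: the generated specifics only
enumerate intrinsic types, so a buffer of a user-defined type can
never match any of them, no matter how many combinations are added.
The type and program below are just an example I made up:

   program gather_points
     use mpi
     implicit none
     type point3d
       double precision :: x, y, z
     end type point3d
     type(point3d) :: mypt
     type(point3d), allocatable :: allpts(:)
     integer :: pt_type, ierr, rank, nprocs
     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
     allocate(allpts(nprocs))
     mypt = point3d(1.0d0*rank, 2.0d0*rank, 3.0d0*rank)
     ! describe point3d as three contiguous doubles (no padding here)
     call MPI_Type_contiguous(3, MPI_DOUBLE_PRECISION, pt_type, ierr)
     call MPI_Type_commit(pt_type, ierr)
     ! valid MPI, but type(point3d) matches none of the intrinsic-type
     ! specifics, so this cannot compile against the large interface
     call MPI_Gather(mypt, 1, pt_type, allpts, 1, pt_type, &
                     0, MPI_COMM_WORLD, ierr)
     call MPI_Type_free(pt_type, ierr)
     call MPI_Finalize(ierr)
   end program gather_points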

> So having *some* interfaces for MPI_GATHER can be a problem for both
> 1a and 1b -- perfectly valid/legal MPI apps will fail to compile.
>
> I'm not sure what the right balance is here -- how do we allow for
> both 1a and 1b without creating millions of interfaces? Your patch
> created MPI_GATHER interfaces for all the same types, but allowing
> any dimension mix. With the default max dimension level of 4 in
> OMPI's interfaces, this created 90 new interfaces for MPI_GATHER,
> calculated (and verified with some grep/wc'ing):
>
> For src buffer of dimension: 0 1 2 3 4
> Create this many recvbuf types: 4 + 4 + 3 + 2 + 1 = 14

An alternative would be to allow the send buffer to have either the
same dimension as the receive buffer or one less (called dim+1
below), and to make all dimensions optional in some way. I know that
having these extra interfaces allowed me to find serious oversights
on my part by letting me compile against the large interfaces.
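
A rough count for that middle ground, using the same 0 through 4
dimensions and the same 10 types, so please check my arithmetic:

   For src buffer of dimension:          0   1   2   3   4
   Allowed recvbuf dims (same or +1):    2 + 2 + 2 + 2 + 1 = 9

That is 9 * 10 = 90 interfaces per routine instead of the 140 my
patch generates or the 50 there now, so roughly 9 * (90 - 50) = 360
new interfaces across the nine collectives in your list instead of
810.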

> For each src/recvbuf combination, create this many interfaces:
>
> (char + logical + (integer * 4) + (real * 2) + (complex * 2)) = 10
>
> Where 4, 2, and 2 are the number of integer, real, and complex types
> supported by the compiler on my machines (e.g., gfortran on OSX/intel
> and Linux/EM64T).
>
> So this created 14 * 10 = 140 interfaces, as opposed to the 50 that
> were there before the patch (5 dimensions of src/recvbuf * 10 types =
> 50), resulting in 90 new interfaces.
>
> This effort will need to be duplicated by several other collectives:
>
> - allgather, allgatherv
> - alltoall, alltoallv, alltoallw
> - gather, gatherv
> - scatter, scatterv
>
> So an increase of 9 * 90 = 810 new interfaces. Not too bad,
> considering the alternative (exponential). But consider that the
> "large" interface only has (by my count via egrep/wc) 4013
> interfaces. This would be increasing its size by about 20%. This is
> certainly not a show-stopper, but something to consider.

Without some increase (all combinations or dim+1), I suspect the
large interfaces will be useless for anyone (or any site) using one
of these 10 routines anywhere in their program.

> Note that if you go higher than OMPI's default 4 dimensions, the
> number of new interfaces gets considerably larger (e.g., for 7
> dimensions you get 35 send/recv type combinations instead of 14, so
> (35 * 10 * 9) = 3150 total interfaces (just for the collectives), if
> I did my math right.
>
> 2. You also identified another scenario that needs to be fixed --
> support for MPI_IN_PLACE in certain collectives (MPI_REDUCE is not
> the only collective that supports it). It doesn't seem to be a Good
> Idea to allow the INTEGER type to be mixed with any other type for
> send/recvbuf combinations, just to allow MPI_IN_PLACE. This
> potentially adds in send/recvbuf signatures that we want to disallow
> (even though they are potentially valid MPI applications!) -- e.g.,
> INTEGER and FLOAT. What if a user accidentally supplied an INTEGER
> for the sendbuf that wasn't MPI_IN_PLACE? That's what the type
> system is supposed to be preventing.
>
> I don't know enough about the type system of F90, but it strikes me
> that we should be able to create a unique type for MPI_IN_PLACE
> (don't know why I didn't think of this before for some of the MPI
> sentinel values... :-\ ) and therefore have a safe mechanism for this
> sentinel value.

This would be a very good approach, allowing the large interfaces to
be used with MPI_IN_PLACE while preventing this alternative error.
That's a bit more complicated than I'm ready to patch myself.
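
If it helps, here is roughly what I picture, in F90 terms since the
compilers we have do not do F03 yet. The module, type, and procedure
names are mine and not anything in the OpenMPI tree; this is only a
sketch of the type-checking side, shown for MPI_REDUCE with a rank-1
integer receive buffer:

   module mpi_sentinel_types
     implicit none
     ! a type whose only job is to be the type of the sentinel
     type :: mpi_in_place_kind
       integer :: dummy
     end type mpi_in_place_kind
     type(mpi_in_place_kind), save :: MPI_IN_PLACE
   end module mpi_sentinel_types

   module mpi_reduce_in_place_sketch
     implicit none
     interface MPI_Reduce
       ! one extra specific per recvbuf type/rank that accepts only the
       ! sentinel type in the sendbuf position
       subroutine MPI_Reduce_in_place_int_1d(sendbuf, recvbuf, count, &
                                             datatype, op, root, comm, ierr)
         use mpi_sentinel_types, only: mpi_in_place_kind
         type(mpi_in_place_kind), intent(in) :: sendbuf
         integer, intent(inout) :: recvbuf(:)
         integer, intent(in)    :: count, datatype, op, root, comm
         integer, intent(out)   :: ierr
       end subroutine MPI_Reduce_in_place_int_1d
     end interface MPI_Reduce
   end module mpi_reduce_in_place_sketch

The compiler would then accept MPI_IN_PLACE for sendbuf while an
ordinary INTEGER variable in that position still fails to match,
which is exactly the protection you describe. The library side would
still have to translate the sentinel object into whatever the C layer
expects, which is the part I am not ready to patch.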

> This would add 10 interfaces for every function that supports
> MPI_IN_PLACE; a pretty small increase.
>
> This same technique should probably be applied to some of the other
> sentinel values, such as MPI_ARGVS_NULL and MPI_STATUSES_IGNORE.

I agree on that as well, but I don't have enough experience using
these to understand all their issues.

> ---------------
>
> All that being said, what does it mean?
>
> I think #2 is easily enough fixed (just require the time to do so),
> and has minimal impact on the number of interfaces. Implementing MPI
> sentinel values with unique types also makes user apps that much more
> safe (i.e., they won't accidentally pass in an incorrect type that
> would be mistaken -- by the interface -- for a valid signature).

Or pass the sentinel values into places they should not be passed.

> #1 is still a problem. No matter how we slice it, we're going to
> leave out valid combinations of send/recv buffers that will prevent
> potentially legal MPI applications from compiling. This is as
> opposed to not having F90 interfaces for the 2-choice-buffer
> functions at all, which would mean that F90 apps using MPI_GATHER
> (for example) would simply fall back to the F77 interfaces where no
> type checking is done. End result: all MPI F90 apps can compile.
>
> Simply put, with the trivial, small, and medium module sizes, all
> valid MPI F90 applications can compile and run.

Well, maybe not, as I pointed out above with derived types; but
again, that is not a reason to ditch the interfaces completely, since
they do more good than harm.

> With the large size,
> unless we do the exponential interface explosion, we will be
> potentially excluding some legal MPI F90 applications -- they *will
> not be able to compile* (without workarounds). This is what I meant
> by ticket 55's title "F90 "large" interface may not entirely make
> sense".
>
> So there are multiple options here:
>
> 1. Keep chasing a "good" definition of "large" such that most/all
> current MPI F90 apps can compile. The problem is that this target
> can change over time, and keep requiring maintenance.
>
> 2. Stop pursuing "large" because of the problems mentioned above.
> This has the potential problem of not providing type safety to F90
> MPI apps for the MPI collective interfaces, but at least all apps can
> compile, and there's only a small number of 2-choice-buffer functions
> that do not get the type safety from F90 (i.e., several MPI
> collective functions).
>
> 3. Start implementing the proposed F03 MPI interfaces that don't have
> the same problems as the F90 MPI interfaces.
>
> I have to admit that I'm leaning more towards #2 (and I wish that
> someone who has the time would do #3!) and discarding #1...

I dislike #2 intensely, because then I and others could not even
patch the interface scripts before building OpenMPI.

#1 is preferred; just give the users/builders clear notice that the
interfaces may not cover everything, and perhaps a hint as to which
directory holds the files to patch in order to extend the large
interface a bit further.

#3 would be nice, but I don't see enough F03 support in enough
compilers at this time. I don't even have a book on the F03 changes,
and I program in Fortran most of the day virtually every weekday. It
took our group until about 2000 to start using Fortran 90, and almost
everything we do is in Fortran.

Michael