
Open MPI Development Mailing List Archives


This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.


Subject: Re: [OMPI devel] failure with zero-length Reduce() and both sbuf=rbuf=NULL
From: Lisandro Dalcin (dalcinl_at_[hidden])
Date: 2010-02-10 11:41:29

On 10 February 2010 11:48, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Feb 10, 2010, at 8:40 AM, Lisandro Dalcín wrote:
>> > Note that, from a standards perspective, MPI_REDUCE *does* require at least one element -- MPI-2.2 p163:34-35:
>> >
>> >   "Each process can provide one element, or a sequence of elements..."
>> Are you really convinced that such sentence means that zero elements
>> is illegal?
> As Bill Gropp would say, "there is no legal and illegal -- there is only what is defined by the spec."  :-)
> The text defines that MPI_REDUCE is supposed to be called with one or more elements.  It does not define what happens when zero elements are used.  It is therefore undefined what happens.  And therefore not portable.  Some MPI's may allow it; some may not.  MPI programmer beware.
>> I have the feeling that this corner case was not taken
>> into account at the time that wording was written (which dates back to
>> MPI 1.1 standard).
>> Is there a rationale for requiring at least one element? Is this worth
>> a change/clarification in the MPI standard?
> The Forum has been historically resistant to syntactic sugar.  Arguably, you could have a correct program by adding an if statement:
>    if (count > 0) MPI_Reduce(...)
> More specifically: MPI's core functionality revolves around message passing, not providing no-ops.  I feel quite comfortable stating that if you want a no-op, do it in the application (e.g., via an "if" statement).  Put simply: if you don't want a reduction, don't call MPI_REDUCE.
>> > So I think that George's assertion is correct: your test code is incorrect.
>> Well, you have to grant me that a zero-length reduction seems
>> something plausible to test. I still think OMPI is following too
>> strictly the wording "Each process can provide one element". Again,
>> this sentence comes from MPI-1.1 .
> Er... much of the wording in MPI-2.2 comes from MPI-1.0.  :-)  This one sentence is no different than thousands of others.
>> Please, do not take me wrong. If there is an actual issue with
>> zero-length reductions, I want to know about it. Otherwise, I would
>> like to ask you to revisit OMPI behavior. I'm still thinking that
>> there is no good reason for zero-length reductions to be invalid
>> operations; they should just be no-op calls.
> You still have to pass a bunch of other stuff to make MPI_REDUCE not cause an MPI exception (such as a valid datatype, etc.).  Why is count>0 any different?
>> > But that's not what is causing your example to fail.  Here's the issue in OMPI's MPI_Reduce:
>> >
>> >        } else if ((ompi_comm_rank(comm) != root && MPI_IN_PLACE == sendbuf) ||
>> >                   (ompi_comm_rank(comm) == root && ((MPI_IN_PLACE == recvbuf) || (sendbuf == recvbuf)))) {
>> >            err = MPI_ERR_ARG;
>> >
>> > The "sendbuf == recvbuf" check is what causes the MPI exception.  I would say that we're not consistent about disallowing that (e.g., such checks are not in MPI_SCAN and the others you cited).
>> Yes, I understand that. But in the case that zero-length reductions
>> were valid, the check should not fall back there...
> Per my above statements, I don't agree with your implication here.  :-)
> And also remember that OMPI *does* allow zero-length reductions, but only because we were bludgeoned into it.  So there is no "fall-back" to the buffer test -- the buffer test is orthogonal to the count test because we allow count==0.
>> But NULL is a very special case. Using (ptr=NULL,len=0) for
>> zero-length arrays is common out there.
> Let's be clear: the problem is not that your buffers are NULL.  It's the fact that sendbuf==recvbuf in the call to MPI_REDUCE, regardless of whether they are NULL or something else.

OK. I mostly agree/accept all your previous comments.
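
To make the point about the buffer test concrete, here is a minimal standalone model of the check quoted above. It is only a sketch: it does not use mpi.h, and MODEL_IN_PLACE, MODEL_SUCCESS, MODEL_ERR_ARG and model_check_reduce_bufs are hypothetical stand-ins, not Open MPI's actual definitions. The count argument is accepted but never consulted, which is why (NULL,NULL,0) fails at the root exactly as any other aliased pair would.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Open MPI definitions. */
static char in_place_marker;
#define MODEL_IN_PLACE ((void *)&in_place_marker)
enum { MODEL_SUCCESS = 0, MODEL_ERR_ARG = 1 };

/*
 * Models only the buffer test quoted in the thread.
 * `count` is deliberately unused: the buffer test is orthogonal to the
 * count test, so a zero count does not bypass it.
 */
int model_check_reduce_bufs(const void *sendbuf, const void *recvbuf,
                            int count, int rank, int root)
{
    (void)count;
    if ((rank != root && sendbuf == MODEL_IN_PLACE) ||
        (rank == root &&
         (recvbuf == MODEL_IN_PLACE || sendbuf == recvbuf))) {
        return MODEL_ERR_ARG;
    }
    return MODEL_SUCCESS;
}
```

In this model, a root call with sendbuf == recvbuf is rejected whether the shared value is NULL or a real address, while the same (NULL,NULL,0) arguments pass on a non-root rank.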

>> In short, I still think that (sendbuf=NULL,recvbuf=NULL,count=0)
>> should work. Not sure about
>> (sendbuf=(void*)1,recvbuf=(void*)1,count=0) , but I can imagine cases
>> where this would be nice to have (e.g. some dynamic language, or
>> library, or even user code that employs a singleton for zero-length
>> arrays)
> We don't test pointers for any particular value other than named constants (e.g., MPI_IN_PLACE) because any pointer value could point to a valid buffer when paired with an appropriate datatype.
> As such, NULL is *not* a special case.  It's a potentially valid buffer, just like any other value.

Are you assuming here that MPI_BOTTOM is exactly the same as NULL,
at least in Open MPI?

How can (ptr=NULL,count>0,MPI_INT), or the same with any other
predefined datatype, be a valid buffer? However, with
(ptr=MPI_BOTTOM,count>0,usr-def-datatype), that's another story...

>> Special casing Open MPI in my testsuite to disable these tests is just
>> a matter of adding two lines,  but before that I would like to have
>> some sort of final pronouncement on all this from your side.
> What is the purpose of testing 0-length reductions?

I'm testing zero-length reductions because MPI implementations can
potentially support them. My Python wrappers should support as many
features of the underlying MPI implementation as possible, so I
should support zero-length reductions if possible.

In Python land (especially when third-party extension modules written
in C are involved), and likely in other places, a zero-length array is
not very well defined... Instances could be singletons (then pointers
could alias, which should not be an issue because the array length is
zero), pointers could be non-NULL and always different (i.e. what
malloc(0) returns on some platforms), or pointers could be NULL
(because that's what malloc(0) returns, or because the implementation
code special-cases things by enforcing ptr=NULL,len=0 for zero-length
array instances).
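
One way a wrapper could sidestep all three representations is the guard suggested earlier in the thread: skip the call when count is zero, so the pointer values are never inspected. The following is a rough sketch under stated assumptions, not what any wrapper actually does. The names reduce_call, reduce_if_nonempty and strict_reduce are hypothetical, and strict_reduce merely stands in for an implementation that rejects aliased buffers. With a real MPI_Reduce the guard is only safe because the call is collective and every rank must pass the same count, so all ranks skip together.

```c
#include <stddef.h>

/* Hypothetical callback type standing in for the real reduction call. */
typedef int (*reduce_call)(const void *sendbuf, void *recvbuf, int count);

/*
 * Skip the reduction entirely when count == 0, so it does not matter
 * whether the zero-length buffers are NULL, aliased singletons, or
 * distinct pointers. Returns 0 when the call is skipped, otherwise
 * whatever the underlying call returns.
 */
int reduce_if_nonempty(reduce_call do_reduce,
                       const void *sendbuf, void *recvbuf, int count)
{
    if (count == 0) {
        return 0;
    }
    return do_reduce(sendbuf, recvbuf, count);
}

/*
 * Stand-in for a strict implementation: reports an error (1) when the
 * two buffers alias, regardless of count.
 */
int strict_reduce(const void *sendbuf, void *recvbuf, int count)
{
    (void)count;
    return sendbuf == recvbuf ? 1 : 0;
}
```

With this guard, a zero-length call never reaches the strict check, while a non-empty call with aliased buffers still reports the error.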

As there are different ways to represent a zero-length array using a
(ptr,len) pair, I tried to make sure by exhaustive testing that all
the possibilities were working... Such testing of corner cases is not
easy in general :-). Some things fail depending on the MPI
implementation; other things work, but likely by accident. You
see, I'm suffering the usual nightmares of platform/implementation
defined behavior :-( ...

Lisandro Dalcin
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594