In my example, each sender task (1 to n-1) will have at most one rendezvous message to task 0 outstanding at a time. The MPI standard suggests that descriptors be small enough, and that there be enough descriptor space, for reasonable programs. The standard is clear that unreasonable programs can run out of space and fail. The standard does not try to quantify reasonableness.
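
To make that bound concrete, here is a toy Python model of the scenario. It is not MPI code, only a sketch. It assumes a hypothetical fixed number of eager credits per sender and a task 0 that is still asleep, so no credits come back. The names `simulate` and `credits_per_sender` are made up for illustration. It shows why blocking sends leave task 0 holding at most n-1 rendezvous envelopes, however many messages each sender wants to send.

```python
def simulate(n_tasks, sends_per_task, credits_per_sender):
    """Toy model of the example while task 0 is still asleep.

    Assumption: each sender holds a fixed number of eager credits and
    task 0 returns none of them, because it has not started receiving.
    """
    eager_buffered = 0        # small messages held in task 0's eager buffers
    rendezvous_envelopes = 0  # rendezvous envelopes queued at task 0

    for _sender in range(1, n_tasks):
        credits = credits_per_sender
        for _ in range(sends_per_task):
            if credits > 0:
                # Eager send: consume one credit, message is buffered at task 0.
                credits -= 1
                eager_buffered += 1
            else:
                # Out of credits: switch to rendezvous. The blocking send
                # cannot complete until task 0 posts a receive, so this
                # sender contributes exactly one envelope and then stalls.
                rendezvous_envelopes += 1
                break

    return eager_buffered, rendezvous_envelopes


print(simulate(n_tasks=16, sends_per_task=5000, credits_per_sender=8))
# prints (120, 15)
```

With 16 tasks and 8 credits each, task 0 ends up holding 120 eager messages and 15 envelopes. Both numbers grow linearly with the task count, which is where the question of reasonableness comes in.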

This gets really interesting when we talk about hundreds of thousands of tasks. If a general-purpose MPI runs 16 tasks and task 0 cannot hold one envelope from each of the other 15, it is probably a poor-quality MPI. If there are a million tasks and task 0 can only hold 100,000 envelopes, then it is fair to argue that holding 100,000 envelopes is generous and the million-task job is not being reasonable. This little example could be reasonable for small task counts and unreasonable for huge task counts.

If there are 2 tasks and the single sender posts 15 MPI_ISENDs to task 0, a quality MPI should probably handle that too. If the sender tries to post a million MPI_ISENDs and either the sender or the receiver runs out of descriptor space after 100,000, it is again fair to fail the job and argue the program is not being reasonable. The line between reasonable and unreasonable application behavior is not a bright, sharp line.
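
The nonblocking case can be sketched the same way. This is again a toy Python model, not MPI code. It assumes one descriptor per outstanding send and a hypothetical fixed `descriptor_capacity`, with nothing freed because the receiver is not yet matching anything. The function name `post_isends` is invented for illustration.

```python
def post_isends(requested, descriptor_capacity):
    """Count how many nonblocking sends fit before descriptor space runs out.

    Assumption: each outstanding send pins one descriptor, and none are
    released while the receiver is not yet receiving.
    Returns (number accepted, whether all requested sends were accepted).
    """
    posted = 0
    for _ in range(requested):
        if posted == descriptor_capacity:
            break  # out of descriptors; a real library might fail the job here
        posted += 1
    return posted, posted == requested


print(post_isends(15, 100_000))         # prints (15, True)
print(post_isends(1_000_000, 100_000))  # prints (100000, False)
```

The same capacity that comfortably absorbs 15 outstanding sends is exhausted a tenth of the way through a million of them.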

A big part of my reason for prodding on this is that I think it is better to have the MPI Forum discuss changes to the standard than to have MPI implementors deciding which parts to ignore. If the MPI Forum does bless a mode that allows my example to crash, IBM MPI will support that mode and some of our users will choose to run in that mode. If their applications are "well structured" in certain specific ways, they will never have a problem with early arrival overflow.

If the standard is unclear then this is the time to make it clear.


Dick Treumann - MPI Team/TCEM
IBM Systems & Technology Group
Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

wrote on 02/04/2008 02:03:20 PM:

> On Mon, Feb 04, 2008 at 09:08:45AM -0500, Richard Treumann wrote:
> > To me, the MPI standard is clear that a program like this:
> >
> > task 0:
> > MPI_Init
> > sleep(3000);
> > start receiving messages
> >
> > each of tasks 1 to n-1:
> > MPI_Init
> > loop 5000 times
> >    MPI_Send(small message to 0)
> > end loop
> >
> > May send some small messages eagerly if there is space at task 0 but must
> > block each task 1 to  n-1 before allowing task 0 to run out of eager buffer
> > space.  Doing this requires a token or credit management system in which
> > each task has credits for known buffer space at task 0. Each task will send
> > eagerly to task 0 until the sender runs out of credits and then must switch
> > to rendezvous protocol.
> And rendezvous messages are not free either. So this approach will only
> postpone failure a little bit.
> --
>          Gleb.