In my example, each sender task 1 to n-1 will have one rendezvous message
to task 0 at a time. The MPI standard suggests descriptors be small enough
and there be enough descriptor space for reasonable programs . The
standard is clear that unreasonable programs can run out of space and fail.
The standard does not try to quantify reasonableness.
This gets really interesting when we talk about hundreds of thousands of
tasks. If on a general purpose MPI there are 16 tasks and task 0 cannot
hold 1 envelop from each of the other 15, it is probably a poor quality
MPI. If there are a million tasks and task 0 can only hold 100,000
envelops then it is fair to argue that holding 100,000 evelopes is generous
and the million task job is not being reasonable. This little example
could be reasonable for small task counts and unreasonable for huge task
If there are 2 tasks and and the single sender posts 15 MPI_ISENDs to task
0, a quality MPI should probably handle that too. If the sender tries to
post a million MPI_ISENDs and either sender or receiver run out of
descriptor space after 100,000 it is again fair to fail the job and argue
the program is not being reasonable. The line between reasonable and
unreasonable application behavior is not a bright, sharp line.
A big part of my reason for prodding this is that I think it is bettter to
have the MPI Forum discuss changes to the standard than to have MPI
implementors deciding what parts to ignore. If the MPI Forum does bless a
mode that allows my example to crash, IBM MPI will support that mode and
some of our users will chose to run in that mode. If their applications
are "well structured" in certain specific ways they will never have a
problem with early arrival oveflow.
If the standard is unclear then this is the time to make it clear.
Dick Treumann - MPI Team/TCEM
IBM Systems & Technology Group
Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
users-bounces_at_[hidden] wrote on 02/04/2008 02:03:20 PM:
> On Mon, Feb 04, 2008 at 09:08:45AM -0500, Richard Treumann wrote:
> > To me, the MPI standard is clear that a program like this:
> > task 0:
> > MPI_Init
> > sleep(3000);
> > start receiving messages
> > each of tasks 1 to n-1:
> > MPI_Init
> > loop 5000 times
> > MPI_Send(small message to 0)
> > end loop
> > May send some small messages eagerly if there is space at task 0 but
> > block each task 1 to n-1 before allowing task 0 to run out of eager
> > space. Doing this requires a token or credit management system in
> > each task has credits for known buffer space at task 0. Each task will
> > eagerly to task 0 until the sender runs out of credits and then must
> > to rendezvous protocol.
> And rendezvous messages are not free either. So this approach will only
> postpone failure a little bit.
> users mailing list