To keep this out of the weeds, I have attached a program called "bug3"
that illustrates this problem on Open MPI 1.2.5 using the openib BTL. In
bug3, the process with rank 0 uses all available memory buffering
"unexpected" messages from its neighbors.
Bug3 is a test case derived from a real, scalable application (Desmond,
for molecular dynamics) that several experienced MPI developers have
worked on. Note that the MPI_Send calls in the processes with rank N > 0
are *blocking*; Open MPI silently completes them eagerly in the
background and overwhelms process 0 due to lack of flow control.
It may not be hard to change Desmond to work around Open MPI's
small-message semantics, but a programmer should reasonably be able to
assume that a blocking send will block if the receiver cannot yet handle
the message.
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of Brightwell, Ronald
Sent: Monday, February 04, 2008 3:30 PM
To: Patrick Geoffray
Cc: Open MPI Users
Subject: Re: [OMPI users] openmpi credits for eager messages
> > I'm looking at a network where the number of endpoints is large
> > enough that everybody can't have a credit to start with, and the
> > "offender" isn't a single process, but rather a combination of
> > processes doing N-to-1 where N is sufficiently large. I can't just
> > tell one process to slow down. I have to tell them all to slow down
> > and do it quickly...
> When you have N->1 patterns, the hardware flow-control will throttle
> the senders, or packets will be dropped if there is no hardware
> flow-control. If you don't have HOL blocking but the receiver does not
> consume for any reason (busy, sleeping, dead, whatever), then you can
> still drop packets on the receiver side (NIC, driver, thread) as a
> last resort; this is what TCP does. The key is to have exponential
> backoff (or a reasonably large resend timeout) so as not to continue
> the hammering.
> It costs nothing in the common case (unlike the credits approach), but
> it does handle corner cases without affecting other nodes too much
> (unlike hardware flow-control).
Right. For a sufficiently large number of endpoints, flow control has to
be pushed out of MPI and down into the network, which is why I don't
want an MPI that does flow control at the user level.
> But you know all that. You are just being mean to your users because
> you can :-) The sick part is that I think I envy you...
You know it :)
- application/octet-stream attachment: bug3.c