Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi credits for eager messages
From: Brightwell, Ronald (rbbrigh_at_[hidden])
Date: 2008-02-04 14:39:43

> > Not to muddy the point, but if there's enough ambiguity in the Standard
> > for people to ignore the progress rule, then I think (hope) there's enough
> > ambiguity for people to ignore the sender throttling issue too ;)
> I understand your position, and I used to agree until I was forced to
> change my mind by naive users :-)

Right. That's what I meant by:

  "Most of the vendors aren't allowed to have this perspective....".

> Poorly written MPI codes won't likely segfault or deadlock because the
> progress rule was ignored. However, users will proudly tell you that you
> have a memory leak if you don't limit the size of the unexpected queue
> and their codes with no flow control blow up.

Yep. I don't lose money when I tell these people to go fix their code. I like
to think that I actually get paid to tell these people to go fix their code....

> You don't have to make it very efficient (per-sender credits
> definitively does not scale), but you need to have a way to stall/slow
> the sender when the unexpected queue gets too big. That's quite easy to
> do without affecting the common case.

Not on my network. I don't have the nice situation the Standard refers to,
where one producer is overwhelming the consumer. For a reasonable number
of endpoints and a known offending sender, it's pretty straightforward to
do user-level credit-based flow control.
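
A minimal sketch of that easy case, written as a plain Python simulation
rather than real MPI code. Everything named here (Receiver, Sender,
INITIAL_CREDITS, run) is hypothetical, and it assumes the receiver hands one
credit back to the sender for every message it consumes:

```python
from collections import deque

INITIAL_CREDITS = 4  # hypothetical per-sender allowance of eager messages


class Receiver:
    """Buffers unexpected messages and hands a credit back as each is consumed."""

    def __init__(self):
        self.unexpected = deque()
        self.high_water = 0  # deepest the unexpected queue has ever been

    def deliver(self, sender, payload):
        self.unexpected.append((sender, payload))
        self.high_water = max(self.high_water, len(self.unexpected))

    def consume(self):
        """Match the oldest buffered message and return a credit to its sender."""
        if not self.unexpected:
            return None
        sender, payload = self.unexpected.popleft()
        sender.credits += 1
        return payload


class Sender:
    def __init__(self, receiver, credits=INITIAL_CREDITS):
        self.receiver = receiver
        self.credits = credits

    def try_send(self, payload):
        """Send eagerly if a credit is available; otherwise report a stall."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.receiver.deliver(self, payload)
        return True


def run(num_senders=3, messages_each=10):
    """Senders push as fast as they can; the receiver drains one message per round."""
    rx = Receiver()
    senders = [Sender(rx) for _ in range(num_senders)]
    pending = [messages_each] * num_senders
    consumed = 0
    while consumed < num_senders * messages_each:
        for i, s in enumerate(senders):
            # Each sender keeps sending until it runs out of messages or credits.
            while pending[i] and s.try_send((i, pending[i])):
                pending[i] -= 1
        if rx.consume() is not None:
            consumed += 1
    return rx.high_water, consumed
```

In this sketch the unexpected queue can never grow past
num_senders * INITIAL_CREDITS entries, however slowly the receiver drains
it, because a sender with no credits simply stalls.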

I'm looking at a network where the number of endpoints is large enough that
everybody can't have a credit to start with, and the "offender" isn't any
single process, but rather a combination of processes doing N-to-1 where N
is sufficiently large. I can't just tell one process to slow down. I have
to tell them all to slow down and do it quickly...