Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2008-05-06 10:57:05

On Tue, 6 May 2008, Jeff Squyres wrote:

> On May 5, 2008, at 6:27 PM, Steve Wise wrote:
>>> There is a larger question regarding why the remote node is still
>>> polling the hca and not shutting down, but my immediate question is
>>> if it is an acceptable fix to simply disregard this "error" if it
>>> is an iWARP adapter.
> If proc B is still polling the hca, it is likely because it simply has
> not yet stopped doing it. I.e., a big problem in MPI implementations
> is that not all actions are exactly synchronous. MPI disconnects are
> *effectively* synchronous, but we probably didn't *guarantee*
> synchronicity in this case because we didn't need it (perhaps until
> now).

Not to mention, the BTL has to be able to handle a shutdown from one
proc while still running its progression engine, as that's a normal
sequence of events when dynamic processes are involved. Because of that,
not much care was taken to ensure that everyone stopped polling before
anyone called del_procs.
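
To illustrate the point, here is a minimal sketch of a progress loop that keeps polling while one peer is tearing down and treats flushed completions from that peer as expected rather than fatal. Every name in it (wc_status_t, completion_t, peer_state_t, progress, the shutting_down flag) is hypothetical and made up for this example; it is not the openib BTL code or the verbs API, and how a proc would actually learn that its peer has started shutting down is assumed rather than shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion statuses, stand-ins for whatever the
 * underlying completion queue reports. */
typedef enum { WC_SUCCESS, WC_FLUSH_ERR, WC_FATAL_ERR } wc_status_t;

typedef struct {
    int peer;            /* index of the peer this completion belongs to */
    wc_status_t status;  /* outcome reported by the completion queue */
} completion_t;

typedef struct {
    bool shutting_down;  /* assumed flag: peer has begun teardown */
} peer_state_t;

/* Process one batch of completions while the progress engine is still
 * running. Counts successful completions in *delivered and returns the
 * number of completions that should be reported as real errors. */
int progress(const completion_t *wc, size_t n,
             const peer_state_t *peers, int *delivered)
{
    int errors = 0;

    for (size_t i = 0; i < n; i++) {
        const peer_state_t *p = &peers[wc[i].peer];

        switch (wc[i].status) {
        case WC_SUCCESS:
            (*delivered)++;
            break;
        case WC_FLUSH_ERR:
            /* A flush from a peer that is going away is expected;
             * a flush from a live peer is still an error. */
            if (!p->shutting_down) {
                errors++;
            }
            break;
        default:
            errors++;
            break;
        }
    }
    return errors;
}
```

The design choice in this sketch is to key the decision on per-peer state rather than on the adapter type, so a flush from a peer that is not known to be shutting down still surfaces as an error.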