On Tue, 6 May 2008, Jeff Squyres wrote:
> On May 5, 2008, at 6:27 PM, Steve Wise wrote:
>
>>> There is a larger question regarding why the remote node is still
>>> polling the hca and not shutting down, but my immediate question is
>>> if it is an acceptable fix to simply disregard this "error" if it
>>> is an iWARP adapter.
>
> If proc B is still polling the hca, it is likely because it simply has
> not yet stopped doing it. I.e., a big problem in MPI implementations
> is that not all actions are exactly synchronous. MPI disconnects are
> *effectively* synchronous, but we probably didn't *guarantee*
> synchronicity in this case because we didn't need it (perhaps until
> now).

Not to mention that the BTL has to be able to handle a shutdown from one
proc while it is still running its progression engine, since that is a
normal sequence of events when dynamic processes are involved. Because of
that, not much care was taken to ensure that everyone stopped polling
before everyone did del_procs.

Brian
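
To make the point above concrete, here is a rough sketch of a progress loop
that tolerates a peer going away while polling continues. It is not the
Open MPI BTL code: every name in it (engine, engine_del_peer,
engine_progress, EV_FLUSH_ERR) is hypothetical, and the "completion queue"
is just an array of events. It assumes that a late error completion from a
peer that has already been removed can be safely dropped, which is the
question raised earlier in the thread rather than a settled answer.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_PEERS = 4 };

/* Hypothetical completion status, loosely modeled on a flushed request. */
typedef enum { EV_OK, EV_FLUSH_ERR } ev_status;

/* One simulated completion: which peer it belongs to and how it ended. */
typedef struct {
    int peer;
    ev_status status;
} event;

typedef struct {
    bool active[MAX_PEERS]; /* false once the peer has been removed */
    int delivered;          /* successful completions handled */
    int ignored;            /* late events for removed or unknown peers */
    int fatal;              /* errors on peers we still expect to work */
} engine;

static void engine_init(engine *e)
{
    for (int i = 0; i < MAX_PEERS; i++) {
        e->active[i] = true;
    }
    e->delivered = 0;
    e->ignored = 0;
    e->fatal = 0;
}

/* Rough analogue of del_procs for a single peer (assumption, not the API). */
static void engine_del_peer(engine *e, int peer)
{
    if (peer >= 0 && peer < MAX_PEERS) {
        e->active[peer] = false;
    }
}

/* Drain a batch of events. Polling may continue after a peer was removed,
 * so events for inactive peers are counted and dropped, not treated as
 * fatal errors. */
static void engine_progress(engine *e, const event *evs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int p = evs[i].peer;
        if (p < 0 || p >= MAX_PEERS || !e->active[p]) {
            e->ignored++;
            continue;
        }
        if (evs[i].status == EV_OK) {
            e->delivered++;
        } else {
            e->fatal++;
        }
    }
}
```

In this sketch, calling engine_del_peer() for a peer and then feeding
engine_progress() a flush error for that same peer only bumps the ignored
counter, while the same error for a peer that is still active is counted
as fatal.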