MTT Devel Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-10-17 10:42:51

On Oct 17, 2007, at 9:31 AM, Ethan Mallove wrote:

>> Either or both of those would be fine (don't we have a timeout in
>> already?).
> There is a timeout in DoCommand, but currently I keep
> re-invoking DoCommand on each "interrupted system call", so
> the timeout gets reset each time. This would not be the case
> if the do-while were to go in DoCommand.

Ah -- I see what you're saying. Good point -- I agree.
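To illustrate the point about where the retry loop lives, here is a minimal Python sketch. The real DoCommand is part of MTT's Perl code, so `do_command` and `EINTR_MARKER` are hypothetical names, not MTT's API. The sketch assumes the retry is triggered by the text "interrupted system call" appearing in the command's output. The deadline is computed once, before the loop, so every retry draws on the same timeout budget instead of getting a fresh one.

```python
import subprocess
import time

# Hypothetical marker text; matched case-insensitively against the output.
EINTR_MARKER = "interrupted system call"


def do_command(argv, timeout):
    """Run argv, re-running it while its output reports an interrupted
    system call. One deadline covers all attempts, so retries cannot
    reset the timeout."""
    deadline = time.monotonic() + timeout  # computed once, outside the loop
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"{argv!r} did not finish within {timeout}s")
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=remaining
            )
        except subprocess.TimeoutExpired:
            raise TimeoutError(
                f"{argv!r} did not finish within {timeout}s"
            ) from None
        output = proc.stdout + proc.stderr
        if EINTR_MARKER not in output.lower():
            return proc.returncode, output
        # Interrupted: loop again with only the time that is left.
```

If the loop sat outside this function and called it once per attempt, each call would compute a new deadline. That is the reset behavior described in the quoted message.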

> Then again, an
> infinite loop is certain in the case of a command that is
> *expected* to output "interrupted system call".

But only if that command *always* outputs "interrupted system call".
So yes, I'm a bit paranoid about an unlikely corner case. But we
might as well handle it on the off chance that it happens (and output
a noisy error message so that you can tell if it happened, because
that likely means that something is wrong with your cluster).

And bang on your OS guys to fix the real problem while you're at
it. ;-)
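The corner-case handling suggested above is to cap the retries and complain loudly when the cap is hit. It can be sketched in Python as follows. `run_with_eintr_retries`, `run_once`, and the default of 10 attempts are all hypothetical choices for illustration and do not come from MTT. `run_once` is assumed to run the command once and return `(exit_code, output)`.

```python
import sys
from typing import Callable, Tuple

# Hypothetical marker text; matched case-insensitively against the output.
EINTR_MARKER = "interrupted system call"


def run_with_eintr_retries(
    run_once: Callable[[], Tuple[int, str]], max_attempts: int = 10
) -> Tuple[int, str]:
    """Call run_once() until its output stops mentioning an interrupted
    system call, but never more than max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        rc, output = run_once()
        if EINTR_MARKER not in output.lower():
            return rc, output
    # Still "interrupted" on every attempt: either the command really prints
    # this text, or something is wrong on the node. Either way, say so loudly
    # instead of looping forever.
    print(
        f"*** WARNING: output still reported '{EINTR_MARKER}' after "
        f"{max_attempts} attempts; returning the last result. "
        "This may indicate a problem with the cluster. ***",
        file=sys.stderr,
    )
    return rc, output
```

A command that legitimately prints the marker every time now costs at most `max_attempts` runs plus one warning, rather than an infinite loop.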

Jeff Squyres
Cisco Systems