
MTT Devel Mailing List Archives


From: Ethan Mallove (ethan.mallove_at_[hidden])
Date: 2007-10-17 09:31:09


On Wed, Oct/17/2007 07:45:53AM, Jeff Squyres wrote:
> On Oct 16, 2007, at 6:36 PM, Ethan Mallove wrote:
>
> >>> The bail-out condition is that "make" will eventually
> >>> succeed or fail with something other than "interrupted
> >>> system call". Do we need another condition?
> >>
> >> I'm just worried that Sun's NFS will somehow get into a
> >> perpetual "interrupted system call" loop such that you'll
> >> never actually break out of it.
> >
> > How about a counter? E.g., after "x" number of "interrupted
> > system call" messages, MTT moves on. Or should the "command
> > restart" go in DoCommand.pm so we can have a timeout?
>
> Either or both of those would be fine (don't we have a timeout in
> DoCommand.pm already?).

There is a timeout in DoCommand, but currently I keep
re-invoking DoCommand on each "interrupted system call", so
the timeout gets reset each time. That would not be the case
if the do-while loop were moved into DoCommand. Then again,
without a retry limit, an infinite loop is certain for any
command that is *expected* to output "interrupted system call".
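To make the counter-plus-timeout idea concrete, here is a minimal sketch. It is written in Python rather than the Perl of MTT's DoCommand.pm, and run_with_retry, EINTR_MARKER, max_retries and total_timeout are all hypothetical names, not MTT's actual API. The deadline is computed once before the loop, so re-running the command does not reset the timeout. Only a failing run whose output mentions "interrupted system call" is retried, and only up to max_retries times, so the loop always terminates.

```python
import subprocess
import sys
import time

# Hypothetical marker. The thread quotes it in lowercase; the errno text is
# usually "Interrupted system call", so we match case-insensitively.
EINTR_MARKER = "interrupted system call"


def run_with_retry(cmd, max_retries=5, total_timeout=3600.0):
    """Run cmd, re-running it while it fails with EINTR-like output.

    Returns (returncode, output, attempts). The deadline is fixed before the
    first attempt, so retries share one overall timeout instead of resetting it.
    """
    deadline = time.monotonic() + total_timeout
    attempts = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"gave up after {attempts} attempt(s)")
        attempts += 1
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # scan stderr too, merged into stdout
                text=True,
                timeout=remaining,  # only the time left, never the full budget
            )
        except subprocess.TimeoutExpired:
            raise TimeoutError(f"timed out on attempt {attempts}") from None
        output = proc.stdout
        interrupted = EINTR_MARKER in output.lower()
        # A successful run is returned even if it printed the marker. A failing
        # run that keeps reporting EINTR is retried at most max_retries times.
        if proc.returncode == 0 or not interrupted or attempts > max_retries:
            return proc.returncode, output, attempts


if __name__ == "__main__":
    # Example: a command that succeeds on the first try.
    rc, out, n = run_with_retry([sys.executable, "-c", "print('hello')"])
    print(rc, n)
```

The retry limit addresses the worry about NFS getting stuck in a perpetual loop, while the shared deadline keeps the timeout meaningful across re-runs.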

-Ethan

>
> > I also noticed that our build script (which prints hundreds
> > of "interrupted system call" messages per build, but does
> > not seem to die because of them) uses "make -j 24" while MTT
> > has been using "make -j 4". I'll experiment with -j.
>
> I know that Terry/Sun and co. spent a good amount of time trying to
> solve the "interrupted system call" error -- they may have some more
> information for you, such as how/why it happens...?
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> mtt-devel mailing list
> mtt-devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel