MTT Devel Mailing List Archives


From: Ethan Mallove (ethan.mallove_at_[hidden])
Date: 2007-10-16 18:36:26


On Tue, Oct/16/2007 05:37:18PM, Jeff Squyres wrote:
> On Oct 16, 2007, at 5:23 PM, Ethan Mallove wrote:
>
> > The bail-out is that "make" will eventually either succeed
> > or fail with something other than "interrupted system
> > call". Do we need another condition?
>
> I'm just worried that Sun's NFS will somehow get in a
> perpetual "interrupted system call" loop such that you'll
> never actually break out of it.

How about a counter? E.g., after "x" number of "interrupted
system call" messages, MTT moves on. Or should the "command
restart" go in DoCommand.pm so we can have a timeout?
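
A minimal sketch of the counter idea, not the actual MTT change.
run_with_restarts and $max_restarts are hypothetical names, the
command is passed in as a code ref rather than calling
MTT::DoCommand::Cmd directly, and a plain "exit_status == 0" check
stands in for MTT::DoCommand::wsuccess. The returned hash is assumed
to have the same exit_status and result_stderr fields as in the
do-while quoted below.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MTT: rerun a command while it fails
# with "interrupted system call", but give up after $max_restarts
# restarts. $run is a code ref returning a hash ref with {exit_status}
# and {result_stderr}; in MTT it might wrap
# MTT::DoCommand::Cmd("make install").
sub run_with_restarts {
    my ($run, $max_restarts) = @_;
    my $restarts = 0;
    my $x;
    while (1) {
        $x = $run->();
        # Done if the command succeeded (assumption: 0 means success) ...
        last if $x->{exit_status} == 0;
        # ... or if it failed for some other reason.
        last if $x->{result_stderr} !~ /interrupted system call/i;
        # Bail out once the restart budget is used up.
        last if $restarts >= $max_restarts;
        $restarts++;
    }
    return ($x, $restarts);
}

# Stand-in for a real command: fails twice with the NFS error, then
# succeeds.
my $calls = 0;
my $fake_make = sub {
    $calls++;
    if ($calls < 3) {
        return { exit_status => 1,
                 result_stderr => "make: Interrupted system call" };
    }
    return { exit_status => 0, result_stderr => "" };
};

my ($result, $restarts) = run_with_restarts($fake_make, 5);
print "exit_status=$result->{exit_status} restarts=$restarts\n";
```

With a cap like this, a perpetual "interrupted system call" loop
ends after $max_restarts restarts and the last failing result is
returned to the caller, which can then report the build as failed.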

I also noticed that our build script (which prints hundreds
of "interrupted system call" messages per build, but does
not seem to die because of them) uses "make -j 24" while MTT
has been using "make -j 4". I'll experiment with -j.

-Ethan

>
> > I do not know which system call is getting interrupted, but
> > here's an interesting article on how different Unixes deal
> > with connect() interruptions:
> >
> > http://www.madore.org/~david/computers/connect-intr.html
> >
> > -Ethan
> >
> >
> > On Tue, Oct/16/2007 04:59:29PM, Jeff Squyres wrote:
> >> Ick!
> >>
> >> This is a long-known problem [apparently] with Sun's NFS,
> >> unfortunately. :-(
> >>
> >> I'd be ok with this if there is an eventual bail out of the loop --
> >> the prospect of an infinite loop is a bit scary for me.
> >>
> >>
> >> On Oct 16, 2007, at 11:23 AM, Ethan Mallove wrote:
> >>
> >>> On certain NFS servers, I run into the error message
> >>> "Interrupted system call" when executing long running
> > >>> commands such as "make all". One solution I've been able to
> > >>> use is to set up an NFS mount point solely for the cluster
> > >>> I'm using, but this is not always an option. The link below
> > >>> advises restarting the build on "Interrupted system call":
> >>>
> >>> http://developers.sun.com/solaris/articles/parallel_make.html
> >>>
> >>> I wrapped the GNU_Install.pm make commands in a do-while to
> >>> effect the build restarts. E.g.,
> >>>
> > >>> do {
> > >>>     $x = MTT::DoCommand::Cmd("make install");
> > >>> } while (!MTT::DoCommand::wsuccess($x->{exit_status}) and
> > >>>          ($x->{result_stderr} =~ /interrupted system call/i));
> >>>
> >>> As long as make emits "interrupted system call" and fails,
> >>> MTT will keep restarting make.
> >>>
> >>> I realize this is ugly, but is it acceptable?
> >>>
> >>> -Ethan
> >>> _______________________________________________
> >>> mtt-devel mailing list
> >>> mtt-devel_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> >>
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems
> >>
>
>
> --
> Jeff Squyres
> Cisco Systems
>