MTT Devel Mailing List Archives

From: Ethan Mallove (ethan.mallove_at_[hidden])
Date: 2007-10-16 18:36:26


On Tue, Oct/16/2007 05:37:18PM, Jeff Squyres wrote:
> On Oct 16, 2007, at 5:23 PM, Ethan Mallove wrote:
>
> > The bail-out is that "make" will eventually succeed or fail
> > with something other than "interrupted system call". Do
> > we need another condition?
>
> I'm just worried that Sun's NFS will somehow get in a
> perpetual "interrupted system call" loop such that you'll
> never actually break out of it.

How about a counter? E.g., after "x" number of "interrupted
system call" messages, MTT moves on. Or should the "command
restart" go in DoCommand.pm so we can have a timeout?
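
A minimal sketch of the counter idea, under stated assumptions: `run_with_retry_limit` and `$fake_make` are hypothetical names, not part of MTT, and a plain zero exit status stands in for `MTT::DoCommand::wsuccess`. The command is passed as a code ref returning the same `exit_status` and `result_stderr` fields used in the do-while loop quoted below, so the sketch runs without MTT installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MTT): rerun a command while it fails with
# "interrupted system call", but give up after $max_tries attempts.
# $run is a code ref returning a hash ref with exit_status and result_stderr.
sub run_with_retry_limit {
    my ($run, $max_tries) = @_;
    my ($x, $tries) = (undef, 0);
    do {
        $x = $run->();
        $tries++;
    } while ($x->{exit_status} != 0
             and $x->{result_stderr} =~ /interrupted system call/i
             and $tries < $max_tries);
    return ($x, $tries);
}

# Stand-in for "make": fails twice with the NFS error, then succeeds.
my $calls = 0;
my $fake_make = sub {
    $calls++;
    return $calls < 3
        ? { exit_status => 1, result_stderr => "make: Interrupted system call" }
        : { exit_status => 0, result_stderr => "" };
};

my ($result, $tries) = run_with_retry_limit($fake_make, 10);
print "exit_status=$result->{exit_status} after $tries tries\n";
```

If the restart logic moved into DoCommand.pm instead, a wall-clock deadline could presumably be checked in the same loop condition alongside the attempt counter.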

I also noticed that our build script (which prints hundreds
of "interrupted system call" messages per build, but does
not seem to die because of them) uses "make -j 24" while MTT
has been using "make -j 4". I'll experiment with -j.

-Ethan

>
> > I do not know which system call is getting interrupted, but
> > here's an interesting article on how different Unixes deal
> > with connect() interruptions:
> >
> > http://www.madore.org/~david/computers/connect-intr.html
> >
> > -Ethan
> >
> >
> > On Tue, Oct/16/2007 04:59:29PM, Jeff Squyres wrote:
> >> Ick!
> >>
> >> This is a long-known problem [apparently] with Sun's NFS,
> >> unfortunately. :-(
> >>
> >> I'd be ok with this if there is an eventual bail out of the loop --
> >> the prospect of an infinite loop is a bit scary for me.
> >>
> >>
> >> On Oct 16, 2007, at 11:23 AM, Ethan Mallove wrote:
> >>
> >>> On certain NFS servers, I run into the error message
> >>> "Interrupted system call" when executing long-running
> >>> commands such as "make all". One solution I've been able to
> >>> use is to set up an NFS mount point solely for the cluster
> >>> I'm using, but this is not always an option. The article
> >>> linked below advises restarting the build on "Interrupted
> >>> system call":
> >>>
> >>> http://developers.sun.com/solaris/articles/parallel_make.html
> >>>
> >>> I wrapped the GNU_Install.pm make commands in a do-while to
> >>> effect the build restarts. E.g.,
> >>>
> >>> do {
> >>>     $x = MTT::DoCommand::Cmd("make install");
> >>> } while (!MTT::DoCommand::wsuccess($x->{exit_status}) and
> >>>          ($x->{result_stderr} =~ /interrupted system call/i));
> >>>
> >>> As long as make emits "interrupted system call" and fails,
> >>> MTT will keep restarting make.
> >>>
> >>> I realize this is ugly, but is it acceptable?
> >>>
> >>> -Ethan
> >>> _______________________________________________
> >>> mtt-devel mailing list
> >>> mtt-devel_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> >>
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems
> >>
>
>
> --
> Jeff Squyres
> Cisco Systems
>