MTT Devel Mailing List Archives

Subject: Re: [MTT users] [MTT bugs] [MTT] #212: Generic network locking server *REVIEW NEEDED*
From: Ethan Mallove (ethan.mallove_at_[hidden])
Date: 2010-03-05 14:05:08


On Fri, Feb/19/2010 12:00:55PM, Ethan Mallove wrote:
> On Thu, Feb/18/2010 04:13:15PM, Jeff Squyres wrote:
> > On Feb 18, 2010, at 10:48 AM, Ethan Mallove wrote:
> >
> > > To ensure there is never a collision between $a->{k} and $b->{k}, the
> > > user can have two MTT clients share a $scratch, but they cannot both
> > > run the same INI section simultaneously. I set up my scheduler to run
> > > batches of MPI get, MPI install, Test get, Test build, and Test run
> > > sections in parallel with successor INI sections dependent on their
> > > predecessor INI sections (e.g., [Test run: foo] only runs after [Test
> > > build: foo] completes). The limitation stinks, but the current
> > > limitation is much worse: two MTT clients can't even run the same
> > > *phase* out of one $scratch.
> >
> > It might be a little nicer just to protect the user from
> > themselves -- if we ever detect a case where $a->{k} and $b->{k}
> > both exist and are not the same value, dump out everything to a file
> > and abort with an error message. This is clearly an erroneous
> > situation, but running MTT in big parallel batches like this is a
> > worthwhile-but-complicated endeavor, and some people are likely to
> > get it wrong. So we should at least detect the situation and fail
> > gracefully, rather than losing or corrupting results.
> >
> > Make sense?
>
> Yes. I'll add this.

The check is there now. Ready for review.
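
For anyone reviewing, the behavior discussed above (detect a key that
exists on both sides with different values, dump everything to a file,
then abort) can be sketched roughly as follows. This is an illustrative
Python sketch, not the Perl that is in MTT; merge_checked, CollisionError,
and the JSON dump format are hypothetical names chosen for the example,
and keys are assumed to be strings.

```python
import json
import tempfile


class CollisionError(Exception):
    """Raised when two tables hold different values for the same key."""


def merge_checked(a, b, dump_dir=None):
    """Merge b into a copy of a, refusing to overwrite conflicting values."""
    # A collision is a key present in both tables with differing values.
    conflicts = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    if conflicts:
        # Preserve both sides on disk before aborting so nothing is lost.
        with tempfile.NamedTemporaryFile(
            "w", suffix=".json", prefix="collision-", dir=dump_dir, delete=False
        ) as fh:
            json.dump(
                {"a": a, "b": b, "conflicts": conflicts}, fh, indent=2, default=str
            )
            path = fh.name
        raise CollisionError(
            f"conflicting keys {conflicts}; both tables dumped to {path}"
        )
    # No conflicts: identical or disjoint entries merge safely.
    merged = dict(a)
    merged.update(b)
    return merged
```

Keys that appear on both sides with the same value are treated as
harmless, so only a genuine disagreement stops the run.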

-Ethan

>
> -Ethan
>
> >
> > > I originally wanted the .dump files to be completely safe, but MTT
> > > clients were getting locked out of the .dump files for way too long.
> > > E.g., MTT::MPI::LoadInstalls happens very early in client/mtt, and an
> > > hour could elapse before MTT::MPI::SaveInstalls is called in
> > > Install.pm.
> >
> > Yep, if you lock from load->save, then that can definitely happen...
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> >
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> > _______________________________________________
> > mtt-users mailing list
> > mtt-users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users