Subject: Re: [MTT users] [MTT bugs] [MTT] #212: Generic networklockingserver *REVIEW NEEDED*
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-02-18 16:13:15


On Feb 18, 2010, at 10:48 AM, Ethan Mallove wrote:

> To ensure there is never a collision between $a->{k} and $b->{k}, the
> user can have two MTT clients share a $scratch, but they cannot both
> run the same INI section simultaneously. I set up my scheduler to run
> batches of MPI get, MPI install, Test get, Test build, and Test run
> sections in parallel with successor INI sections dependent on their
> predecessor INI sections (e.g., [Test run: foo] only runs after [Test
> build: foo] completes). The limitation stinks, but the current
> limitation is much worse: two MTT clients can't even run the same
> *phase* out of one $scratch.

It might be nicer to protect users from themselves: if we ever detect a case where $a->{k} and $b->{k} both exist with different values, dump everything out to a file and abort with an error message. That is clearly an erroneous situation. Running MTT in big parallel batches like this is worthwhile but complicated, and some people are likely to get it wrong, so we should at least detect the situation and fail gracefully rather than lose or corrupt results.

Make sense?
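To make the detect-and-abort idea concrete, here is a rough sketch. It is in Python rather than MTT's Perl, and it assumes each client's results can be treated as a flat dictionary of plain, JSON-serializable data with string keys. The names merge_or_abort and ConflictError and the JSON dump format are all hypothetical, not existing MTT code. Keys present on only one side merge normally. A key present on both sides with different values triggers a dump of both snapshots and an error, instead of silently keeping one.

```python
import json
import os
import tempfile


class ConflictError(Exception):
    """Hypothetical error: two clients wrote different values for one key."""


def merge_or_abort(ours, theirs, dump_dir):
    """Merge `theirs` into a copy of `ours`, aborting on conflicting keys.

    Sketch only: assumes flat dicts with string keys and JSON-serializable
    values (MTT itself stores Perl hashes in .dump files).
    """
    merged = dict(ours)
    conflicts = []
    for key, value in theirs.items():
        if key in merged and merged[key] != value:
            # Same key, different value: never pick a winner silently.
            conflicts.append(key)
        else:
            merged[key] = value
    if conflicts:
        # Preserve everything so the user can sort it out by hand.
        fd, path = tempfile.mkstemp(prefix="mtt-conflict-", suffix=".json",
                                    dir=dump_dir)
        with os.fdopen(fd, "w") as fh:
            json.dump({"conflicting_keys": sorted(conflicts),
                       "ours": ours, "theirs": theirs},
                      fh, indent=2, sort_keys=True)
        raise ConflictError(
            f"conflicting values for {sorted(conflicts)}; state dumped to {path}")
    return merged
```

Raising an exception (rather than, say, keeping the newer value) matches the point above: this situation means the user's scheduling is wrong, so the run should stop loudly while all data is still intact.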

> I originally wanted the .dump files to be completely safe, but MTT
> clients were getting locked out of the .dump files for way too long.
> E.g., MTT::MPI::LoadInstalls happens very early in client/mtt, and an
> hour could elapse before MTT::MPI::SaveInstalls is called in
> Install.pm.

Yep, if you lock from load->save, then that can definitely happen...
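One way to avoid holding a lock for an hour is to lock only around the save itself: take the lock, re-read whatever other clients have written since we loaded, merge our entries in, write, and release. The sketch below shows that pattern in Python using fcntl.flock (so it is Unix-only). The save_with_short_lock name, the ".lock" sidecar file, and the JSON format are hypothetical stand-ins for what MTT::MPI::SaveInstalls does with its .dump files. The merge here is a plain update; a real version would also apply the conflict check described earlier in this message.

```python
import fcntl
import json
import os
import tempfile


def save_with_short_lock(dump_path, new_entries):
    """Hold the lock only for read-merge-write, not from load to save.

    Hypothetical sketch: JSON file plus a sidecar ".lock" file; Unix-only.
    """
    lock_path = dump_path + ".lock"
    with open(lock_path, "a") as lock_fh:
        fcntl.flock(lock_fh, fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            # Re-read what other clients saved while we were running.
            try:
                with open(dump_path) as fh:
                    on_disk = json.load(fh)
            except FileNotFoundError:
                on_disk = {}
            on_disk.update(new_entries)
            # Write a temp file in the same directory, then rename it over
            # the original so readers never see a half-written file.
            dir_name = os.path.dirname(os.path.abspath(dump_path))
            fd, tmp = tempfile.mkstemp(dir=dir_name)
            with os.fdopen(fd, "w") as fh:
                json.dump(on_disk, fh, sort_keys=True)
            os.replace(tmp, dump_path)
            return on_disk
        finally:
            fcntl.flock(lock_fh, fcntl.LOCK_UN)
```

The trade-off is that the lock no longer guarantees our in-memory copy is current for the whole run; it only guarantees that concurrent saves do not clobber each other, which is why the conflict check still matters.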

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/