Subject: Re: [MTT users] [MTT bugs] [MTT] #212: Generic network locking server *REVIEW NEEDED*
From: Ethan Mallove (ethan.mallove_at_[hidden])
Date: 2010-02-18 10:48:50


On Wed, Feb/17/2010 04:57:38PM, Jeff Squyres wrote:
> Sorry for the delay...
>
> I see the comments like this:
>
> + # We write the entire MPI::sources hash to file, even
> + # though the filename indicates a single INI section.
> + # MTT::Util::hashes_merge will take care of duplicate
> + # hash keys. The reason for splitting up the .dump files
> + # is to keep them read and write safe across INI sections.
>
> I'm a little confused by this. I see that the goal is to have multiple MTT clients running simultaneously, all sharing a single $scratch. Per the comment above, you're writing all current data to the .dump file, even if it covers more than the one section that the parameters (and filename) imply. You're relying on hashes_merge() to "figure it out" and create one unified tree underneath.
>
> I'm a bit worried: aren't there cases where you can end up with a conflict? I.e., hash A has value X for key K, but hash B has value Y for the same key K?
>

To guarantee there is never a collision between $a->{k} and $b->{k},
two MTT clients may share a $scratch, but they must not run the same
INI section simultaneously. I set up my scheduler to run batches of
MPI get, MPI install, Test get, Test build, and Test run sections in
parallel, with each INI section dependent on its predecessor (e.g.,
[Test run: foo] only runs after [Test build: foo] completes). This
limitation stinks, but the existing one is much worse: two MTT
clients can't even run the same *phase* out of one $scratch.
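To make the collision case concrete, here is a minimal sketch in
Python (not MTT's Perl). merge_dumps() is a hypothetical stand-in for
MTT::Util::hashes_merge, and the dump contents are made up. Entries
from different INI sections merge cleanly, and a section a client
merely reloaded and rewrote unchanged is harmless; only two clients
writing different values under the same section key collide.

```python
def merge_dumps(a, b, path=()):
    """Recursively merge nested dict b into a copy of a.

    Keys present in only one input are kept. Nested dicts are merged
    recursively. Equal leaf values are fine; differing leaf values under
    the same key are a collision and raise ValueError.
    """
    merged = dict(a)  # shallow copy; nested dicts are replaced, never mutated
    for key, b_val in b.items():
        if key not in merged:
            merged[key] = b_val
            continue
        a_val = merged[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            merged[key] = merge_dumps(a_val, b_val, path + (key,))
        elif a_val != b_val:
            where = "/".join(path + (key,))
            raise ValueError(f"collision at {where}: {a_val!r} vs {b_val!r}")
    return merged


# Hypothetical dumps: each client writes its whole MPI::sources hash.
# Client 2 already loaded [MPI get: foo] from disk, so its copy matches.
client1 = {"MPI::sources": {"foo": {"version": "1.4.1"}}}
client2 = {"MPI::sources": {"foo": {"version": "1.4.1"},
                            "bar": {"version": "1.5a1"}}}
print(sorted(merge_dumps(client1, client2)["MPI::sources"]))

# Two clients running [MPI get: foo] at the same time can disagree.
racing = {"MPI::sources": {"foo": {"version": "1.4.2"}}}
try:
    merge_dumps(client1, racing)
except ValueError as err:
    print(err)
```

The second call is the failure Jeff describes; under the rule that no
INI section runs in two clients at once, it should not arise.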

I originally wanted the .dump files to be completely safe, but MTT
clients were getting locked out of the .dump files for way too long.
E.g., the lock would have to be held from MTT::MPI::LoadInstalls,
which is called very early in client/mtt, until MTT::MPI::SaveInstalls
is called in Install.pm, and an hour could elapse in between.
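One way to keep the lock window short, sketched in Python under loose
assumptions: the dump_path() naming scheme, the JSON format (MTT
itself is Perl with its own dump format), and the .lock side file are
all hypothetical. The lock is held only around the write of one
section's file, not from load to save, and the write goes through a
temp file plus os.replace() so readers never see a half-written dump.
fcntl.flock() is POSIX-only and may not be reliable on every NFS
setup.

```python
import fcntl
import json
import os
import re
import tempfile


def dump_path(scratch, phase, section):
    """Hypothetical naming scheme: one .dump file per phase and INI section."""
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", section)
    return os.path.join(scratch, f"{phase}-{safe}.dump")


def save_section(scratch, phase, section, data):
    """Write one section's dump while holding an exclusive lock briefly."""
    path = dump_path(scratch, phase, section)
    with open(path + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks only for this write
        try:
            # Write to a temp file in the same directory, then rename it
            # into place so readers never see a partially written dump.
            fd, tmp = tempfile.mkstemp(dir=scratch)
            with os.fdopen(fd, "w") as out:
                json.dump(data, out)
            os.replace(tmp, path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return path


def load_section(scratch, phase, section):
    """Read one section's dump; a missing file means nothing saved yet."""
    path = dump_path(scratch, phase, section)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    saved = save_section(scratch, "mpi_installs", "MPI install: foo",
                         {"foo": {"success": 1}})
    print(os.path.basename(saved))
    print(load_section(scratch, "mpi_installs", "MPI install: foo"))
```

Because each file belongs to a single INI section, and no section runs
in two clients at once, last-writer-wins on a given file is acceptable
here; the sketch does not try to merge concurrent writes to one file.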

-Ethan

>
>
> On Feb 11, 2010, at 12:09 PM, Ethan Mallove wrote:
>
> > This apparently got lost in the shuffle a few months ago. The fix
> > allows one to kick off all of their MPI Installs and Test Builds in
> > parallel. Give it a try when you have a chance.
> >
> > -Ethan
> >
> >
> > > On Sat, Nov/07/2009 04:15:42PM, Jeff Squyres wrote:
> > > > On Nov 6, 2009, at 5:18 PM, Ethan Mallove wrote:
> > > >
> > > >> I'm running multiple MTT clients out of the same scratch directory
> > > >> using SGE. I'm running into race conditions between the multiple
> > > >> clients, where one client is overwriting another's data in the .dump
> > > >> files - which is a Very Bad Thing(tm). I'm running the
> > > >> client/mtt-lock-server, and I've added the corresponding [Lock]
> > > >> section in my INI file. Will my MTT clients now not interfere with
> > > >> each other's .dump files? I'm skeptical of this because I don't see,
> > > >> e.g., Lock() calls in SaveRuns(). How do I make my .dump files safe?
> > > >>
> > > >
> > > >
> > > > Err... perhaps this part wasn't tested well...?
> > > >
> > > > I'm afraid it's been forever since I've looked at this code and I'm gearing
> > > > up to leave for the Forum on Tuesday and then staying on for SC09, so it's
> > > > quite likely that you'll be able to look at this in more detail before I
> > > > will. Sorry to pass the buck; just trying to be realistic... :-(
> > >
> > > After some digging, I discovered that MTT is not designed to execute
> > > multiple INI sections out of a single scratch directory in parallel.
> > > There's a ticket for this:
> > >
> > > https://svn.open-mpi.org/trac/mtt/ticket/167
> > >
> > > The way around this limitation is to have MTT split up the .dump files
> > > by INI section so that two MTT clients running simultaneously never
> > > conflict with each other. (This change did not need to be made for the
> > > Test run .dump files, as MTT already splits them up.) I have attached
> > > a patch, which makes a simple wrapper script for #167 possible. The
> > > changes should not disrupt normal (non-parallel) execution. Anyone
> > > care to give it a try?
> > >
> > > -Ethan
> > >
> > > >
> > > > --
> > > > Jeff Squyres
> > > > jsquyres_at_[hidden]
> > > >
> > > > _______________________________________________
> > > > mtt-users mailing list
> > > > mtt-users_at_[hidden]
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
> >
> > <mtt-safe-dump-files.diff>
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
>
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/