MTT Devel Mailing List Archives

Subject: Re: [MTT devel] race condition in SCM module
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-02-21 13:53:34


On Feb 21, 2011, at 1:00 AM, Mike Dubman wrote:

> Mercurial uses "Copytree" to copy the freshly checked-out copy to the build location.

This statement threw me for a little while; I had to go to the code to figure it out. It's not Mercurial.pm that uses copytree; it looks like SCM.pm sets up a pointer to use copytree later...?

Can you send your MPI Get and MPI Install sections to help understand what is going on here? I tried to dig into the code a bit, but it's been (literally) years since I looked at these parts of the code; it looks like the specific code paths are going to be driven by your ini file.

> These are the messages coming from mtt during the problematic situation:
>
> Debug message stating that post-copy completed (should be applied AFTER copy to the build location):
> > >> copytree running post_copy command:
>
> Debug message stating that copying completed --> race
> > >> copytree finished copying

Actually, looking at lib/MTT/Common/Copytree.pm, it looks like this sequence of messages is normal in Copytree::PrepareForInstall...? I.e., the Debug statement with ">> copytree finished copying" is actually output after the post-copy command, not before. Perhaps it's just the location of this debug statement that is confusing / incorrect...
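
To illustrate the ordering described above, here is a minimal sketch in Python. It is not MTT's Perl code: the function names (prepare_for_install, demo) and the file name "configure" are hypothetical, and it only assumes the behavior suggested in this thread, namely that the copy is serial and that the "finished copying" debug message is emitted after the post-copy command has run.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_for_install(src, dst, post_copy, log):
    """Hypothetical stand-in for a serial copy step followed by a post-copy hook."""
    shutil.copytree(src, dst)  # blocks until the whole tree has been copied
    log.append(">> copytree running post_copy command:")
    post_copy(dst)  # runs only after copytree() has returned
    # Emitted last, even though the copy itself finished before the hook ran.
    log.append(">> copytree finished copying")


def demo():
    log = []
    seen = []
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        (src / "configure").write_text("#!/bin/sh\n")
        dst = Path(tmp) / "build"
        # The hook records whether the copied file is already present.
        prepare_for_install(
            src, dst, lambda d: seen.append((Path(d) / "configure").exists()), log
        )
    return log, seen
```

In this sketch the hook always sees the copied file, yet the log still shows "running post_copy" before "finished copying". If MTT behaves this way, the message order alone would not demonstrate a race.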

>
>
> On Fri, Feb 18, 2011 at 6:26 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> Hmm -- I'm having difficulty understanding the exact scenario here.
>
> Are you using the Mercurial SCM module, or copytree?
>
> I don't think that I have used the Mercurial SCM module before -- I believe Ethan added that. Ethan -- does SCM/Mercurial work well for you?
>
>
>
> On Feb 18, 2011, at 2:06 AM, Mike Dubman wrote:
>
> >
> > 1. post_copy fails because it does not find some files that should already have been copied.
> > 2. In the mtt debug output, (attached in original post) you can see that "post_copy" is executed before "copytree" has finished.
> >
> > >> copytree running post_copy command:
> > ...
> > ...
> >
> > >> copytree finished copying
> >
> > On Fri, Feb 18, 2011 at 12:23 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> > On Feb 10, 2011, at 2:36 PM, Mike Dubman wrote:
> >
> > > There is a race condition in the SCM Mercurial module when used from the MPI GET phase:
> > >
> > > - The scm_post_copy hook can be started before MPI GET has completed copying the fetched tree into the install location.
> >
> > How have you verified this?
> >
> > > - This leads to an mtt failure, because post_copy starts too early (the tree has not been copied yet) and fails.
> > > - Adding sleeps to the post_copy hook helps.
> > > - Does the copytree used during the mtt get phase behave asynchronously?
> >
> > No, it shouldn't. Everything is serial.
> >
> > >
> > >
> > > ---------------- from the mtt -d -v output ---------------------
> > >
> > > copytree running post_copy command:
> > > ...
> > > ...
> > >
> > > >> copytree finished copying
> > > ----------------------------------------------------------------------------
> > >
> > >
> > > Please suggest.
> > >
> > > Thanks
> > >
> > > M
> > >
> > >
> > >
> > > _______________________________________________
> > > mtt-devel mailing list
> > > mtt-devel_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> > _______________________________________________
> > mtt-devel mailing list
> > mtt-devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> >
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/