Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] SM init failures
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-03-30 21:53:42


FWIW, George found what looks like a race condition in the sm init
code today -- it looks like we don't call maffinity anywhere in the sm
btl startup, so we're not actually guaranteed that the memory is local
to any particular process(or) (!). This race shouldn't cause segvs,
though; it should only mean that memory is potentially farther away
than we intended.

The central question is: does "first touch" mean both read and write?
I.e., is the first process that either reads *or* writes to a given
location considered "first touch"? Or is it only the first write?
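
For illustration, here is a minimal sketch of an owner-side first touch done with a write, which counts as a touch under either reading of the question above. It is not the actual sm BTL or mpool code: the helper names map_shared() and first_touch_by_write() are hypothetical, and the sketch assumes a POSIX system with anonymous shared mappings and an OS that places pages on first touch.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not Open MPI code): map an anonymous shared
 * region of len bytes. Returns NULL on failure. */
static void *map_shared(size_t len)
{
    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return (m == MAP_FAILED) ? NULL : m;
}

/* Hypothetical helper (not Open MPI code): write one byte into every
 * page of [base, base + len) so that the calling process is the first
 * to touch each page. Intended to be called by the "owning" process
 * during init, before any peer reads or writes the region. It
 * overwrites the first byte of each page with 0, so it is only safe
 * on memory that has not been initialized yet.
 * Returns the number of pages touched. */
static size_t first_touch_by_write(void *base, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(base != NULL && page > 0);

    volatile unsigned char *p = base;
    size_t touched = 0;
    for (size_t off = 0; off < len; off += (size_t)page) {
        p[off] = 0;   /* a write, so it qualifies as a touch either way */
        touched++;
    }
    return touched;
}
```

The owner would call first_touch_by_write() on its own slice of the segment right after mapping it. Whether the pages then actually end up local also depends on the process being bound to a processor at that point, which this sketch does not do.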

On Mar 30, 2009, at 7:01 PM, Eugene Loh wrote:

> Jeff Squyres wrote:
>
> > On Mar 30, 2009, at 1:40 PM, Patrick Geoffray wrote:
> >
> >> > we will have to find a
> >> > pretty smart way to do this or we will completely break the
> >> > memory affinity stuff.
> >>
> >> I didn't look at the code, but I sure hope that the SM init code
> >> does touch each page to force allocation, otherwise there is no
> >> memory affinity stuff at all...
> >
> > Why not? The "owning" process can do the touch; then it'll be
> > affinity'ed properly. Right?
>
> So far as I can tell, the code has two mechanisms for memory
> placement. One is to create a different mpool for each affinity
> pool. The second is to have the correct owner perform the first
> touch. (It's not clear to me that the first mechanism is working,
> makes sense, is necessary, etc. I just don't know.) Anyhow, we do
> indeed want proper first touch and the code seems to respect that.

-- 
Jeff Squyres
Cisco Systems