Subject: Re: [hwloc-devel] thread safety
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-03-12 11:05:04

On Mar 12, 2010, at 7:51 AM, Samuel Thibault wrote:

> > To support that, do we need to make internal variables and fields be volatile?
> ?! I fail to see why we would need that.
> If some thread uses a function that modifies a topology object, no
> other thread should be reading it of course, since the reader will
> possibly read incoherent data. A volatile qualifier can not fix that,
> only mutexes (or transactional memory :) ) can.

Right -- that's not what I'm asking about.

Even in this scenario:

1. thread A calls hwloc_topology_init(&a)
2. thread A calls hwloc_topology_load(a)
3. thread A launches thread B
4. thread B calls hwloc_topology_get_*(a...)
5. threads A and B synchronize
6. thread A calls hwloc_topology_load(a)
7. thread B calls hwloc_topology_get_*(a...)

If the topology struct is not marked volatile (or its fields, or whatever), then the compiler *might* assume that data it still holds in registers from step 4 is still valid in step 7.

volatile effectively forces a reload, so that step 7 is guaranteed to read from memory again, rather than relying on the compiler's optimizer to know that the data that may still be in registers from step 4 is actually (potentially) invalid.
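
For concreteness, here is a minimal sketch of steps 1-7. It makes several assumptions: struct fake_topology and its depth field are hypothetical stand-ins rather than the real hwloc types, the two plain assignments stand in for hwloc_topology_load(), and step 5 is assumed to be done with a POSIX mutex and condition variable. The field is deliberately not volatile; the sketch relies on POSIX specifying that the mutex calls synchronize memory between threads. It does not say anything about other synchronization methods an application might pick.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a topology; NOT the real hwloc struct. */
struct fake_topology { int depth; };

static struct fake_topology topo;   /* plain field, no volatile */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int phase; /* 0: start, 1: B finished step 4, 2: A finished step 6 */

/* Publish a new phase under the mutex and wake any waiter. */
static void set_phase(int p)
{
    pthread_mutex_lock(&lock);
    phase = p;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

/* Block until the phase has reached at least p. */
static void wait_phase(int p)
{
    pthread_mutex_lock(&lock);
    while (phase < p)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

static void *thread_b(void *arg)
{
    int *seen = arg;
    seen[0] = topo.depth;   /* step 4: first read */
    set_phase(1);
    wait_phase(2);          /* step 5: synchronize with A */
    seen[1] = topo.depth;   /* step 7: read again after the reload */
    return NULL;
}

/* Runs the scenario; seen[0] and seen[1] receive B's two reads. */
int run_scenario(int seen[2])
{
    pthread_t b;

    phase = 0;              /* no other thread exists yet */
    topo.depth = 3;         /* steps 1-2: stand-in for init + load */
    if (pthread_create(&b, NULL, thread_b, seen) != 0)   /* step 3 */
        return -1;
    wait_phase(1);          /* step 5: wait until B is done reading */
    topo.depth = 5;         /* step 6: stand-in for the second load */
    set_phase(2);
    if (pthread_join(b, NULL) != 0)
        return -1;
    assert(phase == 2);     /* sanity check on the handshake */
    return 0;
}
```

Calling run_scenario() with a two-element array fills it with what thread B saw in steps 4 and 7. Under the assumptions above, the second read picks up the value written in step 6 without any volatile qualifier.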

> > If we say that applications need to provide their own synchronization
> > between readers and writers, atomic writes shouldn't be an issue,
> > right?
> I do not understand this either.

Since writes back to memory may be delayed, a write of a value in a topology struct could be only partially complete when a read of that same value comes in from another thread (even if the threads *think* they have synchronized, such as above). Hence, thread A may have written 2 bytes of a 4 byte value when thread B actually reads it. The value that B gets could then be gibberish (these are the worst kinds of bugs to try to find -- IBM is rooting out some of these in Open MPI right now, for example :-( ).
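
To illustrate the whole-value guarantee at stake, here is a sketch using C11 <stdatomic.h>. This is an assumption on two counts: it needs a C11 compiler, and nothing here claims hwloc does this. The variable, the two bit patterns and the function names are all hypothetical. A writer flips a 4 byte value between two patterns while a reader counts any value that is neither of them, which is what a half-written value would look like.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 4 byte field; not taken from hwloc. */
static _Atomic uint32_t value;

#define PATTERN_A 0x00000000u
#define PATTERN_B 0xFFFFFFFFu
#define ITERATIONS 100000

/* Writer: alternate between the two patterns with atomic stores. */
static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        atomic_store(&value, (i & 1) ? PATTERN_B : PATTERN_A);
    return NULL;
}

/* Reader: count loads that match neither pattern (a torn value).
 * Returns -1 if the writer thread could not be started. */
int count_torn_reads(void)
{
    pthread_t w;
    int torn = 0;

    atomic_store(&value, PATTERN_A);
    if (pthread_create(&w, NULL, writer, NULL) != 0)
        return -1;
    for (int i = 0; i < ITERATIONS; i++) {
        uint32_t v = atomic_load(&value);
        if (v != PATTERN_A && v != PATTERN_B)
            torn++;
    }
    pthread_join(w, NULL);
    return torn;
}
```

With atomic_store() and atomic_load() each access covers the whole value, so the count should stay at zero. The sketch only covers a single 4 byte value; it says nothing about keeping several fields of a struct consistent with each other.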

And actually, my first mail came out opposite of what I wanted to say (cut-n-paste error). I meant to say:

If we say that applications need to provide their own synchronization
between readers and writers, atomic writes **could still** be an issue,

Jeff Squyres
