Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.3.1 -- bad MTT from Cisco
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-03-12 08:58:43

On Mar 11, 2009, at 12:19 PM, Eugene Loh wrote:

> I don't understand what's going on, but I guess each process is calling
> sm_btl_first_time_init(), during which it initializes its own shm_bases
> value, FIFOs, and shm_fifo pointer. If a remote process saw those
> memory operations in that order, then initialization of the shm_fifo
> pointer would be an indicator that the rest of the data structures had
> been initialized. But there are no memory barriers between those
> operations to order them. So, perhaps testing the shm_fifo pointer
> doesn't really mean much. I don't know enough about memory coherency to
> know.
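
To make the ordering concern above concrete, here is a minimal sketch of the "initialize everything, then publish a pointer" pattern with explicit ordering. It is not the sm BTL code: it uses two threads instead of processes in shared memory, and it uses C11 atomics with release/acquire ordering as a stand-in for whatever barrier primitives the real code would use. All names (demo_fifo_t, demo_fifo_ptr, demo_writer, demo_reader, demo_publish_and_read) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-process FIFO; not the real sm BTL type. */
typedef struct {
    int head;
    int tail;
    int slots[4];
} demo_fifo_t;

static demo_fifo_t demo_storage;

/* Plays the role of the shm_fifo pointer: NULL means "not published yet". */
static _Atomic(demo_fifo_t *) demo_fifo_ptr = NULL;

/* Writer: fill in the data structure first, then publish the pointer.
 * The release store keeps the plain writes above it from being
 * reordered after the pointer becomes visible. */
static void *demo_writer(void *arg)
{
    (void)arg;
    demo_storage.head = 0;
    demo_storage.tail = 0;
    for (int i = 0; i < 4; i++) {
        demo_storage.slots[i] = 100 + i;
    }
    atomic_store_explicit(&demo_fifo_ptr, &demo_storage, memory_order_release);
    return NULL;
}

/* Reader: spin until the pointer is non-NULL. The acquire load pairs with
 * the release store, so a non-NULL pointer implies the fields are visible. */
static void *demo_reader(void *arg)
{
    int *sum = arg;
    demo_fifo_t *f;
    while ((f = atomic_load_explicit(&demo_fifo_ptr, memory_order_acquire)) == NULL) {
        /* busy-wait; acceptable only in a tiny demo */
    }
    *sum = f->slots[0] + f->slots[1] + f->slots[2] + f->slots[3];
    return NULL;
}

/* Runs one writer and one reader; returns the sum the reader observed. */
int demo_publish_and_read(void)
{
    pthread_t w, r;
    int sum = -1;
    int rc;

    atomic_store(&demo_fifo_ptr, NULL);
    rc = pthread_create(&r, NULL, demo_reader, &sum);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, demo_writer, NULL);
    assert(rc == 0);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return sum;
}
```

Calling demo_publish_and_read() from a main() (built with something like -std=c11 -pthread) should return 406, the sum of the four slots. If both atomic operations were downgraded to memory_order_relaxed, or the pointer were a plain variable, a weakly ordered machine would be allowed to show the reader a non-NULL pointer before the slot writes, which is the failure mode Eugene is describing.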

FWIW, George and I puzzled through some of this code yesterday. We
didn't see anything that was obviously wrong, even though we were
actively trying to think of whacky race conditions that could be
happening. :-(

George said he'd continue to investigate.

Jeff Squyres
Cisco Systems