Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-12-01 11:19:15

On Dec 1, 2005, at 10:58 AM, Greg Watson wrote:

> @#$%^& it! I can't get the problem to manifest for either branch now.

Well, that's good for me. :-)

FWIW, the problem existed on systems that could/would return different
addresses in different processes from mmap() for memory that was common
to all of them. E.g., if processes A and B share common memory Z, A
would get virtual address M for Z, and B would get virtual address N
(as opposed to both A and B getting virtual address M).
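To illustrate why this matters, here is a minimal sketch. It is not the sm btl's actual code. A shared structure that stores raw pointers only works if every process maps the region at the same address. Storing offsets from the start of the mapping, and turning them back into local pointers in each process, works whether a process got M or N. The sketch maps the same file twice within one process, so the kernel hands back two different addresses, much like A and B in separate processes. All names here (ptr_to_off, off_to_ptr, map_twice, struct node, build_list, sum_list) and the /tmp path template are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: pointer inside a mapping -> base-relative offset. */
static size_t ptr_to_off(const void *base, const void *p)
{
    assert((const char *)p >= (const char *)base);
    return (size_t)((const char *)p - (const char *)base);
}

/* Hypothetical helper: offset -> pointer valid in *this* mapping. */
static void *off_to_ptr(void *base, size_t off)
{
    return (char *)base + off;
}

/* Map the same temp file twice with MAP_SHARED. Each mmap() call may
 * return a different virtual address for the same underlying memory.
 * Returns 0 on success, -1 on failure. */
static int map_twice(size_t len, void **a, void **b)
{
    char path[] = "/tmp/shm_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                 /* name goes away; memory stays mapped */
    if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }
    *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                    /* mappings remain valid after close() */
    if (*a == MAP_FAILED || *b == MAP_FAILED) return -1;
    return 0;
}

/* A list node that links by offset, never by absolute address. */
struct node {
    size_t next_off;              /* 0 means "end of list" in this sketch */
    int value;
};

/* Build a two-node list at the start of the region via one mapping. */
static void build_list(void *base)
{
    struct node *n0 = off_to_ptr(base, 0);
    struct node *n1 = off_to_ptr(base, sizeof(struct node));
    n0->value = 1;
    n0->next_off = ptr_to_off(base, n1);   /* store an offset, not n1 */
    n1->value = 2;
    n1->next_off = 0;
}

/* Walk the list through a (possibly different) mapping and sum values. */
static int sum_list(void *base)
{
    struct node *n = off_to_ptr(base, 0);
    int sum = n->value;
    while (n->next_off != 0) {
        n = off_to_ptr(base, n->next_off);
        sum += n->value;
    }
    return sum;
}
```

For example, calling build_list() through one mapping and sum_list() through the other yields 3, even though the two base addresses differ. Had n0 stored n1's absolute address instead, following it from the second mapping would read memory at the wrong address.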

Here's the history of what happened...

We had code paths for that situation in the sm btl (i.e., when A and B
get different addresses for the same shared memory), but unbeknownst to
us, mmap() on most systems seems to return the same value in A and B
(this could be a side-effect of typical MPI testing memory usage
patterns... I don't know).

But FC3 and FC4 (Fedora Core 3 and 4) consistently did not seem to follow
this pattern -- they would return different values from mmap() in
different processes.
Unfortunately, we did not do any testing on FC3 or FC4 systems until a
few weeks before SC, and discovered that some of our
previously-unknowingly-untested sm bootstrap code paths had some bugs.
I fixed all of those and brought [almost all of] them over to the 1.0
release branch. I missed one patch in v1.0, but it will be included in
v1.0.1, to be released shortly.

So I'd be surprised if you were still seeing this bug in either branch
-- as far as I know, I fixed all the issues. More specifically, if you
see this behavior, it will probably be in *both* branches.

Let me know if you run across it again. Thanks!

{+} Jeff Squyres
{+} The Open MPI Project