Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
From: Ashley Pittman (ashley_at_[hidden])
Date: 2010-05-02 06:49:02

On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:
> As far as I can tell, calling shmctl IPC_RMID is immediately destroying
> the shared memory segment even though there is at least one process
> attached to it. This is interesting and confusing because Solaris 10's
> behavior description of shmctl IPC_RMID is similar to that of Linux'.
> I call shmctl IPC_RMID immediately after one process has attached to the
> segment because, at least on Linux, this only marks the segment for
> destruction. The segment is only actually destroyed after all attached
> processes have terminated. I'm relying on this behavior for resource
> cleanup upon application termination (normal/abnormal).

I think you should look into this a little deeper. It certainly used to be the case on Linux that setting IPC_RMID would also prevent any further processes from attaching to the segment.
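One way to check this on a given system is a small probe. The following is only a sketch: the function name probe_attach_after_rmid is made up for illustration, and it tests a second attach by segment id from the same process, which may not behave the same as an attach from another process or a lookup by key.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical probe (not Open MPI code).
 * Returns 1 if a new attach succeeds after IPC_RMID while another
 * attachment is still held, 0 if it is refused, and -1 if the probe
 * could not be set up. */
int probe_attach_after_rmid(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    void *first = shmat(id, NULL, 0);
    if (first == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);   /* do not leave the segment behind */
        return -1;
    }

    /* Mark the segment while one attachment is still live. */
    if (shmctl(id, IPC_RMID, NULL) == -1) {
        shmdt(first);
        return -1;
    }

    /* Try to attach again using the same id. */
    void *second = shmat(id, NULL, 0);
    int ok = (second != (void *)-1);

    if (ok)
        shmdt(second);
    shmdt(first);   /* last detach lets the system free the segment */
    return ok;
}
```

The result is system-dependent, so it is worth running on each platform you intend to support rather than relying on the man page wording.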

You're right that minimising the window during which the region exists without that bit set is good, both in terms of wall-clock time and lines of code. What we used to do here was to have all processes on a node perform an out-of-band intra-node barrier before creating the segment and another, in-band barrier immediately after creating it. Without this, if one process on a node has problems and aborts during startup before it gets to the shared memory code, you are almost guaranteed to leave an unattached segment behind.
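To make the ordering concrete, here is a rough sketch of "everyone attaches, then the segment is marked". It is not what we actually ran: it uses fork() and a pair of pipes as a stand-in for the barrier, and the name attach_all_then_rmid is hypothetical. In a real MPI job the barrier would come from the runtime.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns how many of nprocs children attached before
 * the segment was marked with IPC_RMID, or -1 if setup failed. */
int attach_all_then_rmid(int nprocs)
{
    int ready[2], go[2];
    if (pipe(ready) == -1)
        return -1;
    if (pipe(go) == -1) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    int spawned = 0;
    for (int i = 0; id != -1 && i < nprocs; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            char ok, c;
            close(ready[0]);
            close(go[1]);
            ok = (shmat(id, NULL, 0) != (void *)-1);
            if (write(ready[1], &ok, 1) != 1)   /* arrive at the barrier */
                _exit(2);
            close(ready[1]);
            if (read(go[0], &c, 1) < 0)         /* wait for the release (EOF) */
                _exit(2);
            _exit(ok ? 0 : 1);
        }
        spawned++;
    }
    close(ready[1]);
    close(go[0]);

    /* Barrier: collect one byte from every child before marking. */
    int attached = 0;
    char ok;
    for (int i = 0; i < spawned && read(ready[0], &ok, 1) == 1; i++)
        attached += ok;

    if (id != -1)
        shmctl(id, IPC_RMID, NULL);   /* mark only after everyone has attached */
    close(go[1]);                     /* release the children */
    close(ready[0]);
    while (spawned-- > 0)
        wait(NULL);
    return id == -1 ? -1 : attached;
}
```

The point of the ordering is that the segment is only unmarked between shmget and the end of the barrier, so a process that dies earlier in startup never leaves anything behind.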

As to performance, there should be no difference in use between Sys V shared memory and file-backed shared memory: the instructions issued and the MMU flags for the pages should be the same in both cases, so the performance should be identical.

The one area you do need to keep an eye on for performance is NUMA machines, where it matters which process on a node touches each page first. You can end up using different areas (pages, not regions) for communicating in different directions between the same pair of processes. I don't believe this is any different from mmap-backed shared memory though.
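As an illustration of the first-touch point, a process can write to each page of the area it wants placed locally before anyone else uses it. This is a minimal sketch that assumes the system places a page on the node of the process that first writes to it; the helper name first_touch is made up, and which side (sender or receiver) should own a given area is a design decision this does not answer.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write one byte to every page in [base, base + len)
 * so that, under a first-touch placement policy, those pages are allocated
 * near the calling process. Overwrites the first byte of each page, so it
 * is only meant for freshly created memory. Returns the pages touched. */
size_t first_touch(volatile char *base, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;   /* fall back to 4 KiB */
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page) {
        base[off] = 0;
        touched++;
    }
    return touched;
}
```

Each process would call this only on its own pages within the attached segment, before the first message is exchanged.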

> Because of this, sysv support may be limited to Linux systems - that is,
> until we can get a better sense of which systems provide the shmctl
> IPC_RMID behavior that I am relying on.


Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing