On Jun 10, 2010, at 1:47 AM, Sylvain Jeaugey wrote:

On Wed, 9 Jun 2010, Jeff Squyres wrote:

On Jun 9, 2010, at 3:26 PM, Samuel K. Gutierrez wrote:

System V shared memory cleanup is a concern only if a process dies in
between shmat and shmctl IPC_RMID.  Shared memory segment cleanup
should happen automagically in most cases, including abnormal process
termination.
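
For readers following along, the create / attach / mark-for-removal sequence described above can be sketched roughly as follows. This is a minimal single-process illustration, not the Open MPI implementation: the helper names `sysv_attach_and_mark` and `sysv_release` are hypothetical, and the sketch relies on the Linux behavior where a segment marked with IPC_RMID is only destroyed after the last process detaches (process exit detaches implicitly).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: create a private System V segment, attach it, and
 * immediately mark it for removal. Returns the attached address, or NULL
 * on failure. */
void *sysv_attach_and_mark(size_t size)
{
    assert(size > 0);

    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1)
        return NULL;

    void *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1) {
        /* Attach failed: remove the segment so nothing is left behind. */
        shmctl(id, IPC_RMID, NULL);
        return NULL;
    }

    /* This is the window the thread is talking about: if the process dies
     * after shmat but before the next call, the segment is leaked. */
    if (shmctl(id, IPC_RMID, NULL) == -1) {
        shmdt(addr);
        return NULL;
    }

    /* The segment stays usable until the last attached process detaches. */
    return addr;
}

/* Hypothetical helper: detach the segment. Returns 0 on success. */
int sysv_release(void *addr)
{
    return shmdt(addr);
}
```

In a multi-process job the peers would also have to attach (using the segment id) before the last attached process goes away; that coordination is left out of the sketch.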

Umm... right.  Duh.  I knew that.

Really.

So -- we're good!

Let's open the discussion of making sysv the default on systems that support the IPC_RMID behavior (which, AFAIK, is only Linux)...

I'm sorry, but I think System V has many disadvantages over mmap.

1. As discussed before, cleanup is not as easy as for a file. It is a good thing to remove the shm segment right after creation, but since problems often happen during shmget/shmat, there's still a high risk of leaving things behind.

2. There are limits in the kernel you need to grow (kernel.shmall, kernel.shmmax).

I agree that this is a disadvantage, but changing shmall and shmmax limits is *only* as painful as having a system admin change a few settings (okay, it's painful ;-) ).

On most Linux distributions, shmmax defaults to 32MB, which is too small for the sysv mechanism to work. Mmapped files have no such limit.

Not necessarily true.  If a user *really* wanted to use sysv and their system's shmmax limit was 32MB, they could just add -mca mpool_sm_min_size 33550000 and everything would work properly.  I do understand, however, that this may not be ideal and may have performance implications.
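
As a rough illustration of the limit being discussed: on Linux the current kernel.shmmax value can be read from /proc/sys/kernel/shmmax, and a request only succeeds if it does not exceed that value. The sketch below is an assumption-laden example, not part of Open MPI; `read_shmmax` and `fits_under_limit` are hypothetical names. Assuming "32MB" means 32 * 1024 * 1024 = 33554432 bytes, the 33550000 figure above sits just under it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read kernel.shmmax through /proc (Linux-specific).
 * Stores the value in *out and returns 0 on success, -1 otherwise. */
int read_shmmax(unsigned long long *out)
{
    assert(out != NULL);

    FILE *f = fopen("/proc/sys/kernel/shmmax", "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%llu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: returns 1 if a segment of `requested` bytes would
 * fit under a limit of `limit` bytes, 0 otherwise. */
int fits_under_limit(unsigned long long requested, unsigned long long limit)
{
    return requested <= limit;
}
```

A caller could combine the two, for example by reading the limit at startup and falling back to mmap when the desired segment size does not fit.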

Based on this, I'm leaning towards the default behavior that we currently have in the trunk:

- sysv disabled by default
- use mmap, unless sysv is explicitly requested by the user


3. Each shm segment is identified by a 32-bit integer. This namespace is small (and non-intuitive, as opposed to a file name), and the probability of a collision is not zero, especially when you start creating multiple shared memory segments (for collectives, one-sided operations, ...).

I'm not sure if collisions are a problem.  I'm using shmget(IPC_PRIVATE), so I'm guessing once I've asked for more than ~ 2^16 keys, things will fail.


So, I'm a bit reluctant to work with System V mechanisms again. I don't think there is a *real* reason for System V to be faster than mmap, since it should just be memory. I'd rather find out why mmap is slower.
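
For comparison with the System V path, the file-backed mmap approach referred to here can be sketched along these lines. This is only a single-process illustration under stated assumptions: the helper names and the /tmp template are hypothetical, and the file is unlinked immediately, whereas a real multi-process setup would let the peers open the file before removing its name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: back a shared mapping with a temporary file.
 * Returns the mapped address, or NULL on failure. */
void *map_backing_file(size_t size)
{
    assert(size > 0);

    char path[] = "/tmp/sm_sketch_XXXXXX"; /* hypothetical location */
    int fd = mkstemp(path);
    if (fd == -1)
        return NULL;

    /* Remove the name right away; the storage persists until the file
     * descriptor is closed and the mapping is gone. */
    unlink(path);

    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */

    return addr == MAP_FAILED ? NULL : addr;
}

/* Hypothetical helper: drop the mapping. Returns 0 on success. */
int unmap_backing_file(void *addr, size_t size)
{
    return munmap(addr, size);
}
```

Because the backing object is an ordinary file, there is no shmmax-style limit to raise; the size is bounded by the file system instead.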

Jeff and I talked, and we are going to hack something together that uses shm_open and friends and incorporates more sophisticated fallback mechanisms if a particular component fails initialization.  Once we are done with that work, would you be willing to conduct another similar performance study that incorporates all sm mechanisms?

Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory


Sylvain