
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
From: Samuel K. Gutierrez (samuel_at_[hidden])
Date: 2010-06-10 10:59:52


On Jun 10, 2010, at 1:47 AM, Sylvain Jeaugey wrote:

> On Wed, 9 Jun 2010, Jeff Squyres wrote:
>
>> On Jun 9, 2010, at 3:26 PM, Samuel K. Gutierrez wrote:
>>
>>> System V shared memory cleanup is a concern only if a process dies
>>> in between shmat and shmctl IPC_RMID. Shared memory segment cleanup
>>> should happen automagically in most cases, including abnormal
>>> process termination.
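
For reference, the pattern I'm describing above is roughly the
following. This is only a minimal sketch with basic error handling,
not the actual trunk code:

    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* create a new private segment */
        int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (id < 0) { perror("shmget"); return 1; }

        /* attach it */
        void *addr = shmat(id, NULL, 0);
        if (addr == (void *) -1) { perror("shmat"); return 1; }

        /* mark the segment for removal right away: the kernel
           destroys it once the last attached process detaches or
           exits -- even if that exit is abnormal */
        if (shmctl(id, IPC_RMID, NULL) < 0) { perror("shmctl"); return 1; }

        /* ... use addr ...  A crash between shmget and the shmctl
           above is the only window in which the segment can leak. */
        return 0;
    }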
>>
>> Umm... right. Duh. I knew that.
>>
>> Really.
>>
>> So -- we're good!
>>
>> Let's open the discussion of making sysv the default on systems
>> that support the IPC_RMID behavior (which, AFAIK, is only Linux)...
> I'm sorry, but I think System V has many disadvantages over mmap.
>
> 1. As discussed before, cleanup is not as easy as for a file. It is
> a good thing to remove the shm segment right after creation, but
> since problems often happen during shmget/shmat, there is still a
> high risk of leaving things behind.
>
> 2. There are limits in the kernel you need to grow (kernel.shmall,
> kernel.shmmax).

I agree that this is a disadvantage, but changing shmall and shmmax
limits is *only* as painful as having a system admin change a few
settings (okay, it's painful ;-) ).
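
For anyone who wants to check what their system is currently set to,
the values are visible in procfs on Linux, so no privileges are
needed to read them. A quick sketch (shmmax is in bytes, shmall in
pages):

    #include <stdio.h>

    /* print the current System V shared memory limits from procfs */
    int main(void)
    {
        const char *files[] = { "/proc/sys/kernel/shmmax",
                                "/proc/sys/kernel/shmall" };
        for (int i = 0; i < 2; i++) {
            FILE *f = fopen(files[i], "r");
            unsigned long long v;
            if (f != NULL && fscanf(f, "%llu", &v) == 1)
                printf("%s = %llu\n", files[i], v);
            if (f != NULL) fclose(f);
        }
        return 0;
    }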

> On most Linux distributions, shmmax is 32MB, which does not allow
> the sysv mechanism to work. Mmapped files are unlimited.

Not necessarily true. If a user *really* wanted to use sysv and their
system's shmmax limit was 32MB, they could just add -mca
mpool_sm_min_size 33550000 and everything would work properly. I do
understand, however, that this may not be ideal and may have
performance implications.
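
The limit bites at shmget time, which is why lowering the requested
pool size works. A tiny sketch, assuming a 32MB shmmax (the 64MB
request is just an arbitrary over-the-limit value):

    #include <errno.h>
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* a request larger than kernel.shmmax fails up front */
        size_t too_big = (size_t) 64 * 1024 * 1024;
        if (shmget(IPC_PRIVATE, too_big, IPC_CREAT | 0600) < 0
            && errno == EINVAL)
            printf("shmget: EINVAL -- size exceeds kernel.shmmax\n");
        return 0;
    }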

Based on this, I'm leaning towards the default behavior that we
currently have in the trunk:

- sysv disabled by default
- use mmap, unless sysv is explicitly requested by the user

>
> 3. Each shm segment is identified by a 32-bit integer. This
> namespace is small (and non-intuitive, as opposed to a file name),
> and the probability of a collision is not zero, especially when you
> start creating multiple shared memory segments (for collectives,
> one-sided operations, ...).

I'm not sure collisions are a problem. I'm using
shmget(IPC_PRIVATE), so I'm guessing that once I've asked for more
than ~2^16 keys, things will fail.
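
To make that concrete: with IPC_PRIVATE the kernel assigns every
identifier itself, so there is no caller-chosen key to collide in
the first place. A minimal sketch:

    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* each call returns a fresh, kernel-assigned identifier;
           no user-visible key namespace is involved */
        for (int i = 0; i < 4; i++) {
            int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
            printf("segment %d: shmid = %d\n", i, id);
            if (id >= 0)
                shmctl(id, IPC_RMID, NULL);  /* clean up immediately */
        }
        return 0;
    }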

>
> So, I'm a bit reluctant to work with System V mechanisms again. I
> don't think there is a *real* reason for System V to be faster than
> mmap, since it should just be memory. I'd rather find out why mmap
> is slower.

Jeff and I talked, and we are going to hack something together that
uses shm_open and friends and incorporates more sophisticated fallback
mechanisms if a particular component fails initialization. Once we
are done with that work, would you be willing to conduct another
similar performance study that incorporates all sm mechanisms?
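
Roughly what we have in mind, as a sketch only: the segment name
below is made up, and every error path collapses into a NULL return
so that a caller can fall back to the sysv or mmap components:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* sketch of the shm_open variant; link with -lrt on Linux */
    static void *posix_sm_create(const char *name, size_t size)
    {
        int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fd < 0)
            return NULL;                      /* caller falls back */

        void *seg = NULL;
        if (ftruncate(fd, (off_t) size) == 0)
            seg = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);                            /* mapping stays valid */

        /* unlinking the name early is the shm_open analogue of
           IPC_RMID: the object goes away with its last mapping.
           In real code this would happen only after all local
           peers have attached. */
        shm_unlink(name);
        return (seg == MAP_FAILED) ? NULL : seg;
    }

    int main(void)
    {
        void *seg = posix_sm_create("/ompi_sm_demo", 1 << 20);
        printf("posix sm: %s\n", seg ? "ok" : "failed -- fall back");
        return 0;
    }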

Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory
>
> Sylvain