Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
From: Samuel K. Gutierrez (samuel_at_[hidden])
Date: 2010-05-01 23:03:46


Hi Ethan,

Sorry about the lag.

As far as I can tell, calling shmctl IPC_RMID on Solaris 10 immediately
destroys the shared memory segment, even though at least one process is
still attached to it. This is interesting and confusing, because Solaris 10's
documented behavior for shmctl IPC_RMID is similar to Linux's.

I call shmctl IPC_RMID immediately after one process has attached to the
segment because, at least on Linux, this only marks the segment for
destruction. The segment is actually destroyed only after every attached
process has detached or terminated. I'm relying on this behavior for
resource cleanup upon application termination (normal or abnormal).
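
The pattern described above can be probed with a small C sketch. This is not
the Open MPI component code; probe_attach_after_rmid is a hypothetical helper
written for illustration. It creates a private segment, attaches, marks the
segment with IPC_RMID, and then has a child process try a fresh shmat(2) by
id. The Linux shmat(2) man page notes that attaching to a segment already
marked for deletion is Linux-specific and that many other implementations do
not support it, which may be related to the shmat(2) EINVAL quoted below,
though that is not confirmed in this thread.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe, not part of Open MPI.
 * Returns  1 if a process can still attach by id after IPC_RMID,
 *          0 if the attach fails (segment id no longer usable),
 *         -1 if the probe itself could not be set up. */
int probe_attach_after_rmid(size_t size)
{
    /* Create a private segment readable/writable by the owner only. */
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | IPC_EXCL | 0600);
    if (id == -1) {
        return -1;
    }

    /* First attach: this plays the role of the "one process attached". */
    void *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    /* Mark the segment for destruction while it is still attached. */
    if (shmctl(id, IPC_RMID, NULL) == -1) {
        shmdt(addr);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        shmdt(addr);
        return -1;
    }
    if (pid == 0) {
        /* The child inherits the parent's mapping, but here it attempts
         * a new attach by id, as a late-arriving peer process would. */
        void *p = shmat(id, NULL, 0);
        if (p == (void *)-1) {
            _exit(1);
        }
        shmdt(p);
        _exit(0);
    }

    int status = 0;
    pid_t w = waitpid(pid, &status, 0);

    /* Last detach: once no process is attached, the marked segment
     * is released by the kernel. */
    shmdt(addr);

    if (w == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Calling probe_attach_after_rmid(4096) from a small main() and printing the
result on each target platform would give a first indication of whether the
early-IPC_RMID cleanup approach is usable there. A result of 0 would suggest
that peers must attach before the segment is marked.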

Because of this, sysv support may be limited to Linux systems - that is,
until we can get a better sense of which systems provide the shmctl
IPC_RMID behavior that I am relying on.

Any other ideas are greatly appreciated.

Thanks for testing!

--
Samuel K. Gutierrez
Los Alamos National Laboratory
> On Thu, Apr/29/2010 02:52:24PM, Samuel K. Gutierrez wrote:
>>  Hi Ethan,
>>  Bummer.  What does the following command show?
>>  sysctl -a | grep shm
>
> In this case, I think the Solaris equivalent to sysctl is prctl, e.g.,
>
>   $ prctl -i project group.staff
>   project: 10: group.staff
>   NAME    PRIVILEGE       VALUE    FLAG   ACTION   RECIPIENT
>   ...
>   project.max-shm-memory
>           privileged      3.92GB      -   deny             -
>           system          16.0EB    max   deny             -
>   project.max-shm-ids
>           privileged        128       -   deny             -
>           system          16.8M     max   deny             -
>   ...
>
> Is that the info you need?
>
> -Ethan
>
>>  Thanks!
>>  --
>>  Samuel K. Gutierrez
>>  Los Alamos National Laboratory
>>  On Apr 29, 2010, at 1:32 PM, Ethan Mallove wrote:
>> > Hi Samuel,
>> >
>> > I'm trying to run off your HG clone, but I'm seeing issues with
>> > c_hello, e.g.,
>> >
>> >  $ mpirun -mca mpi_common_sm sysv --mca btl self,sm,tcp --host burl-ct-v440-2,burl-ct-v440-2 -np 2 ./c_hello
>> >  --------------------------------------------------------------------------
>> >  A system call failed during shared memory initialization that should not
>> >  have.  It is likely that your MPI job will now either abort or experience
>> >  performance degradation.
>> >
>> >    Local host:  burl-ct-v440-2
>> >    System call: shmat(2)
>> >    Process:     [[43408,1],1]
>> >    Error:       Invalid argument (errno 22)
>> >  --------------------------------------------------------------------------
>> >  ^Cmpirun: killing job...
>> >
>> >  $ uname -a
>> >  SunOS burl-ct-v440-2 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V440
>> >
>> > The same test works okay if I s/sysv/mmap/.
>> >
>> > Regards,
>> > Ethan
>> >
>> >
>> > On Wed, Apr/28/2010 07:16:12AM, Samuel K. Gutierrez wrote:
>> >> Hi,
>> >>
>> >> Faster component initialization/finalization times is one of the main
>> >> motivating factors of this work.  The general idea is to get away from
>> >> creating a rather large backing file.  With respect to module bandwidth
>> >> and latency, mmap and sysv seem to be comparable - at least that is what
>> >> my preliminary tests have shown.  As it stands, I have not come across a
>> >> situation where the mmap SM component doesn't work or is slower.
>> >>
>> >> Hope that helps,
>> >>
>> >> --
>> >> Samuel K. Gutierrez
>> >> Los Alamos National Laboratory
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Apr 28, 2010, at 5:35 AM, Bogdan Costescu wrote:
>> >>
>> >>> On Tue, Apr 27, 2010 at 7:55 PM, Samuel K. Gutierrez <samuel_at_[hidden]>
>> >>> wrote:
>> >>>> With Jeff and Ralph's help, I have completed a System V shared memory
>> >>>> component for Open MPI.
>> >>>
>> >>> What is the motivation for this work ? Are there situations where the
>> >>> mmap based SM component doesn't work or is slow(er) ?
>> >>>
>> >>> Kind regards,
>> >>> Bogdan
>> >>> _______________________________________________
>> >>> devel mailing list
>> >>> devel_at_[hidden]
>> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> >>