Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Granular locks?
From: Gilbert Grosdidier (Gilbert.Grosdidier_at_[hidden])
Date: 2011-01-05 15:47:05

Hi Gijsbert,

  Thank you for this proposal. I think it could be useful for our LQCD,
at least for further evaluation. How could I get the code, please?

  Thanks in advance for your help, Best, G.

On 03/01/2011 22:36, Gijsbert Wiesenekker wrote:
> On Oct 2, 2010, at 10:54 , Gijsbert Wiesenekker wrote:
>> On Oct 1, 2010, at 23:24 , Gijsbert Wiesenekker wrote:
>>> I have a large array that is shared between two processes. One process updates array elements randomly, the other process reads array elements randomly. Most of the time these writes and reads do not overlap.
>>> The current version of the code uses Linux shared memory with NSEMS semaphores. When array element i has to be read or updated, semaphore (i % NSEMS) is used. If NSEMS = 1, the entire array is locked, which leads to unnecessary waits because reads and writes do not overlap most of the time. Performance increases as NSEMS increases and flattens out at NSEMS = 32, at which point the code runs twice as fast as with NSEMS = 1.
>>> I want to change the code to use OpenMPI RMA, but MPI_Win_lock locks the entire array, which is similar to NSEMS = 1. Is there a way to have more granular locks?
>>> Gijsbert
>> Also, is there an MPI_Win_lock equivalent for IPC_NOWAIT?
>> Gijsbert
> FYI: in my case the performance penalty of using OpenMPI RMA instead of shared memory was too large, so I have written a couple of wrapper functions that use OpenMPI to gracefully allocate and release shared memory:
> //mpi_alloc_shm is a collective operation that allocates arg_nrecords records of size arg_record_size each in the shared memory segment identified by arg_key, with arg_nsems semaphores to control access.
> //arg_key is the shared memory key.
> //arg_nrecords is the number of records.
> //arg_record_size is the size of a record.
> //arg_default is the default record value. If not equal to NULL, all arg_nrecords records will be initialized to *arg_default.
> //arg_nsems is the number of semaphores that will be used to control access. If record irecord has to be updated or read, semaphore (irecord % arg_nsems) will be used for exclusive access.
> //arg_mpi_id is the mpi_id of the process that will create the shared memory segment. If the mpi_id of the calling process is not equal to arg_mpi_id, the process will not create the segment but will try to open it.
> void mpi_alloc_shm(key_t arg_key, i64_t arg_nrecords, i64_t arg_record_size,
> void *arg_default, int arg_nsems, int arg_mpi_id, MPI_Comm comm);
> //mpi_shm_put updates record irecord in the shared memory segment identified by shm_key with value *source.
> void mpi_shm_put(key_t shm_key, void *source, i64_t irecord);
> //mpi_shm_get tries to read record irecord in the shared memory segment identified by shm_key, using IPC_NOWAIT to request a lock.
> //FALSE is returned if the lock could not be obtained; otherwise TRUE is returned and the record is copied to *dest.
> //As in my case only the creator of the shared memory segment will update it, a lock is not used if the creator tries to read record irecord.
> int mpi_shm_get(key_t shm_key, i64_t irecord, void *dest);
> //mpi_free_shm is a collective operation that deallocates the shared memory segment identified by shm_key.
> void mpi_free_shm(key_t shm_key, MPI_Comm comm);
> Please feel free to contact me if you would like to have a copy of the source code of these routines.
> Regards,
> Gijsbert
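
For readers following the thread, the striped locking scheme described above (semaphore irecord % NSEMS guards record irecord, with IPC_NOWAIT for a non-blocking read) might look roughly like the sketch below. This is not Gijsbert's code, which was not posted to the list. The names stripes_create, stripe_op, record_put, record_try_get and striped_demo are hypothetical, the records are plain doubles, and they live in an ordinary array rather than a shared memory segment to keep the example short. It uses only System V semaphore calls (semget, semctl, semop) and no MPI.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

#define NSEMS 32        /* assumed stripe count, taken from the figure quoted above */
#define NRECORDS 1000   /* hypothetical array size for the demo */

/* On Linux the caller must define union semun for semctl(). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private set of nsems binary semaphores, all initially free (value 1). */
static int stripes_create(int nsems)
{
    int semid = semget(IPC_PRIVATE, nsems, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;
    union semun arg;
    arg.val = 1;
    for (int i = 0; i < nsems; i++) {
        if (semctl(semid, i, SETVAL, arg) == -1) {
            semctl(semid, 0, IPC_RMID);
            return -1;
        }
    }
    return semid;
}

/* Apply delta (-1 = lock, +1 = unlock) to the semaphore guarding irecord. */
static int stripe_op(int semid, int nsems, long irecord, short delta, short flags)
{
    struct sembuf op;
    op.sem_num = (unsigned short)(irecord % nsems);
    op.sem_op = delta;
    op.sem_flg = flags;
    return semop(semid, &op, 1);
}

/* Blocking update: waits for the stripe, writes the record, releases the stripe. */
static int record_put(int semid, int nsems, double *records, long irecord, double value)
{
    if (stripe_op(semid, nsems, irecord, -1, SEM_UNDO) == -1)
        return -1;
    records[irecord] = value;
    return stripe_op(semid, nsems, irecord, +1, SEM_UNDO);
}

/* Non-blocking read: returns 1 and fills *dest, 0 if the stripe is busy, -1 on error. */
static int record_try_get(int semid, int nsems, const double *records, long irecord, double *dest)
{
    if (stripe_op(semid, nsems, irecord, -1, IPC_NOWAIT | SEM_UNDO) == -1)
        return errno == EAGAIN ? 0 : -1;
    *dest = records[irecord];
    if (stripe_op(semid, nsems, irecord, +1, SEM_UNDO) == -1)
        return -1;
    return 1;
}

/* Single-process walk-through; returns 0 if every step behaved as expected. */
int striped_demo(void)
{
    static double records[NRECORDS];
    double v = 0.0;
    int ok = 1;
    int semid = stripes_create(NSEMS);
    assert(semid != -1);

    ok &= record_put(semid, NSEMS, records, 40, 3.5) == 0;
    ok &= record_try_get(semid, NSEMS, records, 40, &v) == 1 && v == 3.5;

    /* Hold the stripe of record 40 (40 % 32 == 8) as a writer would. */
    ok &= stripe_op(semid, NSEMS, 40, -1, 0) == 0;
    /* Record 72 shares stripe 8, so the non-blocking read reports "busy". */
    ok &= record_try_get(semid, NSEMS, records, 72, &v) == 0;
    /* Record 41 maps to stripe 9 and is unaffected. */
    ok &= record_try_get(semid, NSEMS, records, 41, &v) == 1;
    ok &= stripe_op(semid, NSEMS, 40, +1, 0) == 0;

    semctl(semid, 0, IPC_RMID);   /* remove the semaphore set */
    return ok ? 0 : 1;
}
```

Calling striped_demo() from a main() exercises one blocking write, one successful non-blocking read, and one read that is refused because its stripe is held. In a two-process setup the records array would instead sit in a segment obtained with shmget/shmat, and both processes would open the same semaphore set through a shared key rather than IPC_PRIVATE.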