
Subject: Re: [OMPI users] Shared Memory - Eager VS Rendezvous
From: Gutierrez, Samuel K (samuel_at_[hidden])
Date: 2012-05-23 10:49:08


On May 23, 2012, at 7:05 AM, Jeff Squyres wrote:

> On May 23, 2012, at 6:05 AM, Simone Pellegrini wrote:
>
>>> If process A sends a message to process B and the eager protocol is used, then I assume that the message is written into a shared memory area and picked up by the receiver when the receive operation is posted.
>
> Open MPI has a few different shared memory protocols.
>
> For short messages, we always do what you describe above: copy-in/copy-out (CICO).
>
> For large messages, we either use a pipelined CICO (as you surmised below) or use direct memory mapping if you have the Linux knem kernel module installed. More below.
>
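The crossover between the two protocols is controlled by an MCA parameter (for the sm BTL it is btl_sm_eager_limit; "ompi_info --param btl sm" lists it along with the other knobs). One rough way to see the switch is to time a ping-pong at message sizes straddling that limit. A minimal sketch of mine (not from the thread), assuming two ranks on the same node and default settings:

    /* pingpong.c: time round trips at several message sizes. A jump in
     * latency as the size crosses btl_sm_eager_limit hints at the
     * switch from the eager to the rendezvous protocol. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, iters = 1000;
        int sizes[] = { 1024, 4096, 16384, 65536, 262144 };
        char *buf = malloc(262144);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int s = 0; s < 5; s++) {
            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            for (int i = 0; i < iters; i++) {
                if (rank == 0) {        /* ping */
                    MPI_Send(buf, sizes[s], MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, sizes[s], MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else if (rank == 1) { /* pong */
                    MPI_Recv(buf, sizes[s], MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, sizes[s], MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            if (rank == 0)
                printf("%7d bytes: %8.2f us/round trip\n", sizes[s],
                       (MPI_Wtime() - t0) * 1e6 / iters);
        }
        free(buf);
        MPI_Finalize();
        return 0;
    }

Build with mpicc and run with something like "mpirun -np 2 ./pingpong".
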
>>> When the rendezvous protocol is used, however, the message still needs to end up in the shared memory area somehow. I don't think any RDMA-like transfer exists for shared memory communication.
>
> Just to clarify: RDMA = Remote Direct Memory Access, and the "remote" usually refers to a different physical address space (e.g., a different server).
>
> In Open MPI's case, knem can use a direct memory copy between two processes.
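
knem itself is driven through ioctl()s on /dev/knem, so the details are best left to the BTL, but the single-copy idea is easy to illustrate with the similar-in-spirit Linux cross-memory-attach syscall process_vm_readv(). A toy sketch of mine, not anything from the Open MPI sources:

    /* single_copy.c: the payload travels straight from one process's
     * address space to the other's in a single copy, with no shared
     * intermediate buffer. Needs Linux >= 3.2 / glibc >= 2.15 and
     * ptrace-level permission (a parent reading its own child, as
     * here, is normally allowed). */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/uio.h>   /* process_vm_readv() */
    #include <sys/wait.h>
    #include <unistd.h>

    /* static, so it sits at the same virtual address in parent and
     * child after fork() */
    static char msg[64];

    int main(void)
    {
        int ready[2];
        pipe(ready);
        pid_t child = fork();
        if (child == 0) {                /* "sender": fill the buffer */
            strcpy(msg, "hello from the child's address space");
            write(ready[1], "x", 1);     /* signal the parent */
            pause();                     /* keep our pages alive */
            _exit(0);
        }
        char c, out[64] = "";
        read(ready[0], &c, 1);           /* wait for the sender */
        struct iovec local  = { out, sizeof out };
        struct iovec remote = { msg, sizeof msg };
        /* the single copy: read the child's buffer directly */
        if (process_vm_readv(child, &local, 1, &remote, 1, 0) < 0)
            perror("process_vm_readv");
        printf("parent read: \"%s\"\n", out);
        kill(child, SIGKILL);
        wait(NULL);
        return 0;
    }

The payload makes exactly one trip, from the sender's pages to the receiver's, instead of two trips through a shared intermediate buffer.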

In addition, the vader BTL provides similar single-copy functionality through XPMEM, provided the XPMEM kernel module and user-level library are available on the system. (It can be selected at run time with something like "mpirun --mca btl vader,self ...".)

Based on my limited experience with the two, I've noticed that knem is well-suited for Intel architectures, while XPMEM is best for AMD architectures.

Samuel K. Gutierrez
Los Alamos National Laboratory

>
>>> Therefore you need to buffer this message somehow; however, I assume that you don't buffer the whole thing but instead use some kind of pipelined protocol to reduce the size of the buffer you need to keep in shared memory.
>
> Correct. For large messages, when using CICO, we copy the first fragment and the necessary metadata to the shmem block. When the receiver ACKs the first fragment, we pipeline the rest of the large message through the shmem block via CICO. With the sender and receiver (more or less) simultaneously writing to and reading from the circular shmem block, we probably won't fill it up -- meaning that the sender hypothetically won't need to block.
>
> I'm skipping a bunch of details, but that's the general idea.
>
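For what it's worth, the shape of that pipeline is easy to sketch. A toy of mine, not the actual sm BTL (which uses lock-free FIFOs, fragment headers, and real synchronization rather than spin loops), but it shows why a large message never has to sit in shared memory all at once:

    /* ring_pipe.c: pipeline a 64 KB message through a 4-slot ring of
     * 1 KB fragments in shared memory. The sender (child) copies
     * fragments in while the receiver (parent) copies them out and
     * recycles the slots, so only SLOTS * FRAG bytes of shared memory
     * are ever needed. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define SLOTS 4
    #define FRAG  1024
    #define MSG   (64 * 1024)

    struct slot { volatile int full; int len; char data[FRAG]; };

    int main(void)
    {
        struct slot *ring = mmap(NULL, SLOTS * sizeof *ring,
                                 PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        static char src[MSG], dst[MSG];
        memset(src, 'A', MSG);

        if (fork() == 0) {                      /* sender */
            for (int off = 0, i = 0; off < MSG; off += FRAG, i++) {
                struct slot *s = &ring[i % SLOTS];
                while (s->full)                 /* wait for a free slot */
                    ;
                s->len = (MSG - off < FRAG) ? MSG - off : FRAG;
                memcpy(s->data, src + off, s->len);  /* copy-in */
                __sync_synchronize();           /* publish before flagging */
                s->full = 1;
            }
            _exit(0);
        }
        for (int off = 0, i = 0; off < MSG; i++) {  /* receiver */
            struct slot *s = &ring[i % SLOTS];
            while (!s->full)                    /* wait for a fragment */
                ;
            __sync_synchronize();
            memcpy(dst + off, s->data, s->len); /* copy-out */
            off += s->len;
            s->full = 0;                        /* recycle the slot */
        }
        wait(NULL);
        printf("got %d bytes, match = %d\n", MSG, !memcmp(src, dst, MSG));
        return 0;
    }

Because the receiver drains slots while the sender is still filling them, the ring stays small no matter how large the message gets.
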
>>> Is it completely wrong? It would be nice if someone could point me somewhere I can find more details about this. On the Open MPI tuning page there are several details regarding the protocol used for IB, but very little for SM.
>
> Good point. I'll see if we can get some more info up there.
>
>> I think I found the answer to my question on Jeff Squyres blog:
>> http://blogs.cisco.com/performance/shared-memory-as-an-mpi-transport-part-2/
>>
>> However, now I have a new question: how do I know whether my machine uses the copy-in/copy-out mechanism or direct mapping?
>
> You need the Linux knem module. See the OMPI README and do a text search for "knem".
>
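A quick way to check: "lsmod | grep knem" (or the presence of /dev/knem) shows whether the module is loaded, and "ompi_info --param btl sm | grep knem" should list parameters such as btl_sm_use_knem if your Open MPI build was configured with knem support.
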
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/