
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [EXTERNAL] Re: Question regarding osu-benchmarks 3.1.1
From: Jingcha Joba (pukkimonkey_at_[hidden])
Date: 2012-03-01 01:26:08

Well, as Jeff says, it looks like it's to do with the one-sided communication.

But the reason I said that is because of something I experienced a couple of
months ago: I had a Myri-10G card and an Intel gigabit Ethernet card lying
around, and I wanted to test kernel bypass using the Open-MX stack, so I ran
the OSU benchmarks.
All the tests worked fine with the Myri-10G, but I saw the same
"hanging" issue when running over the Intel gigabit Ethernet, especially for
sizes above 1K on put/get/bcast. I tried the TCP stack instead of MX,
and it worked fine, though with bad latency numbers (which is to be
expected, considering the CPU overhead of TCP).
I never really got a chance to dig deeper, but I was fairly sure
it had to do with Open-MX.
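For reference, the way I switched between the MX and TCP paths was roughly
like this (the host names and benchmark binary are just placeholders for my
setup; adjust them for yours):

```shell
# Run the OSU one-sided put latency test over Open-MX
# (kernel bypass on the Ethernet NIC). Host names are placeholders.
mpirun -np 2 --host node1,node2 --mca btl mx,self ./osu_put_latency

# Same test forced onto the plain TCP BTL instead of MX --
# this is the combination that worked for me, just with worse latency.
mpirun -np 2 --host node1,node2 --mca btl tcp,self ./osu_put_latency
```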

On Wed, Feb 29, 2012 at 9:13 PM, Venkateswara Rao Dokku <dvrao.584_at_[hidden]
> wrote:

> Hi,
>            I tried executing those tests with other devices, like tcp
> instead of ib, with the same Open MPI 1.4.3. It went fine, though it took
> time to execute. When I tried to execute the same test on the customized
> OFED, the tests hang at the same message size.
> Can you please tell me what the possible issue could be there, so that
> we can narrow it down?
> i.e., do I have to move to the Open MPI 1.5 tree, or is there an issue
> with the customized OFED (in the RDMA scenarios, or anything else you can
> specify)?
> On Thu, Mar 1, 2012 at 1:45 AM, Jeffrey Squyres <jsquyres_at_[hidden]> wrote:
>> On Feb 29, 2012, at 2:57 PM, Jingcha Joba wrote:
>> > So if I understand correctly: if a message is smaller than some
>> threshold, it will use the MPI way (non-RDMA, two-way communication), and
>> if it's larger, it will use OpenFabrics directly, via ibverbs (and the
>> OFED stack), instead of MPI's own stack?
>> Er... no.
>>
>> So let's talk MPI-over-OpenFabrics-verbs specifically.
>>
>> All MPI communication calls will use verbs under the covers. They may
>> use verbs send/receive semantics in some cases, and RDMA semantics in other
>> cases. "It depends" -- on a lot of things, actually. It's hard to come up
>> with a good rule of thumb for when it uses one or the other; this is one of
>> the reasons that the openib BTL code is so complex. :-)
>>
>> The main points here are:
>>
>> 1. You can trust the openib BTL to do the best thing possible to get the
>> message to the other side, regardless of whether that message is an
>> MPI_SEND or an MPI_PUT (for example).
>>
>> 2. MPI_PUT does not necessarily == verbs RDMA write (and likewise,
>> MPI_GET does not necessarily == verbs RDMA read).
>> > If so, could that be the reason why MPI_Put "hangs" when sending a
>> message larger than 512 KB (or maybe 1 MB)?
>>
>> No. I'm guessing that there's some kind of bug in the MPI_PUT
>> implementation.
>>
>> > Also, is there a way to know whether, for a particular MPI call, OF uses
>> send/recv or an RDMA exchange?
>>
>> Not really.
>>
>> More specifically: all things being equal, you don't care which is used.
>> You just want your message to get to the receiver/target as fast as
>> possible. One of the main ideas of MPI is to hide those kinds of details
>> from the user. I.e., you call MPI_SEND. A miracle occurs. The message is
>> received on the other side.
>>
>> :-)
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
> --
> Thanks & Regards,
> D.Venkateswara Rao,
> Software Engineer,One Convergence Devices Pvt Ltd.,
> Jubille Hills,Hyderabad.