
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [EXTERNAL] Re: Question regarding osu-benchamarks 3.1.1
From: Jingcha Joba (pukkimonkey_at_[hidden])
Date: 2012-02-29 14:30:13


Squyres,
I thought RDMA read and write are implemented as one-sided communication
using get and put, respectively.
Is that not so?
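The one-sided model being discussed can be sketched as a minimal MPI-2 program: MPI_Put performs an RDMA-style write into a remote memory window, with no matching receive call on the target. This is an illustrative sketch, not code from the thread; the buffer names and the value written are made up for the example.

```c
/* Minimal sketch of MPI-2 one-sided communication (active-target fence
 * synchronization).  Compile with mpicc and run with: mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf = 0;                     /* window memory, exposed on every rank */
    MPI_Win win;
    MPI_Win_create(&buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);           /* open the access epoch */
    if (rank == 0) {
        int val = 42;
        /* one-sided write: rank 1 issues no matching receive */
        MPI_Put(&val, 1, MPI_INT, 1 /* target rank */, 0 /* displacement */,
                1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);           /* close the epoch; the put is complete */

    if (rank == 1)
        printf("rank 1 received %d via MPI_Put\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Whether the transport implements the put as a true RDMA write or as an internal send/receive is up to the MPI library; the one-sided API only specifies the semantics.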

On Wed, Feb 29, 2012 at 10:49 AM, Jeffrey Squyres <jsquyres_at_[hidden]> wrote:

> FWIW, if Brian says that our one-sided stuff is a bit buggy, I believe him
> (because he wrote it). :-)
>
> The fact is that the MPI-2 one-sided stuff is extremely complicated and
> somewhat open to interpretation. In practice, I haven't seen the MPI-2
> one-sided stuff used much in the wild. The MPI-3 working group just
> revamped the one-sided support and generally made it much mo'betta. Brian
> is re-implementing that stuff, and I believe it'll also be much mo'betta.
>
> My point: I wouldn't worry if not all one-sided benchmarks run with OMPI.
> No one uses them (yet) anyway.
>
>
> On Feb 29, 2012, at 1:42 PM, Jingcha Joba wrote:
>
> > When I ran my osu tests, I was able to get numbers out of all the
> > tests except latency_mt (which was expected, as I didn't compile Open MPI
> > with multi-threaded support).
> > A good way to tell whether the problem is with Open MPI or with your custom
> > OFED stack would be to use another device, such as tcp instead of ib, and
> > rerun these one-sided communication tests.
> > On Wed, Feb 29, 2012 at 10:04 AM, Barrett, Brian W <bwbarre_at_[hidden]>
> wrote:
> > I'm pretty sure that they are correct. Our one-sided implementation is
> > buggier than I'd like (indeed, I'm in the process of rewriting most of it
> > as part of Open MPI's support for MPI-3's revised RDMA), so it's likely
> > that the bugs are in Open MPI's one-sided support. Can you try a more
> > recent release (something from the 1.5 tree) and see if the problem
> > persists?
> >
> > Thanks,
> >
> > Brian
> >
> > On 2/29/12 10:56 AM, "Jeffrey Squyres" <jsquyres_at_[hidden]> wrote:
> >
> > >FWIW, I'm immediately suspicious of *any* MPI application that uses the
> > >MPI one-sided operations (i.e., MPI_PUT and MPI_GET). It looks like
> > >these two OSU benchmarks are using those operations.
> > >
> > >Is it known that these two benchmarks are correct?
> > >
> > >
> > >
> > >On Feb 29, 2012, at 11:33 AM, Venkateswara Rao Dokku wrote:
> > >
> > >> Sorry, I forgot to introduce the system. Ours is a customized OFED
> > >> stack implemented to work on specific hardware. We tested the stack
> > >> with q-perf and the Intel MPI Benchmarks (IMB-3.2.2); they ran fine. We
> > >> want to run the osu_benchmarks-3.1.1 suite on our OFED.
> > >>
> > >> On Wed, Feb 29, 2012 at 9:57 PM, Venkateswara Rao Dokku
> > >><dvrao.584_at_[hidden]> wrote:
> > >> Hi,
> > >> I tried running the osu_benchmarks-3.1.1 suite with openmpi-1.4.3.
> > >> I could run 10 of the 14 benchmark tests (all except osu_put_bibw,
> > >> osu_put_bw, osu_get_bw, and osu_latency_mt); the remaining tests hang
> > >> at some message size. The output is shown below:
> > >>
> > >> [root_at_test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl
> > >>openib,self,sm -H 192.168.0.175,192.168.0.174 --mca
> > >>orte_base_help_aggregate 0
> > >>/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >> Local host: test1
> > >> Device name: plx2_0
> > >> Device vendor ID: 0x10b5
> > >> Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >>
> > >> --------------------------------------------------------------------------
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >> Local host: test2
> > >> Device name: plx2_0
> > >> Device vendor ID: 0x10b5
> > >> Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >>
> > >> --------------------------------------------------------------------------
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> # OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
> > >> # Size Bi-Bandwidth (MB/s)
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> 1 0.00
> > >> 2 0.00
> > >> 4 0.01
> > >> 8 0.03
> > >> 16 0.07
> > >> 32 0.15
> > >> 64 0.11
> > >> 128 0.21
> > >> 256 0.43
> > >> 512 0.88
> > >> 1024 2.10
> > >> 2048 4.21
> > >> 4096 8.10
> > >> 8192 16.19
> > >> 16384 8.46
> > >> 32768 20.34
> > >> 65536 39.85
> > >> 131072 84.22
> > >> 262144 142.23
> > >> 524288 234.83
> > >> mpirun: killing job...
> > >>
> > >>
> > >> --------------------------------------------------------------------------
> > >> mpirun noticed that process rank 0 with PID 7305 on node test2 exited
> > >>on signal 0 (Unknown signal 0).
> > >>
> > >> --------------------------------------------------------------------------
> > >> 2 total processes killed (some possibly by mpirun during cleanup)
> > >> mpirun: clean termination accomplished
> > >>
> > >> [root_at_test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl
> > >>openib,self,sm -H 192.168.0.175,192.168.0.174 --mca
> > >>orte_base_help_aggregate 0
> > >>/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >> Local host: test1
> > >> Device name: plx2_0
> > >> Device vendor ID: 0x10b5
> > >> Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >>
> > >> --------------------------------------------------------------------------
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >> Local host: test2
> > >> Device name: plx2_0
> > >> Device vendor ID: 0x10b5
> > >> Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >>
> > >> --------------------------------------------------------------------------
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> # OSU One Sided MPI_Put Bandwidth Test v3.1.1
> > >> # Size Bandwidth (MB/s)
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> 1 0.02
> > >> 2 0.05
> > >> 4 0.10
> > >> 8 0.19
> > >> 16 0.39
> > >> 32 0.77
> > >> 64 1.53
> > >> 128 2.57
> > >> 256 4.16
> > >> 512 8.30
> > >> 1024 16.62
> > >> 2048 33.22
> > >> 4096 66.51
> > >> 8192 42.45
> > >> 16384 11.99
> > >> 32768 18.20
> > >> 65536 76.04
> > >> 131072 98.64
> > >> 262144 407.66
> > >> 524288 489.84
> > >> mpirun: killing job...
> > >>
> > >>
> > >> --------------------------------------------------------------------------
> > >> mpirun noticed that process rank 0 with PID 7314 on node test2 exited
> > >>on signal 0 (Unknown signal 0).
> > >>
> > >> --------------------------------------------------------------------------
> > >> 2 total processes killed (some possibly by mpirun during cleanup)
> > >> mpirun: clean termination accomplished
> > >>
> > >> I even checked the logs, but I couldn't see any errors.
> > >> Could you suggest a way to debug or work around this issue?
> > >>
> > >> Thanks for the kind reply.
> > >>
> > >>
> > >> --
> > >> Thanks & Regards,
> > >> D.Venkateswara Rao,
> > >> Software Engineer, One Convergence Devices Pvt. Ltd.,
> > >> Jubilee Hills, Hyderabad.
> > >>
> > >>
> > >>
> > >> _______________________________________________
> > >> users mailing list
> > >> users_at_[hidden]
> > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > >
> > >--
> > >Jeff Squyres
> > >jsquyres_at_[hidden]
> > >For corporate legal information go to:
> > >http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Brian W. Barrett
> > Dept. 1423: Scalable System Software
> > Sandia National Laboratories
> >
> >
> >
> >
> >
> >
>
>
>