Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [EXTERNAL] Re: Question regarding osu-benchamarks 3.1.1
From: Jingcha Joba (pukkimonkey_at_[hidden])
Date: 2012-02-29 13:42:58


When I ran the OSU tests, I got numbers from every test except latency_mt
(which was expected, as I didn't compile Open MPI with multi-threading
support).
A good way to tell whether the problem is in Open MPI or in your custom OFED
stack would be to use a different transport, such as tcp instead of ib, and
rerun these one-sided communication tests.
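
As a rough sketch of that rerun, assuming the same hosts and benchmark
directory as in the commands quoted below: the only change is the BTL list,
tcp,self,sm instead of openib,self,sm. The build_cmd helper and the DRY_RUN
variable are just illustration on my part, not part of Open MPI or the OSU
suite. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: rerun the hanging one-sided tests over TCP instead of openib.
# HOSTS and BENCH_DIR are copied from the quoted commands; adjust as needed.
HOSTS="192.168.0.175,192.168.0.174"
BENCH_DIR="/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1"

# Hypothetical helper: print the mpirun command line for one test,
# using the given BTL list (e.g. "tcp,self,sm" or "openib,self,sm").
build_cmd() {
    btl="$1"
    test_name="$2"
    echo "mpirun --prefix /usr/local/ -np 2 --mca btl $btl -H $HOSTS --mca orte_base_help_aggregate 0 $BENCH_DIR/$test_name"
}

# osu_latency_mt is left out: it needs an Open MPI build with thread support.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
for t in osu_put_bibw osu_put_bw osu_get_bw; do
    cmd=$(build_cmd "tcp,self,sm" "$t")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
done
```

If the tests complete over tcp but still hang over openib, that would point
towards the custom OFED stack; if they hang on both, Open MPI's one-sided
code is the more likely suspect.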
On Wed, Feb 29, 2012 at 10:04 AM, Barrett, Brian W <bwbarre_at_[hidden]> wrote:

> I'm pretty sure that they are correct. Our one-sided implementation is
> buggier than I'd like (indeed, I'm in the process of rewriting most of it
> as part of Open MPI's support for MPI-3's revised RDMA), so it's likely
> that the bugs are in Open MPI's one-sided support. Can you try a more
> recent release (something from the 1.5 tree) and see if the problem
> persists?
>
> Thanks,
>
> Brian
>
> On 2/29/12 10:56 AM, "Jeffrey Squyres" <jsquyres_at_[hidden]> wrote:
>
> >FWIW, I'm immediately suspicious of *any* MPI application that uses the
> >MPI one-sided operations (i.e., MPI_PUT and MPI_GET). It looks like
> >these two OSU benchmarks are using those operations.
> >
> >Is it known that these two benchmarks are correct?
> >
> >
> >
> >On Feb 29, 2012, at 11:33 AM, Venkateswara Rao Dokku wrote:
> >
> >> Sorry, I forgot to introduce the system. Ours is a customized OFED
> >>stack implemented to work on specific hardware. We tested the stack
> >>with qperf and the Intel MPI Benchmarks (IMB-3.2.2), and both ran fine.
> >>We want to run the osu_benchmarks-3.1.1 suite on our OFED stack.
> >>
> >> On Wed, Feb 29, 2012 at 9:57 PM, Venkateswara Rao Dokku
> >><dvrao.584_at_[hidden]> wrote:
> >> Hi,
> >> I tried running the osu_benchmarks-3.1.1 suite with openmpi-1.4.3.
> >>I could run 10 of the 14 benchmark tests in the suite. The other four
> >>(osu_put_bibw, osu_put_bw, osu_get_bw, osu_latency_mt) hang at some
> >>message size. The output is shown below.
> >>
> >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >> Local host: test1
> >> Device name: plx2_0
> >> Device vendor ID: 0x10b5
> >> Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance. You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >> btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >> Local host: test2
> >> Device name: plx2_0
> >> Device vendor ID: 0x10b5
> >> Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance. You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >> btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> # OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
> >> # Size Bi-Bandwidth (MB/s)
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> 1 0.00
> >> 2 0.00
> >> 4 0.01
> >> 8 0.03
> >> 16 0.07
> >> 32 0.15
> >> 64 0.11
> >> 128 0.21
> >> 256 0.43
> >> 512 0.88
> >> 1024 2.10
> >> 2048 4.21
> >> 4096 8.10
> >> 8192 16.19
> >> 16384 8.46
> >> 32768 20.34
> >> 65536 39.85
> >> 131072 84.22
> >> 262144 142.23
> >> 524288 234.83
> >> mpirun: killing job...
> >>
> >>
> >>--------------------------------------------------------------------------
> >> mpirun noticed that process rank 0 with PID 7305 on node test2 exited on signal 0 (Unknown signal 0).
> >>
> >>--------------------------------------------------------------------------
> >> 2 total processes killed (some possibly by mpirun during cleanup)
> >> mpirun: clean termination accomplished
> >>
> >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >> Local host: test1
> >> Device name: plx2_0
> >> Device vendor ID: 0x10b5
> >> Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance. You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >> btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >> Local host: test2
> >> Device name: plx2_0
> >> Device vendor ID: 0x10b5
> >> Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance. You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >> btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> # OSU One Sided MPI_Put Bandwidth Test v3.1.1
> >> # Size Bandwidth (MB/s)
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> 1 0.02
> >> 2 0.05
> >> 4 0.10
> >> 8 0.19
> >> 16 0.39
> >> 32 0.77
> >> 64 1.53
> >> 128 2.57
> >> 256 4.16
> >> 512 8.30
> >> 1024 16.62
> >> 2048 33.22
> >> 4096 66.51
> >> 8192 42.45
> >> 16384 11.99
> >> 32768 18.20
> >> 65536 76.04
> >> 131072 98.64
> >> 262144 407.66
> >> 524288 489.84
> >> mpirun: killing job...
> >>
> >>
> >>--------------------------------------------------------------------------
> >> mpirun noticed that process rank 0 with PID 7314 on node test2 exited on signal 0 (Unknown signal 0).
> >>
> >>--------------------------------------------------------------------------
> >> 2 total processes killed (some possibly by mpirun during cleanup)
> >> mpirun: clean termination accomplished
> >>
> >> I even checked the logs, but I couldn't see any errors.
> >> Could you suggest a way to overcome or debug this issue?
> >>
> >> Thanks for the kind reply.
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> D.Venkateswara Rao,
> >> Software Engineer,One Convergence Devices Pvt Ltd.,
> >> Jubille Hills,Hyderabad.
> >>
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >--
> >Jeff Squyres
> >jsquyres_at_[hidden]
> >For corporate legal information go to:
> >http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
>
>
> --
> Brian W. Barrett
> Dept. 1423: Scalable System Software
> Sandia National Laboratories
>
>