
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [EXTERNAL] Re: Question regarding osu-benchamarks 3.1.1
From: Barrett, Brian W (bwbarre_at_[hidden])
Date: 2012-02-29 13:04:54


I'm pretty sure that the benchmarks are correct. Our one-sided
implementation is buggier than I'd like (indeed, I'm in the process of
rewriting most of it as part of Open MPI's support for MPI-3's revised
RDMA), so it's likely that the bugs are in Open MPI's one-sided support.
Can you try a more recent release (something from the 1.5 tree) and see
if the problem persists?
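
For the retest, here is a rough sketch of the command line from the quoted output with one addition: the MCA parameter that the warning banner itself names, set to 0 so the banner does not bury the benchmark output. This only reduces noise; it is not expected to change one-sided behavior. The hosts, prefix, and benchmark path are copied from the quoted command, and `RUN` is a hypothetical guard variable introduced here so the script prints the command unless asked to launch it.

```shell
#!/bin/sh
# Sketch only: same invocation as in the quoted report, plus the
# warning-suppression parameter mentioned in the Open MPI banner.
BENCH=/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw

CMD="mpirun --prefix /usr/local/ -np 2 \
--mca btl openib,self,sm \
--mca btl_openib_warn_no_device_params_found 0 \
--mca orte_base_help_aggregate 0 \
-H 192.168.0.175,192.168.0.174 $BENCH"

# Always show what would be run.
echo "$CMD"

# RUN is a hypothetical guard: launch only when RUN=1 is set.
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

Saved under any name and invoked with `RUN=1` in the environment, it launches the job; without it, it only prints the command line for inspection.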

Thanks,

Brian

On 2/29/12 10:56 AM, "Jeffrey Squyres" <jsquyres_at_[hidden]> wrote:

>FWIW, I'm immediately suspicious of *any* MPI application that uses the
>MPI one-sided operations (i.e., MPI_PUT and MPI_GET). It looks like
>these two OSU benchmarks are using those operations.
>
>Is it known that these two benchmarks are correct?
>
>
>
>On Feb 29, 2012, at 11:33 AM, Venkateswara Rao Dokku wrote:
>
>> Sorry, I forgot to introduce the system. Ours is a customized OFED
>>stack implemented to work on specific hardware. We tested the stack
>>with qperf and the Intel MPI Benchmarks (IMB-3.2.2), and they ran fine.
>>We want to run the osu_benchmarks-3.1.1 suite on our OFED stack.
>>
>> On Wed, Feb 29, 2012 at 9:57 PM, Venkateswara Rao Dokku
>><dvrao.584_at_[hidden]> wrote:
>> Hi,
>> I tried running the osu_benchmarks-3.1.1 suite with openmpi-1.4.3.
>>I could run 10 of the 14 benchmark tests in the suite. The remaining
>>four (osu_put_bibw, osu_put_bw, osu_get_bw, osu_latency_mt) hang at
>>some message size. The output is shown below.
>>
>> [root_at_test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
>> failed to create doorbell file /dev/plx2_char_dev
>>
>>--------------------------------------------------------------------------
>> WARNING: No preset parameters were found for the device that Open MPI
>> detected:
>>
>> Local host: test1
>> Device name: plx2_0
>> Device vendor ID: 0x10b5
>> Device vendor part ID: 4277
>>
>> Default device parameters will be used, which may result in lower
>> performance. You can edit any of the files specified by the
>> btl_openib_device_param_files MCA parameter to set values for your
>> device.
>>
>> NOTE: You can turn off this warning by setting the MCA parameter
>> btl_openib_warn_no_device_params_found to 0.
>>
>>--------------------------------------------------------------------------
>> failed to create doorbell file /dev/plx2_char_dev
>>
>>--------------------------------------------------------------------------
>> WARNING: No preset parameters were found for the device that Open MPI
>> detected:
>>
>> Local host: test2
>> Device name: plx2_0
>> Device vendor ID: 0x10b5
>> Device vendor part ID: 4277
>>
>> Default device parameters will be used, which may result in lower
>> performance. You can edit any of the files specified by the
>> btl_openib_device_param_files MCA parameter to set values for your
>> device.
>>
>> NOTE: You can turn off this warning by setting the MCA parameter
>> btl_openib_warn_no_device_params_found to 0.
>>
>>--------------------------------------------------------------------------
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> # OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
>> # Size Bi-Bandwidth (MB/s)
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> 1 0.00
>> 2 0.00
>> 4 0.01
>> 8 0.03
>> 16 0.07
>> 32 0.15
>> 64 0.11
>> 128 0.21
>> 256 0.43
>> 512 0.88
>> 1024 2.10
>> 2048 4.21
>> 4096 8.10
>> 8192 16.19
>> 16384 8.46
>> 32768 20.34
>> 65536 39.85
>> 131072 84.22
>> 262144 142.23
>> 524288 234.83
>> mpirun: killing job...
>>
>>
>>--------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 7305 on node test2 exited on signal 0 (Unknown signal 0).
>>
>>--------------------------------------------------------------------------
>> 2 total processes killed (some possibly by mpirun during cleanup)
>> mpirun: clean termination accomplished
>>
>> [root_at_test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
>> failed to create doorbell file /dev/plx2_char_dev
>>
>>--------------------------------------------------------------------------
>> WARNING: No preset parameters were found for the device that Open MPI
>> detected:
>>
>> Local host: test1
>> Device name: plx2_0
>> Device vendor ID: 0x10b5
>> Device vendor part ID: 4277
>>
>> Default device parameters will be used, which may result in lower
>> performance. You can edit any of the files specified by the
>> btl_openib_device_param_files MCA parameter to set values for your
>> device.
>>
>> NOTE: You can turn off this warning by setting the MCA parameter
>> btl_openib_warn_no_device_params_found to 0.
>>
>>--------------------------------------------------------------------------
>> failed to create doorbell file /dev/plx2_char_dev
>>
>>--------------------------------------------------------------------------
>> WARNING: No preset parameters were found for the device that Open MPI
>> detected:
>>
>> Local host: test2
>> Device name: plx2_0
>> Device vendor ID: 0x10b5
>> Device vendor part ID: 4277
>>
>> Default device parameters will be used, which may result in lower
>> performance. You can edit any of the files specified by the
>> btl_openib_device_param_files MCA parameter to set values for your
>> device.
>>
>> NOTE: You can turn off this warning by setting the MCA parameter
>> btl_openib_warn_no_device_params_found to 0.
>>
>>--------------------------------------------------------------------------
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> # OSU One Sided MPI_Put Bandwidth Test v3.1.1
>> # Size Bandwidth (MB/s)
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> 1 0.02
>> 2 0.05
>> 4 0.10
>> 8 0.19
>> 16 0.39
>> 32 0.77
>> 64 1.53
>> 128 2.57
>> 256 4.16
>> 512 8.30
>> 1024 16.62
>> 2048 33.22
>> 4096 66.51
>> 8192 42.45
>> 16384 11.99
>> 32768 18.20
>> 65536 76.04
>> 131072 98.64
>> 262144 407.66
>> 524288 489.84
>> mpirun: killing job...
>>
>>
>>--------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 7314 on node test2 exited on signal 0 (Unknown signal 0).
>>
>>--------------------------------------------------------------------------
>> 2 total processes killed (some possibly by mpirun during cleanup)
>> mpirun: clean termination accomplished
>>
>> I even checked the logs, but I couldn't see any errors.
>> Could you suggest a way to overcome or debug this issue?
>>
>> Thanks for the kind reply.
>>
>>
>> --
>> Thanks & Regards,
>> D.Venkateswara Rao,
>> Software Engineer,One Convergence Devices Pvt Ltd.,
>> Jubille Hills,Hyderabad.
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>--
>Jeff Squyres
>jsquyres_at_[hidden]
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>

-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories