Sorry, I forgot to introduce the system. Ours is a customized OFED stack implemented to work on specific hardware. We tested the stack with qperf and the Intel MPI Benchmarks (IMB-3.2.2), and both ran fine. We now want to run the osu_benchmarks-3.1.1 suite on our OFED stack.
Hi,
I tried running the osu_benchmarks-3.1.1 suite with openmpi-1.4.3. Ten of the 14 tests in the suite run to completion, but the remaining four (osu_put_bibw, osu_put_bw, osu_get_bw and osu_latency_mt) hang at some message size. In the two runs below, the last result printed is for 524288 bytes; after that the job makes no further progress until it is killed. The output is:

[root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
failed to create doorbell file /dev/plx2_char_dev
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            test1
  Device name:           plx2_0
  Device vendor ID:      0x10b5
  Device vendor part ID: 4277

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
failed to create doorbell file /dev/plx2_char_dev
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            test2
  Device name:           plx2_0
  Device vendor ID:      0x10b5
  Device vendor part ID: 4277

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
alloc_srq max: 512 wqe_shift: 5
alloc_srq max: 512 wqe_shift: 5
alloc_srq max: 512 wqe_shift: 5
alloc_srq max: 512 wqe_shift: 5
alloc_srq max: 512 wqe_shift: 5
alloc_srq max: 512 wqe_shift: 5
# OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
# Size      Bi-Bandwidth (MB/s)
plx2_create_qp line: 415
plx2_create_qp line: 415
plx2_create_qp line: 415
plx2_create_qp line: 415
1               0.00
2               0.00
4               0.01
8               0.03
16              0.07
32              0.15
64              0.11
128             0.21
256             0.43
512             0.88
1024            2.10
2048            4.21
4096            8.10
8192           16.19
16384           8.46
32768          20.34
65536          39.85
131072         84.22
262144        142.23
524288        234.83
mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 7305 on node test2 exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
2 total processes killed (some possibly by mpirun during cleanup)
mpirun: clean termination accomplished

[root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
failed to create doorbell file /dev/plx2_char_dev
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            test1
  Device name:           plx2_0
  Device vendor ID:      0x10b5
  Device vendor part ID: 4277

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
failed to create doorbell file /dev/plx2_char_dev
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            test2
  Device name:           plx2_0
  Device vendor ID:      0x10b5
  Device vendor part ID: 4277

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
alloc_srq max: 512 wqe_shift: 5
alloc_srq max: 512 wqe_shift: 5
alloc_srq max: 512 wqe_shift: 5
alloc_srq max: 512 wqe_shift: 5
alloc_srq max: 512 wqe_shift: 5
alloc_srq max: 512 wqe_shift: 5
# OSU One Sided MPI_Put Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
plx2_create_qp line: 415
plx2_create_qp line: 415
plx2_create_qp line: 415
plx2_create_qp line: 415
1               0.02
2               0.05
4               0.10
8               0.19
16              0.39
32              0.77
64              1.53
128             2.57
256             4.16
512             8.30
1024           16.62
2048           33.22
4096           66.51
8192           42.45
16384          11.99
32768          18.20
65536          76.04
131072         98.64
262144        407.66
524288        489.84
mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 7314 on node test2 exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
2 total processes killed (some possibly by mpirun during cleanup)
mpirun: clean termination accomplished

I also checked the logs but could not find any errors. Could you suggest a way to debug or work around this issue?

Thanks in advance for your kind reply.
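For reference, the point at which each run stalls can be read off a saved log mechanically. The following is a minimal Python sketch, not part of the OSU suite or Open MPI; the helper names `parse_osu_rows`, `stall_point` and `bandwidth_drops` are hypothetical. It assumes the output was saved as plain text with one `size value` pair per result line, and that the message size doubles at every step (1, 2, 4, ..., 524288), as it does in the runs above. Non-result lines such as `plx2_create_qp line: 415` are skipped.

```python
import re

# One OSU result row: an integer message size (bytes) followed by a single
# numeric value (MB/s for osu_put_bw / osu_put_bibw). Lines such as
# "plx2_create_qp line: 415" or "alloc_srq max: 512 ..." do not match.
ROW = re.compile(r"^\s*(\d+)\s+(\d+(?:\.\d+)?)\s*$")


def parse_osu_rows(text):
    """Return [(size_bytes, value), ...] for every result row in a log."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2))))
    return rows


def stall_point(rows):
    """Return (last completed size, presumed next size), or (None, None).

    Assumption: the benchmark doubles the message size at every step,
    as the sizes in the captured output do.
    """
    if not rows:
        return None, None
    last = rows[-1][0]
    return last, last * 2


def bandwidth_drops(rows):
    """Sizes whose value is lower than the value at the preceding size."""
    return [size for (_, prev), (size, cur) in zip(rows, rows[1:]) if cur < prev]


if __name__ == "__main__":
    # Tail of the osu_put_bw run from the message above.
    log = """# OSU One Sided MPI_Put Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
plx2_create_qp line: 415
4096           66.51
8192           42.45
16384          11.99
32768          18.20
65536          76.04
131072         98.64
262144        407.66
524288        489.84
mpirun: killing job...
"""
    rows = parse_osu_rows(log)
    print(stall_point(rows))      # (524288, 1048576)
    print(bandwidth_drops(rows))  # [8192, 16384]
```

On the osu_put_bw tail it reports 524288 bytes as the last completed size, so under the doubling assumption the hang is presumably at 1048576 bytes, and it flags 8192 and 16384 bytes as sizes where the bandwidth fell below the previous step.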
--
Thanks & Regards,
D.Venkateswara Rao,
Software Engineer, One Convergence Devices Pvt Ltd.,
Jubilee Hills, Hyderabad.