Open MPI User's Mailing List Archives

Subject: [OMPI users] 1.7.5rc1, error "COLL-ML ml_discover_hierarchy exited with error."
From: Filippo Spiga (spiga.filippo_at_[hidden])
Date: 2014-03-03 19:14:05


Dear Open MPI developers,

I hit an unexpected error running the OSU osu_alltoall benchmark using Open MPI 1.7.5rc1. Here is the error:

$ mpirun -np 4 --map-by ppr:1:socket -bind-to core osu_alltoall
In bcol_comm_query hmca_bcol_basesmuma_allocate_sm_ctl_memory failed
In bcol_comm_query hmca_bcol_basesmuma_allocate_sm_ctl_memory failed
[tesla50][[6927,1],1][../../../../../ompi/mca/coll/ml/coll_ml_module.c:2996:mca_coll_ml_comm_query] COLL-ML ml_discover_hierarchy exited with error.

[tesla50:42200] In base_bcol_masesmuma_setup_library_buffers and mpool was not successfully setup!
[tesla50][[6927,1],0][../../../../../ompi/mca/coll/ml/coll_ml_module.c:2996:mca_coll_ml_comm_query] COLL-ML ml_discover_hierarchy exited with error.

[tesla50:42201] In base_bcol_masesmuma_setup_library_buffers and mpool was not successfully setup!
# OSU MPI All-to-All Personalized Exchange Latency Test v4.2
# Size Avg Latency(us)
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 4508 on node tesla51 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
2 total processes killed (some possibly by mpirun during cleanup)

Any idea where this comes from?
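If it helps narrow this down, I can rerun the same test with the coll/ml component taken out of the picture and with extra verbosity from the coll framework. Just a sketch of the kind of run I have in mind (assuming the usual "^component" exclusion syntax and the coll_base_verbose framework parameter apply here):

$ mpirun -np 4 --map-by ppr:1:socket -bind-to core \
    --mca coll ^ml \
    --mca coll_base_verbose 100 \
    osu_alltoall

If the benchmark completes cleanly with ^ml, that would at least point at the ml hierarchy discovery rather than at the rest of the stack.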

I compiled Open MPI using Intel 12.1, the latest Mellanox stack, and CUDA 6.0RC. Attached are the outputs grabbed from configure, make and run. The configure invocation was:

export MXM_DIR=/opt/mellanox/mxm
export KNEM_DIR=$(find /opt -maxdepth 1 -type d -name "knem*" -print)
export FCA_DIR=/opt/mellanox/fca
export HCOLL_DIR=/opt/mellanox/hcoll

../configure CC=icc CXX=icpc F77=ifort FC=ifort FFLAGS="-xSSE4.2 -axAVX -ip -O3 -fno-fnalias" FCFLAGS="-xSSE4.2 -axAVX -ip -O3 -fno-fnalias" --prefix=<...> --enable-mpirun-prefix-by-default --with-fca=$FCA_DIR --with-mxm=$MXM_DIR --with-knem=$KNEM_DIR --with-cuda=$CUDA_INSTALL_PATH --enable-mpi-thread-multiple --with-hwloc=internal --with-verbs 2>&1 | tee config.out
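In case the component list matters, this is roughly how I would double-check what actually got built into this install (assuming the 1.7-style ompi_info --param/--level options; command names only, output not attached):

# list the collective components built into this install
$ ompi_info | grep " coll"
# dump the MCA parameters of the coll/ml component
$ ompi_info --param coll ml --level 9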

Thanks in advance,
Regards

Filippo

--
Mr. Filippo SPIGA, M.Sc.
http://www.linkedin.com/in/filippospiga ~ skype: filippo.spiga
«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert