Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] 1.7.5rc1, error "COLL-ML ml_discover_hierarchy exited with error."
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2014-03-03 19:17:17


Can you try running with --mca coll ^ml and see if things work?
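For reference, a minimal sketch of that test, reusing the arguments from the failing run quoted below. The wrapper name run_without_ml is hypothetical and only for illustration. The caret excludes the listed component, so every coll component except ml stays eligible. The OMPI_MCA_coll environment variable is assumed to be the equivalent way to set the same MCA parameter without changing the command line.

```shell
# Option 1: exclude the ml collective component on the command line.
# '^ml' is quoted so that no shell treats the caret specially.
# (run_without_ml is a hypothetical helper name, not part of Open MPI.)
run_without_ml() {
    mpirun --mca coll '^ml' "$@"
}

# Option 2: set the same MCA parameter through the environment,
# so that a later plain mpirun invocation picks it up.
export OMPI_MCA_coll='^ml'

# Usage, with the same arguments as the failing run:
#   run_without_ml -np 4 --map-by ppr:1:socket -bind-to core osu_alltoall
```

If the benchmark completes with ml excluded, that would point at the coll/ml component (and its basesmuma shared-memory setup) rather than at the benchmark itself.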

Rolf

>-----Original Message-----
>From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Filippo Spiga
>Sent: Monday, March 03, 2014 7:14 PM
>To: Open MPI Users
>Subject: [OMPI users] 1.7.5rc1, error "COLL-ML ml_discover_hierarchy exited
>with error."
>
>Dear Open MPI developers,
>
>I hit an unexpected error running the OSU osu_alltoall benchmark using Open MPI
>1.7.5rc1. Here is the error:
>
>$ mpirun -np 4 --map-by ppr:1:socket -bind-to core osu_alltoall
>In bcol_comm_query hmca_bcol_basesmuma_allocate_sm_ctl_memory failed
>In bcol_comm_query hmca_bcol_basesmuma_allocate_sm_ctl_memory failed
>[tesla50][[6927,1],1][../../../../../ompi/mca/coll/ml/coll_ml_module.c:2996:mca_coll_ml_comm_query] COLL-ML ml_discover_hierarchy exited with error.
>
>[tesla50:42200] In base_bcol_masesmuma_setup_library_buffers and mpool was not successfully setup!
>[tesla50][[6927,1],0][../../../../../ompi/mca/coll/ml/coll_ml_module.c:2996:mca_coll_ml_comm_query] COLL-ML ml_discover_hierarchy exited with error.
>
>[tesla50:42201] In base_bcol_masesmuma_setup_library_buffers and mpool was not successfully setup!
># OSU MPI All-to-All Personalized Exchange Latency Test v4.2
># Size Avg Latency(us)
>--------------------------------------------------------------------------
>mpirun noticed that process rank 3 with PID 4508 on node tesla51 exited on
>signal 11 (Segmentation fault).
>--------------------------------------------------------------------------
>2 total processes killed (some possibly by mpirun during cleanup)
>
>Any idea where this comes from?
>
>I compiled Open MPI using Intel 12.1, the latest Mellanox stack and CUDA 6.0RC.
>Attached are the outputs from configure, make and the run. The configure
>invocation was:
>
>export MXM_DIR=/opt/mellanox/mxm
>export KNEM_DIR=$(find /opt -maxdepth 1 -type d -name "knem*" -print0)
>export FCA_DIR=/opt/mellanox/fca
>export HCOLL_DIR=/opt/mellanox/hcoll
>
>../configure CC=icc CXX=icpc F77=ifort FC=ifort \
>    FFLAGS="-xSSE4.2 -axAVX -ip -O3 -fno-fnalias" \
>    FCFLAGS="-xSSE4.2 -axAVX -ip -O3 -fno-fnalias" \
>    --prefix=<...> --enable-mpirun-prefix-by-default \
>    --with-fca=$FCA_DIR --with-mxm=$MXM_DIR --with-knem=$KNEM_DIR \
>    --with-cuda=$CUDA_INSTALL_PATH --enable-mpi-thread-multiple \
>    --with-hwloc=internal --with-verbs 2>&1 | tee config.out
>
>
>Thanks in advance,
>Regards
>
>Filippo
>
>--
>Mr. Filippo SPIGA, M.Sc.
>http://www.linkedin.com/in/filippospiga ~ skype: filippo.spiga
>
><Nobody will drive us out of Cantor's paradise.> ~ David Hilbert
>