
Open MPI Development Mailing List Archives


Subject: [OMPI devel] MPI fails when launched with srun using openib btl.
From: Victor Kocheganov (victor.kocheganov_at_[hidden])
Date: 2013-09-20 05:33:26


Hi folks!

I am trying to launch a simple send/recv program (see attachment) built
against the *Open MPI master branch* with srun, using the *openib* BTL, but
unfortunately I get a *segfault*.

Below is my workflow.
1) I configured ompi/master with the following line:

./autogen.sh && ./configure --prefix=$PWD/install --with-openib --with-pmi && make -j3 && make install -j3

2) I exported the OMPI_MCA_btl variable (along with PATH and LD_LIBRARY_PATH):

export OMPI_MCA_btl=self,openib
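Open MPI reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the export above restricts the BTL framework to the self and openib components. When a job is launched through srun rather than mpirun, it can be worth confirming that each task really sees this value. The sketch below is only an illustration under stated assumptions: btl_list_contains and env_btl_includes are hypothetical helpers (not part of Open MPI), and the value is treated as a plain comma-separated list, ignoring the "^" exclusion syntax that Open MPI also accepts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI API): returns true if the
 * comma-separated component list `list` contains `name` as a whole entry.
 * The "^" exclusion prefix is deliberately not handled in this sketch. */
bool btl_list_contains(const char *list, const char *name)
{
    if (list == NULL || name == NULL || name[0] == '\0')
        return false;

    size_t name_len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        /* Locate the end of the current entry (next comma or end of string). */
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Compare whole entries only, so "open" does not match "openib". */
        if (len == name_len && strncmp(p, name, len) == 0)
            return true;

        if (end == NULL)
            break;
        p = end + 1; /* skip past the comma */
    }
    return false;
}

/* Hypothetical helper: inspects the OMPI_MCA_btl environment variable and
 * returns true only if both "self" and `transport` are listed. */
bool env_btl_includes(const char *transport)
{
    const char *btl = getenv("OMPI_MCA_btl");
    return btl != NULL
        && btl_list_contains(btl, "self")
        && btl_list_contains(btl, transport);
}
```

For example, calling env_btl_includes("openib") near the start of main() in the test program and printing the result from every rank would show whether the exported variable actually reached each task started by srun.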

3) I launched the test with the following line:

mpicc ~/usefull_tests/mpi_init.c && srun -n 2 ./a.out

Eventually I get the following error:

srun: error: mir6: task 1: Segmentation fault (core dumped)
srun: Terminating job step 17309.2

with the following backtrace:

#0 0x00007f856c47b1d0 in ?? ()
#1 <signal handler called>
#2 0x00007f856d12d721 in rml_recv_cb (status=0, process_name=0x2027c50, buffer=0x7f857084ed10, tag=102, cbdata=0x0) at connect/btl_openib_connect_oob.c:823
#3 0x00007f857553ffb0 in orte_rml_base_process_msg (fd=-1, flags=4, cbdata=0x2027b80) at base/rml_base_msg_handlers.c:172
#4 0x00007f857522a6c6 in event_process_active_single_queue (base=0x1ed6c60, activeq=0x1ec9210) at event.c:1367
#5 0x00007f857522a93e in event_process_active (base=0x1ed6c60) at event.c:1437
#6 0x00007f857522afbc in opal_libevent2021_event_base_loop (base=0x1ed6c60, flags=1) at event.c:1645
#7 0x00007f85754ccc19 in orte_progress_thread_engine (obj=0x7f857577cf20) at runtime/orte_init.c:180
#8 0x0000003b5a6077f1 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003b59ee570d in clone () from /lib64/libc.so.6

Can anybody help me figure out the reason for this failure?

P.S. I am using Red Hat Enterprise Linux Server release 6.2 with InfiniBand
cards.

Thanks in advance,
Victor Kocheganov.