
Open MPI User's Mailing List Archives


From: Tim Prins (tprins_at_[hidden])
Date: 2007-10-03 20:22:17


Marco,

Thanks for the report, and sorry for the delayed response. I can
replicate a problem using your test code, but it does not segfault for
me (although I am using a different version of Open MPI).

I filed a bug on this, so (hopefully) our collective gurus will look at
it soon. You will receive email updates about the bug. Also, it is here:
https://svn.open-mpi.org/trac/ompi/ticket/1158

Thanks,

Tim

Marco Sbrighi wrote:
>
> Dear Open MPI developers,
>
> I'm using Open MPI 1.2.2 over OFED 1.1 on a 680-node dual-Opteron,
> dual-core Linux cluster, with InfiniBand interconnect of course.
> During the execution of big jobs (more than 128 processes) I've
> experienced slowdowns in performance and deadlocks in collective MPI
> operations. The job processes often terminate with a "RETRY EXCEEDED
> ERROR" (provided, of course, that btl_openib_ib_timeout is set properly;
> see the invocation below).
> Yes, this kind of error seems to be related to the fabric, but roughly
> half of the MPI processes are running into the timeout...
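>
> For reference, I set the timeout with an invocation along these lines
> (the timeout value, process count, and executable name here are only
> illustrative):
>
>   mpirun -np 256 --mca btl_openib_ib_timeout 20 ./my_collective_test.x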
> To investigate this behaviour more closely, I've tried to run some
> "constrained" tests using SKaMPI, but it is quite difficult to
> isolate a single collective operation with SKaMPI: even if the
> SKaMPI script requests only (say) a Reduce, with many communicator
> sizes, the SKaMPI code will also issue a lot of bcast, alltoall,
> etc. calls on its own.
> So I've tried a hand-made piece of code that performs "only" one
> repeated collective operation at a time.
> The code is attached to this message; the file is named
> collect_noparms.c.
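>
> Its core is essentially a loop along the following lines (a simplified
> sketch, not the exact attached file; the datatype, buffer sizes, and
> repetition count are illustrative):
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
> /* Sketch: time one repeated collective (here MPI_Reduce) in isolation.
>    Buffer contents and sizes are placeholders. */
> static double reduce_test(MPI_Comm comm, int count, int reps)
> {
>     int i, rank;
>     double *sendbuf = malloc(count * sizeof(double));
>     double *recvbuf = malloc(count * sizeof(double));
>     double t0, t1;
>
>     MPI_Comm_rank(comm, &rank);
>     for (i = 0; i < count; i++)
>         sendbuf[i] = (double)rank;
>
>     t0 = MPI_Wtime();
>     for (i = 0; i < reps; i++)
>         MPI_Reduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, 0, comm);
>     t1 = MPI_Wtime();
>
>     free(sendbuf);
>     free(recvbuf);
>     return (t1 - t0) / reps;          /* average time per Reduce */
> }
>
> int main(int argc, char **argv)
> {
>     int rank, count = 32768, reps = 100;   /* illustrative values */
>     double avg;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     avg = reduce_test(MPI_COMM_WORLD, count, reps);
>     if (rank == 0)
>         printf("count=%d reps=%d avg=%f s\n", count, reps, avg);
>     MPI_Finalize();
>     return 0;
> }
>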
> What happened when I tried to run this code is reported here:
>
> ......
>
> 011 - 011 - 039 NOOT START
> 000 - 000 of 38 - 655360 0.000000
> [node1049:11804] *** Process received signal ***
> [node1049:11804] Signal: Segmentation fault (11)
> [node1049:11804] Signal code: Address not mapped (1)
> [node1049:11804] Failing at address: 0x18
> 035 - 035 - 039 NOOT START
> 000 - 000 of 38 - 786432 0.000000
> [node1049:11804] [ 0] /lib64/tls/libpthread.so.0 [0x2a964db420]
> 000 - 000 of 38 - 917504 0.000000
> [node1049:11804] [ 1] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0 [0x2a9573fa18]
> [node1049:11804] [ 2] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0 [0x2a9573f639]
> [node1049:11804] [ 3] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0(mca_btl_sm_send+0x122) [0x2a9573f5e1]
> [node1049:11804] [ 4] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0 [0x2a957acac6]
> [node1049:11804] [ 5] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0(mca_pml_ob1_send_request_start_copy+0x303) [0x2a957ace52]
> [node1049:11804] [ 6] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0 [0x2a957a2788]
> [node1049:11804] [ 7] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0 [0x2a957a251c]
> [node1049:11804] [ 8] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0(mca_pml_ob1_send+0x2e2) [0x2a957a2d9e]
> [node1049:11804] [ 9] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0(ompi_coll_tuned_reduce_generic+0x651) [0x2a95751621]
> [node1049:11804] [10] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0(ompi_coll_tuned_reduce_intra_pipeline+0x176) [0x2a95751bff]
> [node1049:11804] [11] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0(ompi_coll_tuned_reduce_intra_dec_fixed+0x3f4) [0x2a957475f6]
> [node1049:11804] [12] /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0(PMPI_Reduce+0x3a6) [0x2a9570a076]
> [node1049:11804] [13] /bcx/usercin/asm0/mpptools/mpitools/debug/src/collect_noparms_bc.x(reduce+0x3e) [0x404e64]
> [node1049:11804] [14] /bcx/usercin/asm0/mpptools/mpitools/debug/src/collect_noparms_bc.x(main+0x620) [0x404c8e]
> [node1049:11804] [15] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a966004bb]
> [node1049:11804] [16] /bcx/usercin/asm0/mpptools/mpitools/debug/src/collect_noparms_bc.x [0x40448a]
> [node1049:11804] *** End of error message ***
>
> .......
>
> The behaviour is more or less identical using either InfiniBand or
> Gigabit interconnect. If I use another MPI implementation
> (say MVAPICH), everything works fine.
> I then compiled both my code and Open MPI using gcc 3.4.4 with
> bounds checking, compiler debugging flags, and without the OMPI memory
> manager ... the behaviour is identical, but now I have the line where
> the SIGSEGV is trapped:
>
>
> ----------------------------------------------------------------------------------------------------------------
> gdb collect_noparms_bc.x core.11580
> GNU gdb Red Hat Linux (6.3.0.0-1.96rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
>
>
> warning: core file may not match specified executable file.
> Core was generated by `/bcx/usercin/asm0/mpptools/mpitools/debug/src/collect_noparms_bc.x'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0...done.
> Loaded symbols for /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libmpi.so.0
> Reading symbols from /prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libopen-rte.so.0...done.
> Loaded symbols for /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libopen-rte.so.0
> Reading symbols from /prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libopen-pal.so.0...done.
> Loaded symbols for /cineca/prod/openmpi/1.2.2/mr/gnu3.4-bc_no_memory_mgr_dbg/lib/libopen-pal.so.0
> Reading symbols from /usr/local/ofed/lib64/libibverbs.so.1...done.
> Loaded symbols for /usr/local/ofed/lib64/libibverbs.so.1
> Reading symbols from /lib64/tls/librt.so.1...done.
> Loaded symbols for /lib64/tls/librt.so.1
> Reading symbols from /usr/lib64/libnuma.so.1...done.
> Loaded symbols for /usr/lib64/libnuma.so.1
> Reading symbols from /lib64/libnsl.so.1...done.
> Loaded symbols for /lib64/libnsl.so.1
> Reading symbols from /lib64/libutil.so.1...done.
> Loaded symbols for /lib64/libutil.so.1
> Reading symbols from /lib64/tls/libm.so.6...done.
> Loaded symbols for /lib64/tls/libm.so.6
> Reading symbols from /lib64/libdl.so.2...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/tls/libpthread.so.0...done.
> Loaded symbols for /lib64/tls/libpthread.so.0
> Reading symbols from /lib64/tls/libc.so.6...done.
> Loaded symbols for /lib64/tls/libc.so.6
> Reading symbols from /usr/lib64/libsysfs.so.1...done.
> Loaded symbols for /usr/lib64/libsysfs.so.1
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...done.
> Loaded symbols for /lib64/libnss_files.so.2
> Reading symbols from /usr/local/ofed/lib64/infiniband/ipathverbs.so...done.
> Loaded symbols for /usr/local/ofed/lib64/infiniband/ipathverbs.so
> Reading symbols from /usr/local/ofed/lib64/infiniband/mthca.so...done.
> Loaded symbols for /usr/local/ofed/lib64/infiniband/mthca.so
> Reading symbols from /lib64/libgcc_s.so.1...done.
> Loaded symbols for /lib64/libgcc_s.so.1
> #0 0x0000002a9573fa18 in ompi_cb_fifo_write_to_head_same_base_addr (data=0x2a96f7df80, fifo=0x0)
> at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/class/ompi_circular_buffer_fifo.h:370
> 370 h_ptr=fifo->head;
> (gdb) bt
> #0 0x0000002a9573fa18 in ompi_cb_fifo_write_to_head_same_base_addr (data=0x2a96f7df80, fifo=0x0)
> at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/class/ompi_circular_buffer_fifo.h:370
> #1 0x0000002a9573f639 in ompi_fifo_write_to_head_same_base_addr (data=0x2a96f7df80, fifo=0x2a96e476a0, fifo_allocator=0x674100)
> at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/class/ompi_fifo.h:312
> #2 0x0000002a9573f5e1 in mca_btl_sm_send (btl=0x2a95923440, endpoint=0x6e9670, descriptor=0x2a96f7df80, tag=1 '\001')
> at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/mca/btl/sm/btl_sm.c:894
> #3 0x0000002a957acac6 in mca_bml_base_send (bml_btl=0x67fc00, des=0x2a96f7df80, tag=1 '\001')
> at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/mca/bml/bml.h:283
> #4 0x0000002a957ace52 in mca_pml_ob1_send_request_start_copy (sendreq=0x594080, bml_btl=0x67fc00, size=1024)
> at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/mca/pml/ob1/pml_ob1_sendreq.c:565
> #5 0x0000002a957a2788 in mca_pml_ob1_send_request_start_btl (sendreq=0x594080, bml_btl=0x67fc00)
> at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/mca/pml/ob1/pml_ob1_sendreq.h:278
> #6 0x0000002a957a251c in mca_pml_ob1_send_request_start (sendreq=0x594080)
> at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/mca/pml/ob1/pml_ob1_sendreq.h:345
> #7 0x0000002a957a2d9e in mca_pml_ob1_send (buf=0x7b8400, count=256, datatype=0x51b8b0, dst=37, tag=-21,
> sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x521c00) at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/mca/pml/ob1/pml_ob1_isend.c:103
> #8 0x0000002a95751621 in ompi_coll_tuned_reduce_generic (sendbuf=0x7b8000, recvbuf=0x8b9000, original_count=32512,
> datatype=0x51b8b0, op=0x51ba40, root=0, comm=0x521c00, tree=0x520b00, count_by_segment=256)
> at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/mca/coll/tuned/coll_tuned_reduce.c:187
> #9 0x0000002a95751bff in ompi_coll_tuned_reduce_intra_pipeline (sendbuf=0x7b8000, recvbuf=0x8b9000, count=32768, datatype=0x51b8b0,
> op=0x51ba40, root=0, comm=0x521c00, segsize=1024)
> at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/mca/coll/tuned/coll_tuned_reduce.c:255
> #10 0x0000002a957475f6 in ompi_coll_tuned_reduce_intra_dec_fixed (sendbuf=0x7b8000, recvbuf=0x8b9000, count=32768, datatype=0x51b8b0,
> op=0x51ba40, root=0, comm=0x521c00) at /cineca/prod/build/mpich/openmpi-1.2.2/ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:353
> #11 0x0000002a9570a076 in PMPI_Reduce (sendbuf=0x7b8000, recvbuf=0x8b9000, count=32768, datatype=0x51b8b0, op=0x51ba40, root=0,
> comm=0x521c00) at preduce.c:96
> #12 0x0000000000404e64 in reduce (comm=0x521c00, count=32768) at collect_noparms.c:248
> #13 0x0000000000404c8e in main (argc=1, argv=0x7fbffff308) at collect_noparms.c:187
> (gdb)
> -----------------------------------------
>
>
> I think this bug is not related to my performance slowdown in collective
> operations, but something seems to be wrong at a higher level in the
> MCA framework: as the backtrace shows, the fifo pointer handed down to
> the sm BTL's fifo write is NULL, so the dereference of fifo->head faults.
> Is anyone able to reproduce a similar bug?
> Is anyone else seeing performance slowdowns in collective operations
> with big jobs using OFED 1.1 over an InfiniBand interconnect?
> Do I need some further btl or coll tuning? (I've tried SRQ, as sketched
> below, but that doesn't resolve my problems.)
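>
> For reference, I enabled SRQ with something like the following (the
> parameter name is how I recall it for the 1.2 series; process count and
> executable name are illustrative):
>
>   mpirun -np 256 --mca btl_openib_use_srq 1 ./my_collective_test.x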
>
>
> Marco
>
>
>