Open MPI Development Mailing List Archives

From: Troy Benjegerdes (hozer_at_[hidden])
Date: 2005-10-24 22:21:49


On Mon, Oct 24, 2005 at 06:03:02PM -0500, Troy Benjegerdes wrote:
> troy_at_opteron1:/usr/src/netpipe3-dev$ mpirun -np 2 -mca btl_base_exclude openib NPmpi
> 1: opteron1
> 0: opteron1
> mpirun noticed that job rank 1 with PID 352 on node "localhost" exited on signal 11.
> 1 process killed (possibly by Open MPI)
>
> This is debian-amd64 (from
> deb http://mirror.espri.arizona.edu/debian-amd64/debian/ etch main )
>
> On Mon, Oct 24, 2005 at 10:36:29AM -0500, Brian Barrett wrote:
> > That's a really weird backtrace - it seems to indicate that the
> > datatype engine is improperly calling free(). Can you try running
> > without openib (add "-mca btl_base_exclude openib" to the mpirun
> > arguments) and see if the problem goes away? Also, what platform was
> > this on?

Okay, here's another backtrace, this time without openib.

0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
(gdb) bt
#0 0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
#1 0x00002aaaaaecb016 in opal_mem_free_free_hook ()
   from /usr/local/lib/libopal.so.0
#2 0x00002aaaaac0c663 in ompi_convertor_cleanup ()
   from /usr/local/lib/libmpi.so.0
#3 0x00002aaaaeb41dbe in mca_pml_ob1_match_completion_cache ()
   from /usr/local/lib/openmpi/mca_pml_ob1.so
#4 0x00002aaaaf179c7b in mca_btl_sm_component_progress ()
   from /usr/local/lib/openmpi/mca_btl_sm.so
#5 0x00002aaaaee5eefe in mca_bml_r2_progress ()
   from /usr/local/lib/openmpi/mca_bml_r2.so
#6 0x00002aaaaeb3dd4e in mca_pml_ob1_progress ()
   from /usr/local/lib/openmpi/mca_pml_ob1.so
#7 0x00002aaaaaeb5c4a in opal_progress ()
   from /usr/local/lib/libopal.so.0
#8 0x00002aaaaeb3c265 in mca_pml_ob1_recv ()
   from /usr/local/lib/openmpi/mca_pml_ob1.so
#9 0x00002aaaaf6a0936 in mca_coll_basic_barrier_intra_lin ()
   from /usr/local/lib/openmpi/mca_coll_basic.so
#10 0x00002aaaaac1f3b8 in PMPI_Barrier ()
   from /usr/local/lib/libmpi.so.0
#11 0x00000000004030a2 in Sync (p=0x10053d900) at src/mpi.c:89
#12 0x0000000000401f83 in main (argc=2, argv=0x7fffffe30ae8)
    at src/netpipe.c:463