Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-10-25 10:12:29


Troy --

We've managed to replicate this problem and are looking into it. Thanks
for reporting it!

Troy Benjegerdes wrote:
> On Mon, Oct 24, 2005 at 06:03:02PM -0500, Troy Benjegerdes wrote:
>
>>troy_at_opteron1:/usr/src/netpipe3-dev$ mpirun -np 2 -mca btl_base_exclude openib NPmpi
>>1: opteron1
>>0: opteron1
>>mpirun noticed that job rank 1 with PID 352 on node "localhost" exited on signal 11.
>>1 process killed (possibly by Open MPI)
>>
>>This is debian-amd64 (from
>>deb http://mirror.espri.arizona.edu/debian-amd64/debian/ etch main )
>>
>>On Mon, Oct 24, 2005 at 10:36:29AM -0500, Brian Barrett wrote:
>>
>>>That's a really weird backtrace - it seems to indicate that the
>>>datatype engine is improperly calling free(). Can you try running
>>>without openib (add "-mca btl_base_exclude openib" to the mpirun
>>>arguments) and see if the problem goes away? Also, what platform was
>>>this on?
>
>
> Okay.. here's another backtrace, this time with no openib.
>
> 0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
> (gdb) bt
> #0 0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
> #1 0x00002aaaaaecb016 in opal_mem_free_free_hook ()
>    from /usr/local/lib/libopal.so.0
> #2 0x00002aaaaac0c663 in ompi_convertor_cleanup ()
>    from /usr/local/lib/libmpi.so.0
> #3 0x00002aaaaeb41dbe in mca_pml_ob1_match_completion_cache ()
>    from /usr/local/lib/openmpi/mca_pml_ob1.so
> #4 0x00002aaaaf179c7b in mca_btl_sm_component_progress ()
>    from /usr/local/lib/openmpi/mca_btl_sm.so
> #5 0x00002aaaaee5eefe in mca_bml_r2_progress ()
>    from /usr/local/lib/openmpi/mca_bml_r2.so
> #6 0x00002aaaaeb3dd4e in mca_pml_ob1_progress ()
>    from /usr/local/lib/openmpi/mca_pml_ob1.so
> #7 0x00002aaaaaeb5c4a in opal_progress ()
>    from /usr/local/lib/libopal.so.0
> #8 0x00002aaaaeb3c265 in mca_pml_ob1_recv ()
>    from /usr/local/lib/openmpi/mca_pml_ob1.so
> #9 0x00002aaaaf6a0936 in mca_coll_basic_barrier_intra_lin ()
>    from /usr/local/lib/openmpi/mca_coll_basic.so
> #10 0x00002aaaaac1f3b8 in PMPI_Barrier ()
>    from /usr/local/lib/libmpi.so.0
> #11 0x00000000004030a2 in Sync (p=0x10053d900) at src/mpi.c:89
> #12 0x0000000000401f83 in main (argc=2, argv=0x7fffffe30ae8)
>    at src/netpipe.c:463

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/