Open MPI Development Mailing List Archives

From: Troy Benjegerdes (hozer_at_[hidden])
Date: 2005-10-24 18:03:02


troy_at_opteron1:/usr/src/netpipe3-dev$ mpirun -np 2 -mca btl_base_exclude openib NPmpi
1: opteron1
0: opteron1
mpirun noticed that job rank 1 with PID 352 on node "localhost" exited on signal 11.
1 process killed (possibly by Open MPI)

This is debian-amd64 (from
deb http://mirror.espri.arizona.edu/debian-amd64/debian/ etch main )
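
One way to rule out the word-size mismatch suggested in the quoted message below is to compare the ELF class of the binaries and libraries involved. Here is a minimal sketch in Python, assuming the files are ELF objects and that you pass their paths on the command line. The helper names `elf_class` and `group_by_class` are made up for illustration and are not part of Open MPI or NetPIPE.

```python
import sys

ELF_MAGIC = b"\x7fELF"
# Byte 4 of an ELF header (EI_CLASS) encodes the word size.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(path):
    """Return '32-bit', '64-bit', or None if the file is not a recognisable ELF object."""
    with open(path, "rb") as f:
        ident = f.read(5)
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(ident[4])


def group_by_class(paths):
    """Group paths by ELF class; more than one key suggests mixed word sizes."""
    groups = {}
    for p in paths:
        groups.setdefault(elf_class(p), []).append(p)
    return groups


if __name__ == "__main__":
    # Pass the executable and the shared libraries it loads as arguments.
    for cls, files in group_by_class(sys.argv[1:]).items():
        print(cls, files)
```

If every path ends up under the same key, a 32bit/64bit mix among those particular files is unlikely to be the cause.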

On Mon, Oct 24, 2005 at 10:36:29AM -0500, Brian Barrett wrote:
> That's a really weird backtrace - it seems to indicate that the
> datatype engine is improperly calling free(). Can you try running
> without openib (add "-mca btl_base_exclude openib" to the mpirun
> arguments) and see if the problem goes away? Also, what platform was
> this on?
>
> Thanks,
>
> Brian
>
> On Oct 21, 2005, at 6:37 PM, Troy Benjegerdes wrote:
>
> > On Fri, Oct 21, 2005 at 06:26:32PM -0500, Troy Benjegerdes wrote:
> >
> >> On Fri, Oct 21, 2005 at 04:12:05PM -0500, Andrew Friedley wrote:
> >>
> >>> I just committed a fix to the trunk to fix your original segfault down
> >>> in opal_show_help() - this is the same problem Ken posted. This fix
> >>> should make it into the v1.0 branch eventually. Even so, you are going
> >>> to run into the real problem you were handling - this fix is just for
> >>> proper error handling/output.
> >>>
> >>> The error below looks like a word size mismatch - one thing is compiled
> >>> 64bit, the other is compiled 32bit. Make sure everything is compiled
> >>> either 32bit or 64bit.
> >>>
> >>
> >> Another note.. I think I may have had some problems because I built with
> >> 'make -j16'.. has anyone else tried parallel make builds?
> >>
> >> I have a working mpirun now.
> >>
> >> Now I'm back to having NetPIPE segfault when I run it..
> >>
> >
> > And here's a backtrace:
> >
> > 0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
> > #0 0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
> > #1 0x00002aaaaaecb016 in opal_mem_free_free_hook ()
> > from /usr/local/lib/libopal.so.0
> > #2 0x00002aaaaac0c663 in ompi_convertor_cleanup ()
> > from /usr/local/lib/libmpi.so.0
> > #3 0x00002aaaaeb41dbe in mca_pml_ob1_match_completion_cache ()
> > from /usr/local/lib/openmpi/mca_pml_ob1.so
> > #4 0x00002aaaaf179c7b in mca_btl_sm_component_progress ()
> > from /usr/local/lib/openmpi/mca_btl_sm.so
> > #5 0x00002aaaaee5eefe in mca_bml_r2_progress ()
> > from /usr/local/lib/openmpi/mca_bml_r2.so
> > #6 0x00002aaaaeb3dd4e in mca_pml_ob1_progress ()
> > from /usr/local/lib/openmpi/mca_pml_ob1.so
> > #7 0x00002aaaaaeb5c4a in opal_progress () from
> > /usr/local/lib/libopal.so.0
> > #8 0x00002aaaaeb3c265 in mca_pml_ob1_recv ()
> > from /usr/local/lib/openmpi/mca_pml_ob1.so
> > #9 0x00002aaaaf6a0936 in mca_coll_basic_barrier_intra_lin ()
> > from /usr/local/lib/openmpi/mca_coll_basic.so
> > #10 0x00002aaaaac1f3b8 in PMPI_Barrier () from
> > /usr/local/lib/libmpi.so.0
> > #11 0x0000000000403016 in Sync ()
> > #12 0x0000000000401ef8 in main ()
> >
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >

-- 
--------------------------------------------------------------------------
Troy Benjegerdes                'da hozer'                hozer_at_[hidden]  
Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job. Charles Schulz had the best answer:
"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's why
I draw cartoons. It's my life." -- Charles Schulz