Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Abort
From: David Ronis (David.Ronis_at_[hidden])
Date: 2010-08-13 13:18:50


Thanks to all who replied.

First, I'm running openmpi 1.4.2.

Second, coredumpsize is unlimited, and indeed I DO get core dumps when
I'm running a single-processor version. Third, the problem isn't
stopping the program (MPI_Abort does that just fine); rather, it's getting
a core dump. According to the man page, MPI_Abort sends a SIGTERM, not a
SIGABRT, so perhaps that's what should happen.

Finally, my guess as to what's happening if I use the libc abort is that
the other nodes get stuck in an MPI call (I do lots of MPI_Reduces or
MPI_Bcasts in this code), but this doesn't explain why the node calling
abort doesn't exit with a coredump.

David

On Thu, 2010-08-12 at 20:44 -0600, Ralph Castain wrote:
> Sounds very strange - what OMPI version, on what type of machine, and how was it configured?
>
>
> On Aug 12, 2010, at 7:49 PM, David Ronis wrote:
>
> > I've got an MPI program that is supposed to generate a core file if
> > problems arise on any of the nodes. I tried to do this by adding a
> > call to abort() to my exit routines but this doesn't work; I get no core
> > file, and worse, mpirun doesn't detect that one of my nodes has
> > aborted(?) and doesn't kill off the entire job, except in the trivial
> > case where the number of processors I'm running on is 1. I've replaced
> > abort with MPI_Abort, which kills everything off, but leaves no core
> > file. Any suggestions how I can get one and still have mpi exit?
> >
> > Thanks in advance.
> >
> > David