
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Abort
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-08-16 12:54:18


FWIW, I'm unable to replicate your behavior. This is with Open MPI 1.4.2 on RHEL5:

----
[9:52] svbu-mpi:~/mpi % cat abort.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>  /* declares sleep() */
#include <mpi.h>
int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (0 == rank) {
        abort();
    }
    printf("Rank %d sleeping...\n", rank);
    sleep(600);
    printf("Rank %d finalizing...\n", rank);
    MPI_Finalize();
    return 0;
}
[9:52] svbu-mpi:~/mpi % mpicc abort.c -o abort
[9:52] svbu-mpi:~/mpi % ls -l core*
ls: No match.
[9:52] svbu-mpi:~/mpi % mpirun -np 4 --bynode --host svbu-mpi055,svbu-mpi056 ./abort
Rank 1 sleeping...
[svbu-mpi055:03991] *** Process received signal ***
[svbu-mpi055:03991] Signal: Aborted (6)
[svbu-mpi055:03991] Signal code:  (-6)
[svbu-mpi055:03991] [ 0] /lib64/libpthread.so.0 [0x2b45caac87c0]
[svbu-mpi055:03991] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x2b45cad05265]
[svbu-mpi055:03991] [ 2] /lib64/libc.so.6(abort+0x110) [0x2b45cad06d10]
[svbu-mpi055:03991] [ 3] ./abort(main+0x36) [0x4008ee]
[svbu-mpi055:03991] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b45cacf2994]
[svbu-mpi055:03991] [ 5] ./abort [0x400809]
[svbu-mpi055:03991] *** End of error message ***
Rank 3 sleeping...
Rank 2 sleeping...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3991 on node svbu-mpi055 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
[9:52] svbu-mpi:~/mpi % ls -l core*
-rw------- 1 jsquyres eng5 26009600 Aug 16 09:52 core.abort-1281977540-3991
[9:52] svbu-mpi:~/mpi % file core.abort-1281977540-3991 
core.abort-1281977540-3991: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'abort'
[9:52] svbu-mpi:~/mpi % 
----

You can see that all processes die immediately, and I get a corefile from the process that called abort().
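
To check the same two things on your own machine without involving MPI at all, here is a minimal sketch in plain C. It assumes a POSIX system; the helper names core_dumps_enabled() and child_abort_signal() are made up for illustration and are not part of Open MPI or any library. The first reads the core-size limit the process actually runs under (the same value "ulimit -c" reports in the shell); the second forks a child that calls abort() and reports which signal killed it. If core dumps are enabled, the child may leave a core file in the current directory.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the soft core-size limit of the
 * calling process is nonzero, 0 if it is zero or cannot be read. */
int core_dumps_enabled(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return 0;
    }
    return lim.rlim_cur != 0;
}

/* Hypothetical helper: fork a child that calls abort() and return the
 * number of the signal that terminated it, or -1 on any failure. */
int child_abort_signal(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        abort();  /* raises SIGABRT in the child; does not return */
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (!WIFSIGNALED(status)) {
        return -1;  /* child exited normally, which is not expected here */
    }
    return WTERMSIG(status);
}
```

Calling child_abort_signal() from a small main() should give SIGABRT (6 on Linux), matching the "Signal: Aborted (6)" line in the transcript above. If core_dumps_enabled() also returns 1 and still no core file shows up, the cause probably lies outside the limit itself (for example, where the system is configured to write core files); that is a guess, not something verified in this thread.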

On Aug 16, 2010, at 9:25 AM, David Ronis wrote:

> I've tried both--as you said, MPI_Abort doesn't drop a core file, but
> does kill off the entire MPI job.   abort() drops core when I'm running
> on 1 processor, but not in a multiprocessor run.  In addition, a node
> calling abort() doesn't lead to the entire run being killed off.
> 
> David
> On Mon, 2010-08-16 at 08:51 -0700, Jeff Squyres wrote:
>> On Aug 13, 2010, at 12:53 PM, David Ronis wrote:
>> 
>>> I'm using mpirun and the nodes are all on the same machine (an 8 cpu box
>>> with an Intel i7).  coresize is unlimited:
>>> 
>>> ulimit -a
>>> core file size          (blocks, -c) unlimited
>> 
>> That looks good.
>> 
>> In reviewing the email thread, it's not entirely clear: are you calling abort() or MPI_Abort()?  MPI_Abort() won't drop a core file.  abort() should.
>> 
> 
-- 
Jeff Squyres
jsquyres_at_[hidden]