Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault during MPI initialization
From: Gus Correa (gus_at_[hidden])
Date: 2012-04-24 14:29:24


Hi Jeffrey

Assuming you are on Linux,
a frequent cause of seemingly random segfaults
is a stack size limit that is too small.
They can happen if your code [ab]uses large automatic
(stack-allocated) arrays, for example.

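You can raise the stack size limit, or make it unlimited,
with the shell's ulimit command (bash: "ulimit -s unlimited")
or limit command (csh/tcsh: "limit stacksize unlimited"),
or by editing /etc/security/limits.conf.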
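
As a rough illustration of the same idea from inside a program, here is
a minimal Python sketch that reads the current stack limit and raises
the soft limit up to the hard limit. It uses only the standard
"resource" module; the helper names describe and raise_stack_soft_limit
are made up for this example. The assumption is that you run it in a
wrapper process that then launches the job, so that locally started
child processes inherit the new limit. It does nothing for limits on
remote nodes.

```python
import resource


def describe(limit):
    """Render an rlimit value in KiB, or 'unlimited' (similar to `ulimit -s`)."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KiB"


def raise_stack_soft_limit():
    """Try to raise the soft stack limit to the hard limit.

    Returns (old_soft, new_soft, hard). If the system refuses the
    change, the soft limit is left as it was.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != hard:
        try:
            # An unprivileged process may raise its soft limit up to the hard limit.
            resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
        except (ValueError, OSError):
            pass  # keep the existing limit if the change is rejected
    new_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)
    return soft, new_soft, hard


if __name__ == "__main__":
    old, new, hard = raise_stack_soft_limit()
    print("stack soft limit before:", describe(old))
    print("stack soft limit after: ", describe(new))
    print("stack hard limit:       ", describe(hard))
```

If the hard limit itself is too small, only root (or an entry in
/etc/security/limits.conf) can raise it.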

Of course, there is always a chance of a bug in the code
itself, leading to a memory violation.

I hope this helps,
Gus Correa

On 04/24/2012 01:57 PM, Jeffrey A Cummings wrote:
> I've been having an intermittent failure during MPI initialization
> (v1.4.3) for several months. It comes and goes as I make changes to my
> application, that is, changes unrelated to MPI calls. Even when I have a
> version of my app which shows the problem, it doesn't happen on every
> submittal. This is a representative stack trace:
>
> [mtcompute-6-6:05845] *** Process received signal ***
> [mtcompute-6-6:05845] Signal: Segmentation fault (11)
> [mtcompute-6-6:05845] Signal code: Address not mapped (1)
> [mtcompute-6-6:05845] Failing at address: 0x2ac352e0bd80
> [mtcompute-6-6:05845] [ 0] /lib64/libpthread.so.0 [0x314ee0eb10]
> [mtcompute-6-6:05845] [ 1] /opt/openmpi/lib/libmpi.so.0 [0x2b2b3d42fa70]
> [mtcompute-6-6:05845] [ 2] /opt/openmpi/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2b2b3fa694ea]
> [mtcompute-6-6:05845] [ 3] /opt/openmpi/lib/libopen-rte.so.0 [0x2b2b3f80913c]
> [mtcompute-6-6:05845] [ 4] /opt/openmpi/lib/libmpi.so.0 [0x2b2b3d3f160c]
> [mtcompute-6-6:05845] [ 5] /opt/openmpi/lib/libmpi.so.0(MPI_Init+0xf0) [0x2b2b3d40eb00]
> [mtcompute-6-6:05845] [ 6] /home/cummings/DART/DARTHome/bin/linux/DebrisProp [0x418610]
> [mtcompute-6-6:05845] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x31df41d994]
> [mtcompute-6-6:05845] [ 8] /home/cummings/DART/DARTHome/bin/linux/DebrisProp [0x417992]
> [mtcompute-6-6:05845] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 5845 on node
> mtcompute-6-6.local exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> Any suggestions would be welcome.
>
> - Jeff Cummings