Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] segmentation fault
From: Gus Correa (gus_at_[hidden])
Date: 2010-12-15 18:42:00


Vaz, Guilherme wrote:
> Dear all,
>
> I have a problem with openmpi1.3, ifort+mkl v11.1 in Ubuntu10.04 systems
> (32 or 64bit). My code worked in Ubuntu8.04 and works in RedHat based
> systems, with slightly different version changes on mkl and ifort. There
> were no changes in the source code.
> The problem is that the application works for a small number of cells
> per core, but not for a large number of cells per core. And it always
> works on 1 core. Example: a grid with 1.2 million cells does not work
> with mpiexec -n 4 <my_app> but it works with mpiexec -n 32 <my_app>. It
> seems that there is a maximum number of cells per core. And it works
> with plain <my_app>.
>
> Is this a stack size problem (or some other memory problem)? Should I set
> ulimit -s unlimited not only in my bashrc but also in the ssh environment
> (and how)? Or is it something else?
> Any clues/tips?
>
> Thanks for any help.
>
> Gui
> dr. ir. Guilherme Vaz
> CFD Researcher
> Research & Development
> *MARIN*
> 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands
> E G.Vaz_at_[hidden]
> T +31 317 49 39 11, T +31 317 49 33 25, F +31 317 49 32 45
> I www.marin.nl

Hi Guilherme

Can you estimate how much memory each run configuration requires,
and whether the problem fits in your computer's RAM
(with some slack for the OS, MPI, etc.)?
To check directly whether you are running out of memory,
and whether the program starts swapping,
log in to the compute node or nodes while the job runs and use "top".
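
A rough sketch of such an estimate, using the numbers from your message
(1.2 million cells on 1, 4 or 32 processes). It assumes the grid is split
evenly across processes. The function names and the bytes-per-cell value
are made-up placeholders, not anything known about your application, so
substitute your own figure.

```shell
#!/bin/sh
# Cells handled by each MPI process, assuming an even partition of the grid.
cells_per_rank() {
    # $1 = total number of cells, $2 = number of MPI processes
    echo $(( $1 / $2 ))
}

# Rough memory per process in MB (integer arithmetic, rounded down).
# $3 = bytes per cell: an ASSUMED placeholder, replace with your own figure.
mb_per_rank() {
    echo $(( $1 / $2 * $3 / 1048576 ))
}

BYTES_PER_CELL=1000   # hypothetical value, not taken from the application

for n in 1 4 32; do
    echo "np=$n cells/rank=$(cells_per_rank 1200000 "$n") approx MB/rank=$(mb_per_rank 1200000 "$n" "$BYTES_PER_CELL")"
done

# Physical memory installed on this node, in MB (Linux only).
if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { printf "MemTotal: %d MB\n", $2 / 1024 }' /proc/meminfo
fi
```

If all processes run on one machine, the total is the same whatever the
process count, so it is that total which has to fit under MemTotal.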

It is hard to tell the cause of the segfault from this information alone.
It could come from a limited stack, from insufficient RAM when you
run on a single computer, or from a bug in the code.

On RedHat/Fedora/CentOS
you can set the stack limit to unlimited in /etc/security/limits.conf;
it may be the same in Ubuntu.
'man limits.conf' may help.
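
A minimal sketch of how one might check the stack limit, assuming a Linux
node and a shell whose ulimit builtin supports -s (bash and dash do). The
limits.conf lines appear only as comments. The '*' domain (all users) and
the process count of 2 are illustrative choices. Whether these limits also
reach processes started over ssh depends on the PAM setup on each node,
which is not verified here for Ubuntu.

```shell
#!/bin/sh
# Lines one could add to /etc/security/limits.conf (as root), in the
# "<domain> <type> <item> <value>" format described in 'man limits.conf'.
# The '*' domain (all users) is an illustrative assumption:
#
#   *   soft   stack   unlimited
#   *   hard   stack   unlimited

# Soft and hard stack limits of the current shell (in KB, or "unlimited").
echo "soft stack limit: $(ulimit -S -s)"
echo "hard stack limit: $(ulimit -H -s)"

# What matters is the limit the MPI processes themselves see, which can
# differ from the interactive shell. Print it from inside an MPI launch,
# if mpiexec is available on this machine.
if command -v mpiexec >/dev/null 2>&1; then
    mpiexec -n 2 sh -c 'echo "$(hostname): stack limit $(ulimit -s)"' \
        || echo "mpiexec check failed"
fi
```

If the value printed through mpiexec is smaller than the one in your login
shell, the limit is not reaching the launched processes.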

My two cents,
Gus Correa