Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] segmentation fault
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-12-16 06:36:23


Have you run your application through a debugger, or examined the core files to see exactly where the segv is occurring? That may shed some light on what the exact problem is.
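
For example, something along these lines (just a rough sketch, assuming bash on Linux, that your binary is called my_app, and that your system writes core files named "core" or "core.<pid>" -- adjust the names to your setup):

    ulimit -c unlimited      # allow core files to be written in this shell
    mpiexec -n 4 ./my_app    # reproduce the crash
    gdb ./my_app core        # open the resulting core file in gdb

Then type "bt" at the gdb prompt to get a backtrace of the rank that crashed.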

On Dec 16, 2010, at 4:20 AM, Vaz, Guilherme wrote:

> Ok, ok. It is indeed a CFD program, and Gus got it right. Number of cells per core means memory per core (sorry for the inaccuracy).
> My PC has 12GB of RAM. And the same calculation runs fine on an old Ubuntu 8.04 32-bit machine with 4GB of RAM.
> What I find strange is that the same problem runs with 1 core (without invoking mpiexec) and also with a large number of cores/processes, for instance mpiexec -n 32, but something in between does not. And it is not a bug in the program, because it runs on other machines and the code has not been changed.
>
> Any more hints?
>
> Thanks in advance.
>
> Guilherme
>
>
>
>
> dr. ir. Guilherme Vaz
> CFD Researcher
> Research & Development
> E mailto:G.Vaz_at_[hidden]
> T +31 317 49 33 25
>
> MARIN
> 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands
> T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Gus Correa
> Sent: Thursday, December 16, 2010 12:46 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] segmentation fault
>
> Maybe CFD jargon?
> Perhaps the number (not size) of cells in a mesh/grid being handled
> by each core/cpu?
>
> Ralph Castain wrote:
>> I have no idea what you mean by "cell sizes per core". Certainly not any
>> terminology within OMPI...
>>
>>
>> On Dec 15, 2010, at 3:47 PM, Vaz, Guilherme wrote:
>>
>>>
>>> Dear all,
>>>
>>> I have a problem with Open MPI 1.3, ifort + MKL v11.1 on Ubuntu 10.04
>>> systems (32- or 64-bit). My code worked on Ubuntu 8.04 and works on
>>> RedHat-based systems, with slightly different versions of MKL and
>>> ifort. There were no changes in the source code.
>>> The problem is that the application works for small cell sizes per
>>> core, but not for large cell sizes per core. And it always works on 1
>>> core.
>>> Example: a grid with 1.2 million cells does not work with mpiexec -n 4
>>> <my_app> but it works with mpiexec -n 32 <my_app>. It seems that there
>>> is a maximum number of cells per core. And it works when running
>>> <my_app> directly, without mpiexec.
>>>
>>> Is this a stack size (or some other memory) problem? Should I set ulimit
>>> -s unlimited not only in my bashrc but also in the ssh environment
>>> (and how)? Or is it something else?
>>> Any clues/tips?
>>>
>>> Thanks for any help.
>>>
>>> Gui
>>>
>>>
>>>
>>>
>>>
>>> dr. ir. Guilherme Vaz
>>> CFD Researcher
>>> Research & Development
>>> E G.Vaz_at_[hidden]
>>> T +31 317 49 33 25
>>>
>>> MARIN
>>> 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands
>>> T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl
>>>
>>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
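
Regarding the "ulimit -s unlimited" question above: Ubuntu's stock ~/.bashrc typically bails out early for non-interactive shells, so a limit set there may never reach the shells that mpiexec starts over ssh. One common way to raise the limit for ssh sessions is via pam_limits; a rough sketch, assuming pam_limits is enabled for sshd on every node (your setup may differ):

    # add to /etc/security/limits.conf on each node:
    *    soft    stack    unlimited
    *    hard    stack    unlimited

    # then check what a non-interactive ssh shell actually gets:
    ssh <remote_node> 'ulimit -s'

If that prints "unlimited" on every node, the stack limit can probably be ruled out.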

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/