Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] exited on signal 11 (Segmentation fault).
From: Mouhamad Al-Sayed-Ali (Mouhamad.Al-Sayed-Ali_at_[hidden])
Date: 2011-10-25 11:06:47


Hi all,

    I've checked "limits.conf", and it contains these lines:

# Jcb 29.06.2007 : pbs wrf (Siji)
#* hard stack 1000000
#* soft stack 1000000

# Dr 14.02.2008 : pour voltaire mpi
* hard memlock unlimited
* soft memlock unlimited

Many thanks for your help
Mouhamad
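
In the limits.conf excerpt above, both stack lines start with "#", so they are commented out and this file sets no stack limit; only memlock is raised. To see which limits a process actually gets on a compute node, here is a minimal sketch. It assumes Python 3 is installed on the nodes; the helper names describe and current_limits are made up for illustration. Note that getrlimit reports stack and memlock in bytes, while limits.conf uses KB for those items.

```python
import resource

# The limits discussed in this thread, mapped to their getrlimit keys.
LIMITS = (
    ("stack", resource.RLIMIT_STACK),
    ("memlock", resource.RLIMIT_MEMLOCK),
    ("nofile", resource.RLIMIT_NOFILE),
)


def describe(value):
    """Render a raw rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the limits seen by this process."""
    result = {}
    for name, key in LIMITS:
        soft, hard = resource.getrlimit(key)
        result[name] = (describe(soft), describe(hard))
    return result


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:8s} soft={soft} hard={hard}")
```

The limits seen in an interactive login shell can differ from those given to processes started by the batch system, so it is probably most informative to launch this the same way wrf.exe is launched.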

Gus Correa <gus_at_[hidden]> wrote:

> Hi Mouhamad, Ralph, Terry
>
> Big programs like wrf very often crash with a segfault because they
> can't allocate memory on the stack; they assume the system doesn't
> impose any limit on it. This has nothing to do with MPI.
>
> Mouhamad: Check if your stack size is set to unlimited on all compute
> nodes. The easy way to get it done is to change
> /etc/security/limits.conf, where you or your system administrator
> could add these lines:
>
> * - memlock -1
> * - stack -1
> * - nofile 4096
>
> My two cents,
> Gus Correa
>
> Ralph Castain wrote:
>> Looks like you are crashing in wrf - have you asked them for help?
>>
>> On Oct 25, 2011, at 7:53 AM, Mouhamad Al-Sayed-Ali wrote:
>>
>>> Hi again,
>>>
>>> This is exactly the error I have:
>>>
>>> ----
>>> taskid: 0 hostname: part034.u-bourgogne.fr
>>> [part034:21443] *** Process received signal ***
>>> [part034:21443] Signal: Segmentation fault (11)
>>> [part034:21443] Signal code: Address not mapped (1)
>>> [part034:21443] Failing at address: 0xfffffffe01eeb340
>>> [part034:21443] [ 0] /lib64/libpthread.so.0 [0x3612c0de70]
>>> [part034:21443] [ 1] wrf.exe(__module_ra_rrtm_MOD_taugb3+0x418) [0x11cc9d8]
>>> [part034:21443] [ 2] wrf.exe(__module_ra_rrtm_MOD_gasabs+0x260) [0x11cfca0]
>>> [part034:21443] [ 3] wrf.exe(__module_ra_rrtm_MOD_rrtm+0xb31) [0x11e6e41]
>>> [part034:21443] [ 4] wrf.exe(__module_ra_rrtm_MOD_rrtmlwrad+0x25ec) [0x11e9bcc]
>>> [part034:21443] [ 5] wrf.exe(__module_radiation_driver_MOD_radiation_driver+0xe573) [0xcc4ed3]
>>> [part034:21443] [ 6] wrf.exe(__module_first_rk_step_part1_MOD_first_rk_step_part1+0x40c5) [0xe0e4f5]
>>> [part034:21443] [ 7] wrf.exe(solve_em_+0x22e58) [0x9b45c8]
>>> [part034:21443] [ 8] wrf.exe(solve_interface_+0x80a) [0x902dda]
>>> [part034:21443] [ 9] wrf.exe(__module_integrate_MOD_integrate+0x236) [0x4b2c4a]
>>> [part034:21443] [10] wrf.exe(__module_wrf_top_MOD_wrf_run+0x24) [0x47a924]
>>> [part034:21443] [11] wrf.exe(main+0x41) [0x4794d1]
>>> [part034:21443] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x361201d8b4]
>>> [part034:21443] [13] wrf.exe [0x4793c9]
>>> [part034:21443] *** End of error message ***
>>> -------
>>>
>>> Mouhamad
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users