
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] exited on signal 11 (Segmentation fault).
From: Gus Correa (gus_at_[hidden])
Date: 2011-10-25 10:52:29


Hi Mouhamad, Ralph, Terry

Very often, big programs like WRF crash with a segfault because they
can't allocate memory on the stack: they assume the system imposes
no limit on the stack size. This has nothing to do with MPI.
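
A rough back-of-the-envelope check shows why this happens. Fortran codes often keep large local (automatic) arrays on the stack, and one such array can exceed a typical default limit on its own. In the Python sketch below, the 200 x 200 x 50 array shape is hypothetical (not taken from WRF), and 8192 KB is assumed as a common Linux default for `ulimit -s`. The real limit is whatever `getrlimit` reports on your nodes.

```python
import math
import resource

def array_bytes(shape, itemsize=8):
    """Bytes needed by a dense array of the given shape (8-byte reals by default)."""
    return math.prod(shape) * itemsize

def stack_soft_limit():
    """Current soft stack limit in bytes, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft

def fits_on_stack(nbytes, limit):
    """True if nbytes is below limit (None means unlimited).

    This is only a necessary condition: the stack also holds every other
    frame and local variable, not just this one array.
    """
    return limit is None or nbytes < limit

if __name__ == "__main__":
    # Hypothetical local work array: 200 x 200 x 50 double-precision values.
    need = array_bytes((200, 200, 50))
    print(need)                                 # 16000000
    print(fits_on_stack(need, 8192 * 1024))     # False under an assumed 8 MB default
    print(fits_on_stack(need, stack_soft_limit()))
```

Sixteen million bytes is roughly twice an 8 MB stack, so such a routine would fault on its first touch of the array unless the limit is raised.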

Mouhamad: Check whether your stack size is set to unlimited on all compute
nodes. The easiest way to do this is to edit /etc/security/limits.conf,
where you or your system administrator can add these lines:

* - memlock -1
* - stack -1
* - nofile 4096
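
To confirm the new limits actually reach the MPI processes, a small script can print the soft/hard pairs that the lines above try to raise. Below is a minimal Python sketch; the script itself and its output format are illustrative assumptions, not an Open MPI tool. Launch it the same way you launch wrf.exe (same mpirun invocation and host list) so it inherits the same environment. Note that limits.conf is applied by pam_limits at login, so processes started by a daemon that does not go through PAM (for example, a batch-system daemon started at boot) may not see these values.

```python
import resource
import socket

# The limits that the limits.conf lines above attempt to lift.
LIMITS = {
    "stack": resource.RLIMIT_STACK,
    "memlock": resource.RLIMIT_MEMLOCK,
    "nofile": resource.RLIMIT_NOFILE,
}

def fmt(value):
    """Render an rlimit value; RLIM_INFINITY is shown as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

def stack_is_unlimited():
    """True if the soft stack limit of this process is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return soft == resource.RLIM_INFINITY

def report():
    """One line per process: hostname, then soft/hard for each limit."""
    parts = []
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        parts.append(f"{name}={fmt(soft)}/{fmt(hard)}")
    return f"{socket.gethostname()}: " + " ".join(parts)

if __name__ == "__main__":
    line = report()
    if not stack_is_unlimited():
        line += "  <-- stack is NOT unlimited"
    print(line)
```

Any host whose line is flagged is a likely place for the segfault to reappear.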

My two cents,
Gus Correa

Ralph Castain wrote:
> Looks like you are crashing in wrf - have you asked them for help?
>
> On Oct 25, 2011, at 7:53 AM, Mouhamad Al-Sayed-Ali wrote:
>
>> Hi again,
>>
>> This is exactly the error I have:
>>
>> ----
>> taskid: 0 hostname: part034.u-bourgogne.fr
>> [part034:21443] *** Process received signal ***
>> [part034:21443] Signal: Segmentation fault (11)
>> [part034:21443] Signal code: Address not mapped (1)
>> [part034:21443] Failing at address: 0xfffffffe01eeb340
>> [part034:21443] [ 0] /lib64/libpthread.so.0 [0x3612c0de70]
>> [part034:21443] [ 1] wrf.exe(__module_ra_rrtm_MOD_taugb3+0x418) [0x11cc9d8]
>> [part034:21443] [ 2] wrf.exe(__module_ra_rrtm_MOD_gasabs+0x260) [0x11cfca0]
>> [part034:21443] [ 3] wrf.exe(__module_ra_rrtm_MOD_rrtm+0xb31) [0x11e6e41]
>> [part034:21443] [ 4] wrf.exe(__module_ra_rrtm_MOD_rrtmlwrad+0x25ec) [0x11e9bcc]
>> [part034:21443] [ 5] wrf.exe(__module_radiation_driver_MOD_radiation_driver+0xe573) [0xcc4ed3]
>> [part034:21443] [ 6] wrf.exe(__module_first_rk_step_part1_MOD_first_rk_step_part1+0x40c5) [0xe0e4f5]
>> [part034:21443] [ 7] wrf.exe(solve_em_+0x22e58) [0x9b45c8]
>> [part034:21443] [ 8] wrf.exe(solve_interface_+0x80a) [0x902dda]
>> [part034:21443] [ 9] wrf.exe(__module_integrate_MOD_integrate+0x236) [0x4b2c4a]
>> [part034:21443] [10] wrf.exe(__module_wrf_top_MOD_wrf_run+0x24) [0x47a924]
>> [part034:21443] [11] wrf.exe(main+0x41) [0x4794d1]
>> [part034:21443] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x361201d8b4]
>> [part034:21443] [13] wrf.exe [0x4793c9]
>> [part034:21443] *** End of error message ***
>> -------
>>
>> Mouhamad
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users