
Open MPI User's Mailing List Archives



Subject: Re: [OMPI users] exited on signal 11 (Segmentation fault).
From: Gus Correa (gus_at_[hidden])
Date: 2011-10-25 10:52:29


Hi Mouhamad, Ralph, Terry

Very often, big programs like WRF crash with a segfault because they
can't allocate enough memory on the stack: they assume the system
imposes no limit on the stack size. This has nothing to do with MPI.

Mouhamad: check whether your stack size is set to unlimited on all
compute nodes. The easy way to do this is to edit
/etc/security/limits.conf, where you or your system administrator
can add these lines:

* - memlock -1
* - stack -1
* - nofile 4096

My two cents,
Gus Correa

Ralph Castain wrote:
> Looks like you are crashing in wrf - have you asked them for help?
>
> On Oct 25, 2011, at 7:53 AM, Mouhamad Al-Sayed-Ali wrote:
>
>> Hi again,
>>
>> This is exactly the error I have:
>>
>> ----
>> taskid: 0 hostname: part034.u-bourgogne.fr
>> [part034:21443] *** Process received signal ***
>> [part034:21443] Signal: Segmentation fault (11)
>> [part034:21443] Signal code: Address not mapped (1)
>> [part034:21443] Failing at address: 0xfffffffe01eeb340
>> [part034:21443] [ 0] /lib64/libpthread.so.0 [0x3612c0de70]
>> [part034:21443] [ 1] wrf.exe(__module_ra_rrtm_MOD_taugb3+0x418) [0x11cc9d8]
>> [part034:21443] [ 2] wrf.exe(__module_ra_rrtm_MOD_gasabs+0x260) [0x11cfca0]
>> [part034:21443] [ 3] wrf.exe(__module_ra_rrtm_MOD_rrtm+0xb31) [0x11e6e41]
>> [part034:21443] [ 4] wrf.exe(__module_ra_rrtm_MOD_rrtmlwrad+0x25ec) [0x11e9bcc]
>> [part034:21443] [ 5] wrf.exe(__module_radiation_driver_MOD_radiation_driver+0xe573) [0xcc4ed3]
>> [part034:21443] [ 6] wrf.exe(__module_first_rk_step_part1_MOD_first_rk_step_part1+0x40c5) [0xe0e4f5]
>> [part034:21443] [ 7] wrf.exe(solve_em_+0x22e58) [0x9b45c8]
>> [part034:21443] [ 8] wrf.exe(solve_interface_+0x80a) [0x902dda]
>> [part034:21443] [ 9] wrf.exe(__module_integrate_MOD_integrate+0x236) [0x4b2c4a]
>> [part034:21443] [10] wrf.exe(__module_wrf_top_MOD_wrf_run+0x24) [0x47a924]
>> [part034:21443] [11] wrf.exe(main+0x41) [0x4794d1]
>> [part034:21443] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x361201d8b4]
>> [part034:21443] [13] wrf.exe [0x4793c9]
>> [part034:21443] *** End of error message ***
>> -------
>>
>> Mouhamad
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>