
Subject: Re: [OMPI users] exited on signal 11 (Segmentation fault).
From: Mouhamad Al-Sayed-Ali (Mouhamad.Al-Sayed-Ali_at_[hidden])
Date: 2011-10-26 03:57:38


Hi Gus Correa,

  the output of ulimit -a is

----
file(blocks)         unlimited
coredump(blocks)     2048
data(kbytes)         unlimited
stack(kbytes)        10240
lockedmem(kbytes)    unlimited
memory(kbytes)       unlimited
nofiles(descriptors) 1024
processes            256
----
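
[The 10240 kB stack shown above is the soft limit that wrf is most
 likely hitting.  A minimal sketch, assuming a bash job script, of
 lifting it for a single run (possible only if the hard limit allows it):

    ulimit -s unlimited
    mpirun -np 4 ./wrf.exe      # process count is only an example

 Note that ranks launched on other nodes may not inherit the limits of
 the mpirun shell, so the limits.conf change discussed below is the
 more robust fix.]
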
Thanks
Mouhamad
Gus Correa <gus_at_[hidden]> wrote:
> Hi Mouhamad
>
> The locked memory is set to unlimited, but the lines
> about the stack are commented out.
> Have you tried to add this line:
>
> *   -   stack       -1
>
> then run wrf again? [Note no "#" hash character]
>
> Also, if you login to the compute nodes,
> what is the output of 'limit' [csh,tcsh] or 'ulimit -a' [sh,bash]?
> This should tell you what limits are actually set.
>
> I hope this helps,
> Gus Correa
>
> Mouhamad Al-Sayed-Ali wrote:
>> Hi all,
>>
>>   I've checked the "limits.conf", and it contains these lines
>>
>>
>> # Jcb 29.06.2007 : pbs wrf (Siji)
>> #*      hard    stack   1000000
>> #*      soft    stack   1000000
>>
>> # Dr 14.02.2008 : for voltaire mpi
>> *      hard    memlock unlimited
>> *      soft    memlock unlimited
>>
>>
>>
>> Many thanks for your help
>> Mouhamad
>>
>> Gus Correa <gus_at_[hidden]> wrote:
>>
>>> Hi Mouhamad, Ralph, Terry
>>>
>>> Very often big programs like wrf crash with a segfault because they
>>> can't allocate memory on the stack; they assume the system doesn't
>>> impose any limit on it.  This has nothing to do with MPI.
>>>
>>> Mouhamad:  Check if your stack size is set to unlimited on all compute
>>> nodes.  The easy way to get it done
>>> is to change /etc/security/limits.conf,
>>> where you or your system administrator could add these lines:
>>>
>>> *   -   memlock     -1
>>> *   -   stack       -1
>>> *   -   nofile      4096
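>>>
>>> [A rough equivalent with the fields spelled out; the "-" applies the
>>> value to both the soft and the hard limit, and "-1" is the same as
>>> "unlimited":
>>>
>>>   # domain  type  item     value
>>>   *         -     memlock  unlimited
>>>   *         -     stack    unlimited
>>>   *         -     nofile   4096
>>>
>>> These take effect only for sessions started after the change, so a
>>> fresh login (or a restart of the batch daemon) may be needed on the
>>> compute nodes.]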
>>>
>>> My two cents,
>>> Gus Correa
>>>
>>> Ralph Castain wrote:
>>>> Looks like you are crashing in wrf - have you asked them for help?
>>>>
>>>> On Oct 25, 2011, at 7:53 AM, Mouhamad Al-Sayed-Ali wrote:
>>>>
>>>>> Hi again,
>>>>>
>>>>> This is exactly the error I have:
>>>>>
>>>>> ----
>>>>> taskid: 0 hostname: part034.u-bourgogne.fr
>>>>> [part034:21443] *** Process received signal ***
>>>>> [part034:21443] Signal: Segmentation fault (11)
>>>>> [part034:21443] Signal code: Address not mapped (1)
>>>>> [part034:21443] Failing at address: 0xfffffffe01eeb340
>>>>> [part034:21443] [ 0] /lib64/libpthread.so.0 [0x3612c0de70]
>>>>> [part034:21443] [ 1] wrf.exe(__module_ra_rrtm_MOD_taugb3+0x418) [0x11cc9d8]
>>>>> [part034:21443] [ 2] wrf.exe(__module_ra_rrtm_MOD_gasabs+0x260) [0x11cfca0]
>>>>> [part034:21443] [ 3] wrf.exe(__module_ra_rrtm_MOD_rrtm+0xb31) [0x11e6e41]
>>>>> [part034:21443] [ 4] wrf.exe(__module_ra_rrtm_MOD_rrtmlwrad+0x25ec) [0x11e9bcc]
>>>>> [part034:21443] [ 5] wrf.exe(__module_radiation_driver_MOD_radiation_driver+0xe573) [0xcc4ed3]
>>>>> [part034:21443] [ 6] wrf.exe(__module_first_rk_step_part1_MOD_first_rk_step_part1+0x40c5) [0xe0e4f5]
>>>>> [part034:21443] [ 7] wrf.exe(solve_em_+0x22e58) [0x9b45c8]
>>>>> [part034:21443] [ 8] wrf.exe(solve_interface_+0x80a) [0x902dda]
>>>>> [part034:21443] [ 9] wrf.exe(__module_integrate_MOD_integrate+0x236) [0x4b2c4a]
>>>>> [part034:21443] [10] wrf.exe(__module_wrf_top_MOD_wrf_run+0x24) [0x47a924]
>>>>> [part034:21443] [11] wrf.exe(main+0x41) [0x4794d1]
>>>>> [part034:21443] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x361201d8b4]
>>>>> [part034:21443] [13] wrf.exe [0x4793c9]
>>>>> [part034:21443] *** End of error message ***
>>>>> ----
>>>>>
>>>>> Mouhamad