
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] memory per core/process
From: Duke Nguyen (duke.lists_at_[hidden])
Date: 2013-03-30 06:36:12


On 3/30/13 5:22 PM, Duke Nguyen wrote:
> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>> I do not know about your code but:
>>
>> 1) did you check stack limitations? Typically Intel Fortran codes
>> need a large amount of stack when the problem size increases.
>> Check ulimit -a
>
> This is the first time I have heard of stack limitations. Anyway, ulimit -a gives
>
> $ ulimit -a
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 127368
> max locked memory (kbytes, -l) unlimited
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 10240
> cpu time (seconds, -t) unlimited
> max user processes (-u) 1024
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
> So the stack size is only 10MB??? Could that be causing the problem? How
> do I change this?

I ran $ ulimit -s unlimited to make the stack size unlimited, and the
job ran fine!!! So it looks like the stack limit was the problem. My questions are:

  * how do I set this automatically (and permanently)? (see the sketch below)
  * should I set all other ulimits to be unlimited?
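
For reference, this is one way I am thinking of making it permanent
(just a sketch; it assumes the nodes apply limits through pam_limits
and that users get bash login shells, so please correct me if that is
the wrong approach). In /etc/security/limits.conf, alongside the
existing memlock entries:

$ cat /etc/security/limits.conf | grep stack
* soft stack unlimited
* hard stack unlimited

or, as a per-user fallback, in ~/.bashrc:

# raise the stack limit for this shell and anything launched from it
ulimit -s unlimited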

Thanks,

D.
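
P.S. For anyone who hits the same thing, this is roughly the sequence
that works for me now (same command as in my first message, just with
the stack limit raised first):

$ ulimit -s unlimited
$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log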

>
>>
>> 2) does your node use cpusets and memory limitations (like fake NUMA)
>> to set the maximum amount of memory available for a job?
>
> I do not really understand (this is also the first time I have heard of
> fake NUMA), but I am pretty sure we do not have such things. The server
> I tried is a dedicated server with two X5420s and 16GB of physical memory.
>
>>
>> Patrick
>>
>> Duke Nguyen wrote:
>>> Hi folks,
>>>
>>> I am sorry if this question has been asked before, but after ten
>>> days of searching and working on the system, I surrender :(. We are
>>> trying to use mpirun to run abinit (abinit.org), which in turn reads
>>> an input file to run a simulation. The command is pretty simple:
>>>
>>> $ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log
>>>
>>> We ran this command on a server with two quad-core X5420s and 16GB of
>>> memory. I used only 4 cores, so I guess in theory each core should be
>>> able to use up to 2GB.
>>>
>>> In the output log, there is a note about memory:
>>>
>>> P This job should need less than 717.175 Mbytes of memory.
>>> Rough estimation (10% accuracy) of disk space for files :
>>> WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.
>>>
>>> So basically it reports that the above job should not need more
>>> than about 718MB per core.
>>>
>>> But I still get the segmentation fault error:
>>>
>>> mpirun noticed that process rank 0 with PID 16099 on node biobos
>>> exited on signal 11 (Segmentation fault).
>>>
>>> The system already has the memlock limits set to unlimited:
>>>
>>> $ cat /etc/security/limits.conf | grep -v '#'
>>> * soft memlock unlimited
>>> * hard memlock unlimited
>>>
>>> I also tried to run
>>>
>>> $ ulimit -l unlimited
>>>
>>> before the mpirun command above, but it did not help at all.
>>>
>>> If we adjust the parameters in input.files so that the reported
>>> memory per core is less than 512MB, then the job runs fine.
>>>
>>> Please help,
>>>
>>> Thanks,
>>>
>>> D.
>>>
>>>
>>
>