Open MPI Development Mailing List Archives

From: sadfub_at_[hidden]
Date: 2007-06-25 10:21:52


Pak Lui wrote:
> sadfub_at_[hidden] wrote:
>> Sorry for the late reply, but I haven't had access to the machine over the weekend.
>>
>>> I don't really know what this means. People have explained "loose"
>>> vs. "tight" integration to me before, but since I'm not an SGE user,
>>> the definitions always fall away.
>> I *assume* loosely coupled jobs are just jobs where SGE finds some
>> nodes to run them on and from then on doesn't care about anything
>> related to the jobs. In contrast, with tightly coupled jobs SGE
>> takes care of the subprocesses the job may spawn, terminates them
>> too in case of a failure, and takes care of the specified
>> resources.
>>
>>> Based on your prior e-mail, it looks like you are always invoking
>>> "ulimit" via "pdsh", even under SGE jobs. This is incorrect.
>> why?
>>
>>> Can't you just submit an SGE job script that runs "ulimit"?
>> #!/bin/csh -f
>> #$ -N MPI_Job
>> #$ -pe mpi 4
>> hostname && ulimit -a
>>
>> ATM I'm quite confused, because I want to use the C shell, but ulimit is
>> a bash builtin; the C shell uses limit instead... hmm. And SGE obviously
>> runs the script with bash, despite my request for csh in the first line.
>> But if I just use #!/bin/bash I get the same limits:
>>
>> -sh-3.00$ cat MPI_Job.o112116
>> node02
>> core file size (blocks, -c) unlimited
>> data seg size (kbytes, -d) unlimited
>> file size (blocks, -f) unlimited
>> pending signals (-i) 1024
>> max locked memory (kbytes, -l) 32
>> max memory size (kbytes, -m) unlimited
>> open files (-n) 1024
>> pipe size (512 bytes, -p) 8
>> POSIX message queues (bytes, -q) 819200
>> stack size (kbytes, -s) unlimited
>> cpu time (seconds, -t) unlimited
>> max user processes (-u) 139264
>> virtual memory (kbytes, -v) unlimited
>> file locks (-x) unlimited
>>
>> oops => 32 kbytes... So this isn't OMPI's fault.
>
> This looks like sge_execd isn't able to source the correct system
> defaults from the limits.conf file after you applied the change. Maybe
> you will need to restart the daemon?

Yes, I posted the same question to the Sun Grid Engine mailing list, and
as Jeff initially supposed, it was improper limits for the daemons
(sge_execd). So I had to edit each node's init script
(/etc/init.d/sgeexecd) and put "ulimit -l unlimited" before the line that
starts sge_execd. Then kill all sge_execd daemons (running jobs won't be
affected if you use "qconf -ke all") and restart sgeexecd on every node.
After that everything was fine with SGE and OMPI 1.1.1.
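
For anyone who finds this thread later: a rough sketch of how one could check, from inside a job, that the restarted daemons really hand down the new limit. The job name and the "mpi" PE are the ones from my script above; the "#$ -S /bin/bash" line (asking SGE to run the script with bash) and the helper name memlock_is_unlimited are my own assumptions for illustration, not something taken from the SGE thread.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -N MPI_Job
#$ -pe mpi 4

# Hypothetical helper: succeeds only if the given limit value is "unlimited".
memlock_is_unlimited() {
    [ "$1" = "unlimited" ]
}

# Max locked memory that this job inherited from sge_execd on this node.
current=$(ulimit -l)

if memlock_is_unlimited "$current"; then
    echo "$(hostname): max locked memory is unlimited"
else
    echo "$(hostname): max locked memory is still ${current} kbytes"
fi
```

If a node still reports a small value such as 32 here, its sge_execd was probably started without the "ulimit -l unlimited" line or has not been restarted yet.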

For the whole discussion, just read the short thread at:
http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=20390

At this point, big thanks to Jeff and everyone else who helped me!

Are there any suggestions regarding the compilation error?

Many, many thanks for the great help here in the forum!