Eugene Loh wrote:
> Prentice Bisbal wrote:
>> Is there a limit on how many MPI processes can run on a single host?
>> I have a user trying to test his code on the command-line on a single
>> host before running it on our cluster like so:
>> mpirun -np X foo
>> When he tries to run it with a large number of processes (X = 256, 512), the
>> program fails, and I can reproduce this with a simple "Hello, World":
>> $ mpirun -np 256 mpihello
>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>> exited on signal 15 (Terminated).
>> 252 additional processes aborted (not shown)
>> I've done some testing and found that X must be less than 155 for this program to work.
>> Is this a bug, part of the standard, or design/implementation decision?
> One possible issue is the limit on the number of descriptors. The error
> message should be pretty helpful and descriptive, but perhaps you're
> using an older version of OMPI. If this is your problem, one workaround
> is something like this:
> unlimit descriptors
> mpirun -np 256 mpihello
Looks like I'm not allowed to set that as a regular user:
$ ulimit -n 2048
-bash: ulimit: open files: cannot modify limit: Operation not permitted
Since I am the admin, I could change that elsewhere, but I'd rather not
do that system-wide unless absolutely necessary.
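For what it's worth, in bash a plain `ulimit -n N` tries to set both the soft and the hard limit. The "Operation not permitted" error above therefore suggests the current hard limit is below 2048. An unprivileged user can still raise the soft limit, but only as far as the existing hard limit. The sketch below assumes bash on Linux. It assumes, but does not confirm, that the descriptor limit is what kills the 256-process run. It prints both limits and raises the soft limit to the hard limit for the current shell only:

```shell
#!/bin/bash
# Show the per-process open-file (descriptor) limits of this shell.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft"
echo "hard limit: $hard"

# A regular user may raise the SOFT limit up to the HARD limit.
# Note the -S: a plain "ulimit -n N" would also try to set the hard
# limit, which fails without privileges if N exceeds the current hard limit.
if [ "$soft" != "$hard" ] && [ "$hard" != "unlimited" ]; then
    ulimit -Sn "$hard"
fi
echo "soft limit now: $(ulimit -Sn)"
```

Limits are inherited by child processes, so mpirun has to be started from that same shell afterwards. If the hard limit itself is too low, root has to raise it. On Linux systems that use pam_limits, this can be scoped to a single user or group with a `nofile` entry in /etc/security/limits.conf rather than changed system-wide.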
> though I guess the syntax depends on what shell you're running. Another
> is to set the MCA parameter opal_set_max_sys_limits to 1.
That didn't work either:
$ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study