Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Limit to number of processes on one node?
From: Prentice Bisbal (prentice_at_[hidden])
Date: 2010-03-03 14:16:00


Eugene Loh wrote:
> Prentice Bisbal wrote:
>> Eugene Loh wrote:
>>
>>> Prentice Bisbal wrote:
>>>
>>>> Is there a limit on how many MPI processes can run on a single host?
>>>>
> Depending on which OMPI release you're using, I think you need something
> like 4*np up to 7*np (plus a few) descriptors. So, with 256, you need
> 1000+ descriptors. You're quite possibly up against your limit, though
> I don't know for sure that that's the problem here.
>
> You say you're running 1.2.8. That's "a while ago", so would you
> consider updating as a first step? Among other things, newer OMPIs will
> generate a much clearer error message if the descriptor limit is the
> problem.
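
The arithmetic quoted above can be checked against the limit on a given host. The sketch below is only an illustration: the function names are hypothetical, the 4x and 7x factors are the rule of thumb from this thread rather than anything read from Open MPI, and the "plus a few" overhead is not modeled because no figure was given for it. It uses Python's standard `resource` module to read the per-process open-file limit.

```python
import resource


def descriptor_estimate(np_procs, low_factor=4, high_factor=7):
    """Return the (low, high) descriptor estimate for np_procs local ranks.

    The factors are the 4*np to 7*np rule of thumb quoted in the thread;
    they are an estimate, not a value obtained from Open MPI itself.
    """
    return low_factor * np_procs, high_factor * np_procs


def fits_under_limit(np_procs, soft_limit):
    """True if even the high estimate stays below the given soft limit."""
    _, high = descriptor_estimate(np_procs)
    return soft_limit == resource.RLIM_INFINITY or high < soft_limit


if __name__ == "__main__":
    # RLIMIT_NOFILE is the per-process limit on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    for n in (128, 256, 512):
        low, high = descriptor_estimate(n)
        print(f"np={n}: roughly {low} to {high} descriptors, "
              f"soft limit {soft}, fits: {fits_under_limit(n, soft)}")
```

If the high estimate exceeds the soft limit reported on the host, the descriptor limit is at least a plausible suspect, though this check alone does not confirm it.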

While 1.2.8 might be "a while ago", age alone is not a valid reason to
upgrade software.

I can install the latest version of Open MPI, but it will take a little
while.

>>>> I have a user trying to test his code on the command-line on a single
>>>> host before running it on our cluster like so:
>>>>
>>>> mpirun -np X foo
>>>>
>>>> When he tries to run it with a large number of processes (X = 256, 512), the
>>>> program fails, and I can reproduce this with a simple "Hello, World"
>>>> program:
>>>>
>>>> $ mpirun -np 256 mpihello
>>>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>>>> exited on signal 15 (Terminated).
>>>> 252 additional processes aborted (not shown)
>>>>
>>>> I've done some testing and found that X must be below 155 for this
>>>> program to work. Is this a bug, part of the standard, or a
>>>> design/implementation decision?
>>>>
>>>>
>>>>
>>> One possible issue is the limit on the number of descriptors. The error
>>> message should be pretty helpful and descriptive, but perhaps you're
>>> using an older version of OMPI. If this is your problem, one workaround
>>> is something like this:
>>>
>>> unlimit descriptors
>>> mpirun -np 256 mpihello
>>>
>>
>> Looks like I'm not allowed to set that as a regular user:
>>
>> $ ulimit -n 2048
>> -bash: ulimit: open files: cannot modify limit: Operation not permitted
>>
>> Since I am the admin, I could change that elsewhere, but I'd rather not
>> do that system-wide unless absolutely necessary.
>>
>>> though I guess the syntax depends on what shell you're running. Another
>>> is to set the MCA parameter opal_set_max_sys_limits to 1.
>>>
>> That didn't work either:
>>
>> $ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>> exited on signal 15 (Terminated).
>> 252 additional processes aborted (not shown)
>>
>>
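
For reference on the `ulimit -n 2048` failure quoted above: in bash, `ulimit -n` without `-S` or `-H` sets both the soft and the hard limit, and an unprivileged user cannot raise the hard limit. If the hard limit on this host is below 2048 (an assumption, since it was not shown), that would produce "Operation not permitted". The sketch below, with a hypothetical helper name, shows the part a regular user is allowed to do: raise the soft limit as far as the hard limit permits.

```python
import resource


def raise_soft_nofile(target):
    """Raise the soft open-file limit toward target, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards. The soft limit is
    never lowered and the hard limit is never changed.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed hard
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft, hard  # already at or above the requested value
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        pass  # the platform refused the request; report the limits unchanged
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    before = resource.getrlimit(resource.RLIMIT_NOFILE)
    after = raise_soft_nofile(2048)
    print(f"soft/hard before: {before}, after: {after}")
```

Limits changed this way apply to the calling process and to any children it starts afterwards, not to the shell that launched it. If the hard limit itself is too low, only an administrator can raise it.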

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ