Hi,
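It sounds like Open MPI is hitting your system's per-process open file
descriptor limit. The "(24)" in the log is errno 24, which on Linux is
EMFILE ("Too many open files"). Each TCP connection consumes one
descriptor, so a master task that talks to roughly 1000 peers can exceed
a limit of 1024, which is a common default. If that's the case, one
potential workaround is to have your system administrator raise the file
descriptor limits on the compute nodes.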
On a compute node, what does "ulimit -a" show (using bash)? The
"open files" entry is the relevant one; "ulimit -n" prints it alone.
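
In case it is useful, here is a minimal Python sketch that reports the
descriptor limit a process actually sees and raises the soft limit up to
the hard limit, which needs no special privileges. It is not part of
Open MPI, and the function names are my own. It assumes a POSIX system
where Python's standard "resource" module is available. Running it under
the same launcher as the job shows the limit the MPI processes inherit,
which may differ from what an interactive shell reports.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open file descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit_to_hard():
    """Try to raise the soft limit to the hard limit.

    An unprivileged process may raise its soft limit up to the hard limit,
    but not beyond it. Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some systems refuse the request; keep the current limits.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", nofile_limits())
    print("after: ", raise_soft_limit_to_hard())
```

If the hard limit itself turns out to be around 1024, only an
administrator can raise it.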
Hope that helps,
--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Mar 30, 2011, at 5:22 PM, Timothy Stitt wrote:
> Dear OpenMPI developers,
>
> One of our users was running a benchmark on a 1032 core simulation.
> He had a successful run at 900 cores but when he stepped up to 1032
> cores the job just stalled and his logs contained many occurrences
> of the following line:
>
>> [d6copt368.crc.nd.edu][[25621,1],0][btl_tcp_component.c:885:mca_btl_tcp_component_accept_handler]
>> accept() failed: Too many open files (24)
>
> The simulation has a single master task that communicates with all
> the other tasks to write out some I/O via the master. We are
> assuming the message is related to this bottleneck. Is there a 1024
> limit on the number of open files/connections for instance?
>
> Can anyone confirm the meaning of this error and, secondly, provide a
> resolution that hopefully doesn't involve a code rewrite?
>
> Thanks in advance,
>
> Tim.
>
> Tim Stitt PhD (User Support Manager).
> Center for Research Computing | University of Notre Dame |
> P.O. Box 539, Notre Dame, IN 46556 | Phone: 574-631-5287 | Email: tstitt_at_[hidden]
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel