Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Error when using more than 88 processors for a specific executable -Abyss
From: Ashwani Kumar Mishra (ashwanimishra_at_[hidden])
Date: 2011-10-14 15:29:02


Hi Ralph,
No idea how many file descriptors this program consumes :(

Best Regards,
Ashwani
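
[Editor's note: a quick way to answer Ralph's question on Linux is to check the per-process and system-wide descriptor limits and count what a running process actually has open. This is a sketch using standard Linux interfaces; `$PID` stands for the pid of `ABYSS-P` or `mpirun` and is a placeholder.]

```shell
# Per-process file-descriptor limits for the current shell
# (children launched by mpirun inherit the soft limit):
ulimit -Sn   # soft limit
ulimit -Hn   # hard limit

# System-wide ceiling -- the fs.file-max value quoted in this thread:
cat /proc/sys/fs/file-max

# Count the descriptors a running process has open
# (uncomment and replace $PID with a real pid):
# ls /proc/$PID/fd | wc -l
```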

On Sat, Oct 15, 2011 at 12:08 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> Should be plenty for us - does your program consume a lot?
>
>
> On Oct 14, 2011, at 12:25 PM, Ashwani Kumar Mishra wrote:
>
> Hi Ralph,
>
> fs.file-max = 100000
>
Is this OK, or is it too low?
>
> Best Regards,
> Ashwani
>
>
> On Fri, Oct 14, 2011 at 11:45 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> Can't offer much about the qsub job. On the first one, what is your limit
>> on the number of file descriptors? Could be your sys admin has it too low.
>>
>>
>> On Oct 14, 2011, at 12:07 PM, Ashwani Kumar Mishra wrote:
>>
>> Hello,
>> When I run the commands below on a cluster of 40 nodes (each node has
>> 8 processors and 8 GB RAM), both work as long as I use up to 88
>> processors. The moment I allocate more than 88 processors, I get the
>> two errors shown below.
>>
>> I tried setting ulimit to unlimited and setting the MCA parameter
>> opal_set_max_sys_limits to 1, but the problem won't go away.
>>
>>
>> $ mpirun=/opt/psc/ompi/bin/mpirun abyss-pe np=100 name=cattle k=50
>> n=10 in=s_1_1_sequence.txt
>>
>> /opt/mpi/openmpi/1.3.3/intel/bin/mpirun -np 100 ABYSS-P -k50 -q3
>> --coverage-hist=coverage.hist -s cattle-bubbles.fa -o cattle-1.fa
>> s_1_1_sequence.txt
>> [coe:19807] [[62863,0],0] ORTE_ERROR_LOG:
>> The system limit on number of pipes a process can open was reached in file
>> base/iof_base_setup.c at line 107
>> [coe.:19807] [[62863,0],0] ORTE_ERROR_LOG:
>> The system limit on number of pipes a process can open was reached in file
>> odls_default_module.c at line 203
>> [coe.:19807] [[62863,0],0] ORTE_ERROR_LOG:
>> The system limit on number of network connections a process can open was
>> reached in file oob_tcp.c at line 447
>> --------------------------------------------------------------------------
>> Error: system limit exceeded on number of network connections that can be
>> open
>>
>> This can be resolved by setting the mca parameter opal_set_max_sys_limits
>> to 1,
>> increasing your limit descriptor setting (using limit or ulimit commands),
>> or asking the system administrator to increase the system limit.
>> --------------------------------------------------------------------------
>> make: *** [cattle-1.fa] Error 1
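
[Editor's note: putting the error message's advice together, the launch would look roughly like the sketch below. The MCA parameter can be passed directly on the mpirun command line; the installation path is the one from the command above, and the ulimit value is an illustrative assumption, not a recommendation.]

```shell
# Raise the per-shell descriptor limit before launching (example value;
# it must not exceed the hard limit reported by `ulimit -Hn`):
ulimit -n 8192

# Pass the MCA parameter on the command line instead of (or in addition
# to) an MCA parameter file:
/opt/mpi/openmpi/1.3.3/intel/bin/mpirun --mca opal_set_max_sys_limits 1 \
    -np 100 ABYSS-P -k50 -q3 --coverage-hist=coverage.hist \
    -s cattle-bubbles.fa -o cattle-1.fa s_1_1_sequence.txt
```

Note that ulimit only affects the shell it runs in and its children; for ranks started on remote nodes, the limit has to be raised where the daemons start (e.g. via /etc/security/limits.conf or the queueing system's configuration).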
>>
>>
>>
>> When I submit the same job through qsub, I receive the following error:
>> $ qsub -cwd -pe orte 100 -o qsub.out -e qsub.err -b y -N abyss `which
>> mpirun` /home/genome/abyss/bin/ABYSS-P -k 50 s_1_1_sequence.txt -o av
>>
>>
>> [compute-0-19.local][[28273,1],125][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 173.16.255.231 failed: Connection refused (111)
>> [compute-0-19.local][[28273,1],127][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 173.16.255.231 failed: Connection refused (111)
>> [compute-0-23.local][[28273,1],135][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 173.16.255.228 failed: Connection refused (111)
>> [compute-0-23.local][[28273,1],133][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 173.16.255.228 failed: Connection refused (111)
>> [compute-0-4.local][[28273,1],113][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 173.16.255.231 failed: Connection refused (111)
>>
>>
>>
>> Best Regards,
>> Ashwani
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users