Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Error when using more than 88 processors for a specific executable -Abyss
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-10-14 14:15:52


Can't offer much about the qsub job. On the first one, what is your limit on the number of file descriptors? It could be that your sys admin has set it too low.
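
A quick way to check is below. This is a hedged sketch, not part of the original thread: the `abyss-pe` invocation at the end is illustrative, and whether raising the soft limit helps depends on the hard limit your admin configured.

```shell
# Check the per-process file descriptor limits that mpirun (and the
# daemons it launches) will inherit from this shell.
soft=$(ulimit -Sn)   # soft limit: what a process actually gets
hard=$(ulimit -Hn)   # hard limit: ceiling a non-root user may raise to
echo "soft=$soft hard=$hard"

# Raise the soft limit up to the hard limit for this shell and its
# children; this fails silently if the hard limit is already the cap.
ulimit -Sn "$hard" 2>/dev/null || echo "could not raise soft limit"

# Open MPI can also try to raise limits itself via the MCA parameter
# mentioned in the error message (passed with --mca on the command line):
#   mpirun --mca opal_set_max_sys_limits 1 -np 100 ./my_app
```

If the hard limit itself is too low for ~100 local pipes and sockets, only the system administrator can raise it (e.g. in /etc/security/limits.conf on Linux).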

On Oct 14, 2011, at 12:07 PM, Ashwani Kumar Mishra wrote:

> Hello,
> When I run the following command to submit a job on a cluster of 40 nodes (each node having 8 processors and 8 GB RAM), I receive the error below:
>
> Both commands work well as long as I use up to 88 processors in the cluster, but the moment I allocate more than 88 processors I get the two errors below:
>
> I tried setting ulimit to unlimited and setting the MCA parameter opal_set_max_sys_limits to 1, but the problem won't go away.
>
>
> $ mpirun=/opt/psc/ompi/bin/mpirun abyss-pe np=100 name=cattle k=50 n=10 in=s_1_1_sequence.txt
>
> /opt/mpi/openmpi/1.3.3/intel/bin/mpirun -np 100 ABYSS-P -k50 -q3 --coverage-hist=coverage.hist -s cattle-bubbles.fa -o cattle-1.fa s_1_1_sequence.txt
> [coe:19807] [[62863,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file base/iof_base_setup.c at line 107
> [coe.:19807] [[62863,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 203
> [coe.:19807] [[62863,0],0] ORTE_ERROR_LOG: The system limit on number of network connections a process can open was reached in file oob_tcp.c at line 447
> --------------------------------------------------------------------------
> Error: system limit exceeded on number of network connections that can be open
>
> This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1,
> increasing your limit descriptor setting (using limit or ulimit commands),
> or asking the system administrator to increase the system limit.
> --------------------------------------------------------------------------
> make: *** [cattle-1.fa] Error 1
>
>
>
>
> When I submit the same job through qsub, I receive the following error:
> $ qsub -cwd -pe orte 100 -o qsub.out -e qsub.err -b y -N abyss `which mpirun` /home/genome/abyss/bin/ABYSS-P -k 50 s_1_1_sequence.txt -o av
>
>
> [compute-0-19.local][[28273,1],125][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.231 failed: Connection refused (111)
> [compute-0-19.local][[28273,1],127][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.231 failed: Connection refused (111)
> [compute-0-23.local][[28273,1],135][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.228 failed: Connection refused (111)
> [compute-0-23.local][[28273,1],133][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.228 failed: Connection refused (111)
> [compute-0-4.local][[28273,1],113][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.231 failed: Connection refused (111)
>
>
>
> Best Regards,
> Ashwani
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users