Okay, let's try spreading them out more, to avoid putting more procs on a node than you actually need. Add -bynode to your command line; this will spread the procs across all the nodes.

Our default mode is "byslot", which means we fill each node's slots before adding procs to the next one. "bynode" puts one proc on each node, wrapping around until all procs have been assigned. You may lose a little performance, since shared memory can't be used as much, but the job has a better chance of running.
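For reference, a sketch of the two invocations (flag spelling as in the Open MPI 1.x command line; ./my_app stands in for the real executable):

```shell
# Default "byslot" mapping: fill all 8 slots on node 1,
# then all 8 on node 2, and so on.
mpirun -np 100 ./my_app

# "bynode" mapping: rank 0 on node 1, rank 1 on node 2, ...,
# wrapping around until all 100 procs are placed.
mpirun -bynode -np 100 ./my_app
```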


On Oct 14, 2011, at 1:29 PM, Ashwani Kumar Mishra wrote:

Hi Ralph,
No idea how many file descriptors this program consumes :(

Best Regards,
Ashwani

On Sat, Oct 15, 2011 at 12:08 AM, Ralph Castain <rhc@open-mpi.org> wrote:
Should be plenty for us - does your program consume a lot?


On Oct 14, 2011, at 12:25 PM, Ashwani Kumar Mishra wrote:

Hi Ralph,
fs.file-max = 100000
is that OK, or is it too low?

Best Regards,
Ashwani


On Fri, Oct 14, 2011 at 11:45 PM, Ralph Castain <rhc@open-mpi.org> wrote:
Can't offer much about the qsub job. On the first one, what is your limit on the number of file descriptors? Could be your sys admin has it too low.
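To check the relevant limits on a node (Linux; these are the standard shell built-ins and /proc paths, nothing specific to this cluster):

```shell
# Per-process file descriptor limits for the current shell
ulimit -n        # soft limit
ulimit -Hn       # hard limit

# System-wide limit (the fs.file-max sysctl)
cat /proc/sys/fs/file-max
```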


On Oct 14, 2011, at 12:07 PM, Ashwani Kumar Mishra wrote:

Hello,
When I try to submit this job on a cluster with 40 nodes (each node has 8 processors & 8 GB RAM), I receive the following errors:

Both commands work well as long as I use up to 88 processors in the cluster, but the moment I allocate more than 88 processors I get the two errors below:

I tried setting the ulimit to unlimited and setting the MCA parameter opal_set_max_sys_limits to 1, but the problem still won't go away.
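For what it's worth, the two workarounds I tried look roughly like this (a sketch; ./my_app is a placeholder, and whether the ulimit set in the submitting shell actually reaches the remote daemons depends on how they are launched):

```shell
# Raise this shell's soft fd limit up to its hard limit
ulimit -n "$(ulimit -Hn)"

# Ask Open MPI to raise its own limits at startup
mpirun --mca opal_set_max_sys_limits 1 -np 100 ./my_app
```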


$ mpirun=/opt/psc/ompi/bin/mpirun abyss-pe np=100 name=cattle k=50 n=10  in=s_1_1_sequence.txt

/opt/mpi/openmpi/1.3.3/intel/bin/mpirun -np 100 ABYSS-P -k50 -q3  --coverage-hist=coverage.hist -s cattle-bubbles.fa  -o cattle-1.fa s_1_1_sequence.txt
[coe:19807] [[62863,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file base/iof_base_setup.c at line 107
[coe.:19807] [[62863,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 203
[coe.:19807] [[62863,0],0] ORTE_ERROR_LOG: The system limit on number of network connections a process can open was reached in file oob_tcp.c at line 447
--------------------------------------------------------------------------
Error: system limit exceeded on number of network connections that can be open

This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1,
increasing your limit descriptor setting (using limit or ulimit commands),
or asking the system administrator to increase the system limit.
--------------------------------------------------------------------------
make: *** [cattle-1.fa] Error 1




When I submit the same job through qsub, I receive the following error:

$ qsub  -cwd -pe  orte 100 -o qsub.out -e qsub.err -b y -N  abyss `which mpirun` /home/genome/abyss/bin/ABYSS-P -k 50 s_1_1_sequence.txt -o av


[compute-0-19.local][[28273,1],125][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.231 failed: Connection refused (111)
[compute-0-19.local][[28273,1],127][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.231 failed: Connection refused (111)
[compute-0-23.local][[28273,1],135][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.228 failed: Connection refused (111)
[compute-0-23.local][[28273,1],133][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.228 failed: Connection refused (111)
[compute-0-4.local][[28273,1],113][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.231 failed: Connection refused (111)



Best Regards,
Ashwani



_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

