Sorry that I did not make it clear. Actually, we are able to run this way with fewer than 100 processes.
I put 'hostname1.domain.com,1,2,3,4,5,6,7,8,9,…..,196,197,198,199' on a single line in a hostfile and it does not work. I wonder what the equivalent format in a hostfile might be.
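For reference, an Open MPI hostfile normally lists one host per line (optionally followed by slots=N), not a comma-separated list. Below is a minimal sketch of converting a -host style list into that layout. The function name hosts_to_hostfile and the host names in the usage comment are hypothetical placeholders, not the real list from this thread.

```shell
# Hypothetical helper: convert a comma-separated -host list into
# hostfile lines, one host name per line.
hosts_to_hostfile() {
    # Split on commas, then drop any empty entries.
    printf '%s\n' "$1" | tr ',' '\n' | sed '/^$/d'
}

# Usage sketch (placeholder host names):
#   hosts_to_hostfile "hostname1.domain.com,node1,node2" > myhosts
#   orterun --hostfile myhosts -np 3 CMD
```

Whether this removes the limit seen above is not confirmed here; it only shows the file layout that orterun expects from --hostfile.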
You might try putting that list of hosts in a hostfile instead of on the command line - you may be hitting some limits there.
I also don't believe that you can add an orted in that manner - orterun will have no idea how it got there and is likely to abort.
On Mar 1, 2012, at 3:20 PM, Jianzhang He wrote:
I am not sure if this is the right place to post this question. If you know where it is appropriate, please let me know.
I need to run an application that launches 200 processes with the command:
1) orterun --prefix ./ -np 200 -wd ./ -host hostname1.domain.com,1,2,3,4,5,6,7,8,9,…..,196,197,198,199 CMD
Later, I will run a second command to communicate with 1), like:
2) orted -mca ess env -mca orte_ess_ -mca orte_ess_vpid 100 -mca orte_ess_num_procs 200 --hnp-uri "job#;tcp:/ hostname1.domain.com /:port#"
The problem I have is that I can only run with about 100 nodes. If the number is higher, 1) does not invoke CMD and the total number of processes is about 130.
My question is: how can I remove that limit?
Thanks in advance.