On Jul 6, 2010, at 5:41 PM, Robert Walters wrote:
> Thanks for your expeditious responses, Ralph.
> Just to confirm with you, I should change openmpi-mca-params.conf to include:
> oob_tcp_port_min_v4 = (My minimum port in the range)
> oob_tcp_port_range_v4 = (My port range)
> btl_tcp_port_min_v4 = (My minimum port in the range)
> btl_tcp_port_range_v4 = (My port range)
That should do ya. Use the same values on all nodes. You should be able to confirm that OMPI's run-time system is working if you are able to mpirun a non-MPI program like "hostname" or somesuch. If that works, then the daemons are launching, talking to each other, launching the app, shuttling the I/O around, noticing that the app is dying, tidying everything up, and telling mpirun that everything is done. In short: lots of things are happening right if you're able to mpirun "hostname" across multiple hosts.
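As a concrete sketch, the openmpi-mca-params.conf entries might look like the following -- the port numbers here are placeholders I made up, not recommendations; use whatever range your admin actually opens:

```
# Hypothetical example values -- substitute the range your admin opens.
# OOB (run-time/daemon) traffic:
oob_tcp_port_min_v4 = 46000
oob_tcp_port_range_v4 = 64
# BTL (MPI point-to-point) traffic:
btl_tcp_port_min_v4 = 46100
btl_tcp_port_range_v4 = 2048
```

With that file in place on every node, the sanity check described above is just something like `mpirun --hostfile myhosts hostname`.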
> Also, for a cluster of around 32-64 processes (8 processors per node), how wide of a range will I require? I've noticed some entries in the mailing list suggesting you need a few to get started and then it opens as necessary. Will I be safe with 20 or should I go for 100?
If you have 64 hosts, each with 8 processors, meaning that the largest MPI job you would run is 64 * 8 = 512 MPI processes, then I'd ask for at least 1024 -- 2048 would be better (you have a zillion ports; better to ask for more than you need). We recently found a bug in the TCP BTL where it *may* use 2 sockets for each peer-wise connection in some cases.
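The sizing arithmetic above, as a quick back-of-the-envelope sketch (the doubling for the possible two-sockets-per-connection bug, and the further doubling for headroom, are my reading of the advice, not hard requirements):

```python
# Back-of-the-envelope port-range sizing for the scenario above.
hosts = 64
procs_per_host = 8

max_procs = hosts * procs_per_host  # largest MPI job: 512 processes
ports_min = 2 * max_procs           # TCP BTL bug may use 2 sockets per connection
ports_safe = 2 * ports_min          # extra headroom, since ports are cheap

print(max_procs, ports_min, ports_safe)  # 512 1024 2048
```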
Additionally, your sysadmin *might* be more amenable to opening up ports *only between the cluster nodes* (vs. opening up the ports to anything). If that's the case, you might as well go for the gold and ask them if they can open up *all* the ports between all your nodes (while still rejecting everything from non-cluster nodes).
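If your admin goes that route, the firewall side might look something like this iptables rules fragment -- the subnet and port range are placeholders for your cluster's actual values:

```
# Hypothetical iptables-restore fragment (e.g. in /etc/iptables/rules.v4).
# Narrow version: open only the OMPI port range, only to cluster nodes:
-A INPUT -p tcp -s 10.1.0.0/24 --dport 46000:48147 -j ACCEPT
# "Go for the gold" version: all TCP ports, but only between cluster nodes:
# -A INPUT -p tcp -s 10.1.0.0/24 -j ACCEPT
```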