On Tue, Jun 19, 2007 at 11:28:33AM -0700, George Bosilca wrote:
> Does the deadlock happen with or without your patch? If it's with your
> patch, the problem might come from the fact that you start 2
> processes on each node and they will share the port range (because of
> your patch).
If process 1 is using a port, then process 2 should try the next port in the
range until it finds an available one (or runs out of ports). There are 1000
ports to choose from, so it should be able to find a free one.
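The fallback described above can be sketched in C as a loop that calls bind()
on each port in turn and moves on only when the port is already taken
(EADDRINUSE). This is a minimal sketch, not the code from the patch (which
isn't shown here). The range 10000 to 10999 and the helper name
bind_in_range() are hypothetical.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical port range: 1000 ports, inclusive. */
#define PORT_MIN 10000
#define PORT_MAX 10999

/* Hypothetical helper: try each port in [min, max] in order. On success,
 * return a listening TCP socket and store the chosen port in *bound_port.
 * Return -1 if every port is busy or a non-"in use" error occurs. */
int bind_in_range(int min, int max, int *bound_port)
{
    for (int port = min; port <= max; port++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons((unsigned short)port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
            listen(fd, 16) == 0) {
            *bound_port = port;
            return fd;              /* this process now owns the port */
        }

        int err = errno;            /* save before close() can change it */
        close(fd);
        if (err != EADDRINUSE)
            return -1;              /* unexpected failure: give up */
        /* port taken by another process: try the next one */
    }
    return -1;                      /* ran out of ports in the range */
}
```

Two processes on the same node that both call bind_in_range(PORT_MIN,
PORT_MAX, &port) would end up on different ports: the second one skips
whatever port the first already holds.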
> Please re-run either with 2 processes per node but without your patch
> or with only one process per node with your patch.
The job won't run without my patch due to the restrictive firewall on the
individual machines. I ran tests with only a single process per node and
encountered the same problem, so the problem doesn't appear to be due to
processes arguing over ports.