On Aug 12, 2009, at 7:09 PM, Ralph Castain wrote:
> Hmmm...well, I'm going to ask our TCP friends for some help here.
> Meantime, I do see one thing that stands out. Port 4 is an awfully
> low port number that usually sits in the reserved range. I checked
> the /etc/services file on my Mac, and it was commented out as
> unassigned, which should mean it was okay.
> Still, that is an unusual number. The default minimum port number is
> 1024, so I'm puzzled how you wound up down there. Of course, it could
> just be an error in the print statement, but let's move it to
> be safe. Set
> -mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32
> and see what happens.
What happens is that everything works now! Both connectivity_c and
the MITgcm run. I haven't tried under Torque yet, but let's declare an
Open MPI victory at this point.
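For anyone finding this in the archives: the same two settings can be made persistent so they apply without extra mpirun flags, e.g. inside a Torque job script. Below is a minimal sketch, not tested under Torque, that appends them to Open MPI's per-user MCA parameter file ($HOME/.openmpi/mca-params.conf, one "name = value" per line). It assumes home directories are shared across the cluster nodes; the values are simply the ones Ralph suggested.

```shell
#!/bin/sh
# Sketch: persist the TCP BTL port settings from this thread.
# Assumption: $HOME is shared across all cluster nodes.
conf_dir="$HOME/.openmpi"
mkdir -p "$conf_dir"

# Append to the per-user MCA parameter file (created if missing).
cat >> "$conf_dir/mca-params.conf" <<'EOF'
# Keep the TCP BTL well above the reserved (<1024) port range
btl_tcp_port_min_v4 = 36900
btl_tcp_port_range_v4 = 32
EOF
```

Appending rather than overwriting keeps any parameters already in the file; running the script twice just repeats the same lines.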
On Aug 13, 2009, at 8:28 AM, Jeff Squyres wrote:
> Agreed -- ports 4 and 260 should be in the reserved ports range.
> Are you running as root, perchance?
Errrr, no, but yes. My user account has admin privileges. A sloppy
workstation OS X habit I now regret propagating to my cluster. Sorry
I didn't mention it earlier as possibly relevant.
As a suggestion, btl_base_verbose could be mentioned as a good
debugging tool in the troubleshooting section of the FAQ. It's on the
TCP page, which I admit I should have read as soon as I realized
there was a communication issue, but having it in the troubleshooting
section would be helpful too, e.g. something like a more erudite
version of:
Checking connections between nodes:
Sometimes the configuration of a cluster makes it impossible for nodes
to communicate properly. To debug this, it helps to add --mca
btl_base_verbose 30 to the mpirun command line (see http://www.open-mpi.org/faq/?category=tcp
for more information). The program examples/connectivity_c.c is also
a useful minimal program for testing communication on the cluster.
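Such an FAQ entry could also carry a short recipe. Below is a minimal sketch, with OMPI_SRC, the hostfile name myhosts and -np 4 as placeholders rather than values from this thread. It sets btl_base_verbose through the OMPI_MCA_<param> environment variable, which Open MPI treats like the --mca flag, and only builds and runs the test if mpicc and the example source are actually present.

```shell
#!/bin/sh
# Placeholders (assumptions): source tree location, hostfile name, process count.
OMPI_SRC="${OMPI_SRC:-.}"

# Equivalent to passing "--mca btl_base_verbose 30" to mpirun.
OMPI_MCA_btl_base_verbose=30
export OMPI_MCA_btl_base_verbose

if command -v mpicc >/dev/null 2>&1 && [ -f "$OMPI_SRC/examples/connectivity_c.c" ]; then
    # Build the connectivity test shipped with the Open MPI sources
    mpicc -o connectivity_c "$OMPI_SRC/examples/connectivity_c.c"
    # Checks that every pair of ranks can exchange a message;
    # the verbose BTL output shows which interfaces and ports are tried
    mpirun -np 4 --hostfile myhosts ./connectivity_c
else
    echo "mpicc or connectivity_c.c not found; set OMPI_SRC to the Open MPI source tree" >&2
fi
```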
Thanks again for everyone's help, particularly Ralph, Jeff and Gus.