Hi guys,
I am new to cluster computing, so please bear with me.
I am currently trying to make use of the desktops available in our labs.
I started with 2 labs: Lab A, which has private 192.168.0.xx IPs,
and Lab B, which has public IPs in the 202.185.77.xx series.
I have been running ParaView in Lab A without any problems so far
(mpirun -np 18 -machinefile quad_lan pvserver).
When I tried to extend the cluster to Lab B's desktops by specifying their IPs in
my machinefile like this:
202.185.77.110 slots=4 max-slots=4 #(eth1 = 202.185.77.110; eth0 = 192.168.0.10)
202.185.77.219 slots=2 max-slots=2 <-- this is "pc226"
192.168.0.227 slots=2 max-slots=2
my program hangs when my client tries to connect to the pvserver.
If I wait long enough, it shows this error:
yewyong@vrc1:~/installer/mpi_test> mpirun -np 8 -machinefile quad_hama pvserver --use-offscreen-rendering
Listen on port: 11111
Waiting for client...
Client connected.
[pc226][[8636,1],6][btl_tcp_endpoint.c:631:mca_btl_tcp_endpoint_complete_connect] connect() failed: Connection timed out (110)
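
The "Connection timed out (110)" above comes from a plain TCP connect() between two of the nodes. For anyone who wants to reproduce that kind of check outside of MPI, here is a rough sketch. The function name can_connect is made up for illustration, and the port to probe is an assumption: as far as I know Open MPI picks its TCP ports dynamically, so this only tells you whether one host can reach another on some port you already know is listening.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it is not part of Open MPI or ParaView.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

Run from pc226, something like can_connect("192.168.0.227", 22) would show whether that node can open a connection towards the 192.168.0.xx side at all (port 22 is only a guess at a port that is listening there).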
Note that running "hello_world" shows that the machines in the makeshift cluster are able to communicate with each other.
After trying different orderings of the hosts in the machinefile, I found that whenever mpirun starts a server spanning both IP ranges, the system hangs
when the client tries to connect.
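
To make "spanning both IP ranges" concrete, here is a small sketch that groups the machinefile entries quoted above by subnet. The /24 prefix length is an assumption (the actual netmasks in the labs are not stated here), the helper names are made up for illustration, and it only handles numeric IPv4 addresses, not hostnames.

```python
import ipaddress
from collections import defaultdict

# Machinefile entries as quoted in this post (annotations dropped).
MACHINEFILE = """\
202.185.77.110 slots=4 max-slots=4
202.185.77.219 slots=2 max-slots=2
192.168.0.227 slots=2 max-slots=2
"""


def group_by_subnet(text, prefix_len=24):
    """Map each subnet to the hosts listed in it.

    prefix_len=24 is an assumed netmask, not something stated in the post.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        host = line.split()[0]  # first field is the address
        net = ipaddress.ip_network(f"{host}/{prefix_len}", strict=False)
        groups[str(net)].append(host)
    return dict(groups)


def spans_multiple_subnets(text, prefix_len=24):
    """True if the machinefile lists hosts from more than one subnet."""
    return len(group_by_subnet(text, prefix_len)) > 1


if __name__ == "__main__":
    for net, hosts in group_by_subnet(MACHINEFILE).items():
        print(net, hosts)
```

With the entries above this puts the two 202.185.77.xx hosts in one group and 192.168.0.227 in another, which is exactly the kind of mixed machinefile that triggers the hang.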
Could this hang be related to the problem raised in "http://www.open-mpi.org/community/lists/devel/2009/07/6385.php"?
I'm not sure I've fully described the situation, so please let me know if you need any extra information.
Thank you in advance.
yewyong.