
Open MPI User's Mailing List Archives


Subject: [OMPI users] having trouble expanding our cluster
From: yewyong (uyong81_at_[hidden])
Date: 2009-10-20 09:33:32


Hi guys,

I am new to cluster computing, so please bear with me.
I am currently trying to make use of the desktops available in our labs.

I started with two labs: Lab A, which has 192.168.0.xx IPs,
and Lab B, which has public IPs in the 202.185.77.xx series.

I have been running ParaView in Lab A without any problems so far
(mpirun -np 18 -machinefile quad_lan pvserver).

When I tried to extend the cluster to Lab B's desktops by specifying their
IPs in my machinefile like this:

202.185.77.110 slots=4 max-slots=4   # eth1 = 202.185.77.110; eth0 = 192.168.0.10
202.185.77.219 slots=2 max-slots=2   <-- this is "pc226"
192.168.0.227 slots=2 max-slots=2
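
To make the address mix in this machinefile explicit, here is a minimal sketch that labels each entry as private or public using Python's standard ipaddress module. The function name classify_hosts and the MACHINEFILE string are illustrative only and not part of Open MPI. The sketch only labels the addresses; it does not show that the mix is what causes the hang.

```python
import ipaddress

# Host entries copied from the machinefile above (annotations dropped).
MACHINEFILE = """\
202.185.77.110 slots=4 max-slots=4
202.185.77.219 slots=2 max-slots=2
192.168.0.227 slots=2 max-slots=2
"""

def classify_hosts(text):
    """Return [(address, 'private' | 'public')] for each machinefile line."""
    result = []
    for line in text.splitlines():
        # Ignore anything after '#' and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        addr = ipaddress.ip_address(line.split()[0])
        # 'public' here simply means "not in a private range".
        kind = "private" if addr.is_private else "public"
        result.append((str(addr), kind))
    return result

if __name__ == "__main__":
    for addr, kind in classify_hosts(MACHINEFILE):
        print(f"{addr:16s} {kind}")
```

A host listed only by a private address, such as 192.168.0.227, may not be directly reachable from a machine that sits only on the public network, which is one thing worth checking here.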

my program hangs while my client tries to connect to the pvserver.
If I wait long enough, it shows this error:

yewyong@vrc1:~/installer/mpi_test> mpirun -np 8 -machinefile quad_hama pvserver --use-offscreen-rendering
Listen on port: 11111
Waiting for client...
Client connected.
[pc226][[8636,1],6][btl_tcp_endpoint.c:631:mca_btl_tcp_endpoint_complete_connect] connect() failed: Connection timed out (110)
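
The error above is a TCP connect() that timed out. A rough way to check, outside of MPI, whether one node can open a TCP connection to another is sketched below. The helper name can_connect is hypothetical, and the sketch assumes some service is already listening on a known port of the target host (for example an SSH daemon on port 22, which is an assumption about these machines). Open MPI chooses its own TCP ports at run time, so this only tests basic reachability between two addresses.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For instance, running can_connect("192.168.0.227", 22) on pc226 would indicate whether pc226 can reach that private address at all, under the assumption that an SSH daemon is running there.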

Note that when running "hello_world", the output shows that the nodes of
the makeshift cluster are able to communicate with each other.
After trying different orderings of the hosts in the machinefile, I found that
whenever an mpirun server is started across both IP ranges, the system hangs
when the client tries to connect.

Could this be related to the problem raised in
"http://www.open-mpi.org/community/lists/devel/2009/07/6385.php"?

I'm not sure whether I've fully described the situation; please let me know
if you need any extra information.

Thank you in advance.

yewyong.