What version of OMPI are you using? That error message looks like something from an ancient version - might be worth updating.
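
One thing that may be worth checking independently of the version: on Linux, errno=110 is ETIMEDOUT, so the connect() calls in the quoted output never got an answer at all. Working ping and ssh only show that ICMP and TCP port 22 pass between B and C; they say nothing about the other TCP ports that MPI processes open between each other, which a firewall on an Internet path could be dropping. The sketch below is not part of Open MPI. `can_connect` is a hypothetical helper, and the host and port you pass it are placeholders for a listener you start yourself on the other node.

```python
import errno
import socket


def can_connect(host, port, timeout=5.0):
    """Try one TCP connection to host:port.

    Returns (True, None) on success, otherwise (False, reason), where
    reason is "timed out" or the symbolic errno name (e.g. ECONNREFUSED).
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect() call itself.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except socket.timeout:
        # No reply within the timeout: typical of a firewall silently
        # dropping packets, the same symptom as ETIMEDOUT (errno 110).
        return False, "timed out"
    except OSError as exc:
        # Any other failure, e.g. nothing listening on that port.
        return False, errno.errorcode.get(exc.errno, str(exc))
```

For example, start a listener on an unprivileged port on C, then call `can_connect("C", that_port)` from B. If that times out while ssh works, the path between the nodes is filtering ports and a larger timeout is unlikely to help.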

On Dec 13, 2010, at 4:04 AM, peifan wrote:

I have 3 nodes: one is the master node and the others are compute nodes. These nodes are deployed across the Internet (not in a cluster).

When I run NPB (the NAS Parallel Benchmarks) on one node with 2 processes:
 mpirun -np 2 exe
I get a successful result. But when I run on two nodes (for example, on nodes B and C):
 mpirun -nolocal -hostfile hostfile -np 2 exe
it fails. The failure output is:
B [0,1,0] connectimeout ,connect() fail errno=110 
C [0,1,1] connectimeout ,connect() fail errno=110
But the connection between B and C has no problem, because I can ping and ssh from B to C (and from C to B).
I think this problem may be caused by the connect timeout parameter (perhaps it is too small, which leads to the failure?), because my nodes are deployed across the Internet, so the latency is higher.
Can anyone help me solve this problem, and tell me how to set the connect timeout in Open MPI?




