Open MPI Development Mailing List Archives

From: Eng. A.A. Isola (alfonso.isola_at_[hidden])
Date: 2006-09-07 16:20:35


"It's not possible to connect!!!!"

Hi Devel list, crossposting as this is getting weird...

I wrote a client/server pair using MPI_Publish_name / MPI_Lookup_name.
It runs fine on both MPICH2 and LAM-MPI but fails on Open MPI. It's not
a simple failure (i.e. returning an error code): it breaks the execution
and quits. The server continues to run after the client's crash.

The server also uses 100% of the CPU while running, which doesn't
happen with LAM.

The code is here:
http://www.systemcall.com.br/rengolin/open-mpi/

Open MPI version: 1.1.1

Compiling:
mpiCC -o server server.c
mpiCC -o client client.c
- or -
mpiCC -o client client.c -DUSE_LOOKUP

Running & Output:

-- Server --
sbornia$ mpiexec server foo
server Process Rank 0 ,TOT processes 1 on sbornia
Server foo available at 0.1.0:2000

-- Client without USE_LOOKUP --
sbornia$ mpiexec client foo
Rank Client Process 0 ,TOT processes 1 on sbornia
[sbornia:06246] [0,1,0] ORTE_ERROR_LOG: Pack data mismatch in file dss/dss_unpack.c at line 171
[sbornia:06246] [0,1,0] ORTE_ERROR_LOG: Pack data mismatch in file dss/dss_unpack.c at line 145
[sbornia:06246] *** An error occurred in MPI_Comm_connect
[sbornia:06246] *** on communicator MPI_COMM_WORLD
[sbornia:06246] *** MPI_ERR_UNKNOWN: unknown error
[sbornia:06246] *** MPI_ERRORS_ARE_FATAL (goodbye)
[sbornia:06243] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104

-- Client with USE_LOOKUP --
sbornia$ mpiexec client foo
Rank Client Process 0 ,TOT processes 1 on sbornia
[sbornia:06232] *** An error occurred in MPI_Lookup_name
[sbornia:06232] *** on communicator MPI_COMM_WORLD
[sbornia:06232] *** MPI_ERR_NAME: invalid name argument
[sbornia:06232] *** MPI_ERRORS_ARE_FATAL (goodbye)
[sbornia:06229] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104

OS error code 104: Connection reset by peer
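
For reference, the errno value in the readv messages can be turned into its system message with the standard C library. This is only a small sketch: the helper names is_connection_reset and describe_errno are made up for illustration and are not part of Open MPI or of our server/client code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns nonzero when err is the
 * "connection reset by peer" code. The symbolic name ECONNRESET is
 * portable; its numeric value (104 on Linux) is platform-specific. */
int is_connection_reset(int err)
{
    return err == ECONNRESET;
}

/* Hypothetical helper: writes "errno=<n> (<system message>)" into buf
 * and returns the length snprintf reports. */
int describe_errno(int err, char *buf, size_t len)
{
    return snprintf(buf, len, "errno=%d (%s)", err, strerror(err));
}
```

On Linux, calling describe_errno(104, buf, sizeof buf) should give the same "Connection reset by peer" text quoted above, which presumably just means the client process went away while mpiexec was still reading from it.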

What are we doing wrong, or where's the bug?

Thanks in advance!

--alfonso & renato