"It's not possible to connect!!!!"
Hi Devel list, crossposting as this is getting weird...

I wrote a client/server using MPI_Publish_name / MPI_Lookup_name, and
it runs fine on both MPICH2 and LAM-MPI but fails on Open MPI. It's not
a simple failure (i.e. returning an error code): it aborts execution
and quits. The server continues to run after the client's crash.

The server also uses 100% of the CPU while running, which doesn't
happen with LAM.
The code is here:
http://www.systemcall.com.br/rengolin/open-mpi/
Open MPI version: 1.1.1
Compiling:
mpiCC -o server server.c
mpiCC -o client client.c
- or -
mpiCC -o client client.c -DUSE_LOOKUP
Running & Output:

-- Server --
sbornia$ mpiexec server foo
server Process Rank 0 ,TOT processes 1 on sbornia
Server foo available at 0.1.0:2000

-- Client without USE_LOOKUP --
sbornia$ mpiexec client foo
Rank Client Process 0 ,TOT processes 1 on sbornia
[sbornia:06246] [0,1,0] ORTE_ERROR_LOG: Pack data mismatch in file dss/dss_unpack.c at line 171
[sbornia:06246] [0,1,0] ORTE_ERROR_LOG: Pack data mismatch in file dss/dss_unpack.c at line 145
[sbornia:06246] *** An error occurred in MPI_Comm_connect
[sbornia:06246] *** on communicator MPI_COMM_WORLD
[sbornia:06246] *** MPI_ERR_UNKNOWN: unknown error
[sbornia:06246] *** MPI_ERRORS_ARE_FATAL (goodbye)
[sbornia:06243] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104

-- Client with USE_LOOKUP --
sbornia$ mpiexec client foo
Rank Client Process 0 ,TOT processes 1 on sbornia
[sbornia:06232] *** An error occurred in MPI_Lookup_name
[sbornia:06232] *** on communicator MPI_COMM_WORLD
[sbornia:06232] *** MPI_ERR_NAME: invalid name argument
[sbornia:06232] *** MPI_ERRORS_ARE_FATAL (goodbye)
[sbornia:06229] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104

OS error code 104: Connection reset by peer
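
For reference, the errno value reported by readv can be turned into
text with the standard C strerror function. This is only a minimal
sketch: it assumes a Linux system, where 104 is the value of
ECONNRESET, and the helper name describe_errno is made up for
illustration (it is not part of our server/client code).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from the original code: returns the
 * system's text for an OS error number such as the errno value
 * printed in the readv message. */
const char *describe_errno(int err)
{
    return strerror(err);
}

/* Hypothetical helper: prints one line in the same shape as the
 * "OS error code" line quoted in this mail. */
void print_errno(FILE *out, int err)
{
    fprintf(out, "OS error code %d: %s\n", err, describe_errno(err));
}
```

Calling print_errno(stdout, ECONNRESET) uses the symbolic constant
rather than the raw number, which may differ on non-Linux systems.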

What are we doing wrong, or where's the bug?

Thanks in advance!

--alfonso & renato