Hi Devel list, crossposting as this is getting weird...
Alfonso wrote a client/server using MPI_Publish_name / MPI_Lookup_name,
and it runs fine on both MPICH2 and LAM-MPI but fails on Open MPI. It's
not a simple failure (i.e. returning an error code): it aborts the
execution and quits. The server continues to run after the
client's crash.
The server also uses 100% of the CPU while running, which doesn't happen with LAM.
The code is here:
http://www.systemcall.com.br/rengolin/open-mpi/
Open MPI version: 1.1.1
Compiling:
mpiCC -o server server.c
mpiCC -o client client.c
- or -
mpiCC -o client client.c -DUSE_LOOKUP
Running & Output:
-- Server --
sbornia$ mpiexec server foo
server Process Rank 0 ,TOT processes 1 on sbornia
Server foo available at 0.1.0:2000
-- Client without USE_LOOKUP --
sbornia$ mpiexec client foo
Rank Client Process 0 ,TOT processes 1 on sbornia
[sbornia:06246] [0,1,0] ORTE_ERROR_LOG: Pack data mismatch in file dss/dss_unpack.c at line 171
[sbornia:06246] [0,1,0] ORTE_ERROR_LOG: Pack data mismatch in file dss/dss_unpack.c at line 145
[sbornia:06246] *** An error occurred in MPI_Comm_connect
[sbornia:06246] *** on communicator MPI_COMM_WORLD
[sbornia:06246] *** MPI_ERR_UNKNOWN: unknown error
[sbornia:06246] *** MPI_ERRORS_ARE_FATAL (goodbye)
[sbornia:06243] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104
-- Client with USE_LOOKUP --
sbornia$ mpiexec client foo
Rank Client Process 0 ,TOT processes 1 on sbornia
[sbornia:06232] *** An error occurred in MPI_Lookup_name
[sbornia:06232] *** on communicator MPI_COMM_WORLD
[sbornia:06232] *** MPI_ERR_NAME: invalid name argument
[sbornia:06232] *** MPI_ERRORS_ARE_FATAL (goodbye)
[sbornia:06229] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104
OS error code 104: Connection reset by peer
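For anyone reproducing this: on Linux, errno 104 is ECONNRESET. The numbering is platform-specific; on macOS/BSD, for example, ECONNRESET is 54. Below is a minimal C sketch for decoding such values from the logs. It assumes a Linux C library, and the helper names is_conn_reset and describe_errno are hypothetical, not part of Open MPI.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if an errno value taken from a log
   line (e.g. "readv failed with errno=104") means "connection reset by peer".
   Compares against the ECONNRESET macro rather than a hard-coded number,
   since the numeric value differs between platforms. */
int is_conn_reset(int code)
{
    return code == ECONNRESET;
}

/* Hypothetical helper: prints the C library's description of an errno value,
   tagging it when it is ECONNRESET. */
void describe_errno(int code)
{
    printf("errno=%d: %s%s\n", code, strerror(code),
           is_conn_reset(code) ? " (ECONNRESET)" : "");
}
```

On a glibc-based Linux system, describe_errno(104) should print "errno=104: Connection reset by peer (ECONNRESET)". This matches the server-side readv failures above: the server's TCP peer (the client) went away abruptly when it aborted.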
What are we doing wrong?
Thanks in advance!
--renato