Open MPI User's Mailing List Archives

From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2006-03-14 13:00:57


You are touching on a difficult area in Open MPI:

- Name publishing across independent jobs unfortunately does not work
right now. (It does work if all processes were started by the same
mpirun, or if they were spawned by a parent process using
MPI_Comm_spawn.) Your approach of passing the port name as a
command-line argument should work, however.

- However, you have to start the orted daemon *before* starting both
jobs, using the flags
'orted --seed --persistent --scope public'
These flags are currently only lightly tested, since a brand-new
runtime environment with much better support for these operations is
under development.

- Regarding the 'Pack data mismatch' error: do both machines you are
using have the same data representation? I ask because this looks like
a data type mismatch error, and Open MPI currently has some
restrictions regarding different data formats and endianness.
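One quick way to answer the data-representation question in the last point is to run a small diagnostic on both machines and compare the output. The following is a minimal sketch in C99, not the check Open MPI itself performs: it reports only byte order and a few type sizes, which is where heterogeneous hosts most often differ. The function names host_is_little_endian and describe_host are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if this host stores multi-byte integers least-significant
 * byte first (little-endian), 0 if most-significant byte first. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first;

    /* Copy out the byte at the lowest address of the integer. */
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Writes a one-line summary of this host's data representation into buf,
 * e.g. "little-endian int=4 long=8 double=8 ptr=8" on a typical
 * x86-64 Linux host. Returns the length snprintf would have written. */
int describe_host(char *buf, size_t len)
{
    return snprintf(buf, len, "%s int=%zu long=%zu double=%zu ptr=%zu",
                    host_is_little_endian() ? "little-endian" : "big-endian",
                    sizeof(int), sizeof(long), sizeof(double),
                    sizeof(void *));
}
```

If the summary lines printed on the two hosts differ (for example, little-endian on one and big-endian on the other, or different sizes for long), the machines do not share a data representation.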

Thanks
Edgar

Robert Latham wrote:

> Hello
> In playing around with process management routines, I found another
> issue. This one might very well be operator error, or something
> implementation specific.
>
> I've got two processes (a and b), linked with openmpi, but started
> independently (no mpiexec).
>
> - A starts up and calls MPI_Init
> - A calls MPI_Open_port, prints out the port name to stdout, then
> calls MPI_Comm_accept and blocks.
> - B takes as a command-line argument the port
> name printed out by A. It calls MPI_Init and then passes that
> port name to MPI_Comm_connect.
> - B gets the following error:
>
> [leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
> in file ../../../orte/dps/dps_unpack.c at line 121
> [leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
> in file ../../../orte/dps/dps_unpack.c at line 95
> [leela.mcs.anl.gov:04177] *** An error occurred in MPI_Comm_connect
> [leela.mcs.anl.gov:04177] *** on communicator MPI_COMM_WORLD
> [leela.mcs.anl.gov:04177] *** MPI_ERR_UNKNOWN: unknown error
> [leela.mcs.anl.gov:04177] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Not found in file
> ../../../../../orte/mca/pls/base/pls_base_proxy.c at line 183
>
> - A is still waiting for someone to connect to it.
>
> Did I pass MPI port strings between programs the correct way, or is
> MPI_Publish_name/MPI_Lookup_name the preferred way to pass around this
> information?
>
> Thanks
> ==rob
>

-- 
Edgar Gabriel
Assistant Professor
Department of Computer Science          email:gabriel_at_[hidden]
University of Houston                   http://www.cs.uh.edu/~gabriel
Philip G. Hoffman Hall, Room 524        Tel: +1 (713) 743-3857
Houston, TX-77204, USA                  Fax: +1 (713) 743-3335