
Open MPI Development Mailing List Archives


From: Ralph Castain (rhc_at_[hidden])
Date: 2006-09-07 23:19:13

Hi Pak

I can't say for certain, but I believe the problem relates to a change we
made in the summer to the default universe name. I encountered a similar
problem with the Eclipse folks at that time.

What happened was that Josh was encountering a problem relating to the
default universe name when working on orte-ps. At that time, we
restructured the default universe name to be "default-pid". This solved the
orte-ps problem.

However, it created a problem in persistent operations - namely, it became
impossible for a process to "know" the name of the persistent daemon's
universe. I'm not entirely certain that we fixed that problem.

Here's how you can check:

1. run "orted --debug --persistent --seed --scope public" in one window. You
will see a bunch of diagnostic output that eventually will stop, leaving the
orted waiting for commands.

2. run "mpirun -n 1 uptime" in another window. You should see the orted
window scroll a bunch of diagnostic output as the application runs. If you
don't, then you know that you did NOT connect to the persistent orted - and
you have found the problem.

If this is the case, the solution is actually rather trivial: just tell the
orted and mpirun the name of the universe they are to use. It would look
like this:

"orted --persistent --seed --scope public --universe foo"

"mpirun --universe foo -n 1 uptime

If you do that in the two windows (adding the "--debug" option to the orted
as before), you should see the orted window dump a bunch of diagnostic
output as the application runs.

Hope that helps. Please let us know what you find out - if this is the
problem, we need to find a solution that allows default universe
connections, or else document this clearly.
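For reference, the client/server pattern under discussion further down in this thread (server opens a port, prints the port string, and you hand that string to the client by hand) can be sketched roughly as follows. This is only a sketch: the SERVER compile-time switch and the argument handling are illustrative, and error checking is omitted.

```c
/* Rough sketch of the MPI-2 client/server pattern discussed below.
 * Build each side separately, e.g. (file name is illustrative):
 *   mpiCC -DSERVER -o server connect.c
 *   mpiCC -o client connect.c
 * Run the server first, copy the printed port string, and pass it
 * to the client as its first command-line argument. */
#include <mpi.h>
#include <stdio.h>

#ifdef SERVER
int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port);   /* system-chosen port string */
    printf("port: %s\n", port);           /* hand this string to the client */
    fflush(stdout);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
    /* ... communicate over 'client' ... */
    MPI_Comm_disconnect(&client);
    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}
#else
int main(int argc, char **argv)
{
    MPI_Comm server;

    MPI_Init(&argc, &argv);
    /* argv[1] is the port string printed by the server */
    MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
    /* ... communicate over 'server' ... */
    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}
#endif
```

With the persistent orted running (and, if necessary, the same --universe name given to both mpirun invocations), you would start the server with mpirun -n 1 server, then the client with mpirun -n 1 client "<port string>".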


On 9/7/06 3:55 PM, "Pak Lui" <Pak.Lui_at_[hidden]> wrote:

> Hi Edgar,
> I tried starting the persistent orted before running the client/server
> executables without MPI_Publish_name/MPI_Lookup_name, but I am still
> getting the same kind of failure, as reported by Rolf earlier (in trac#252).
> The server prints the port and I feed in the port info to the client.
> Could you point out what we should have done to make this work?
> Edgar Gabriel wrote:
>> Hi,
>> sorry for the delay on your request.
>> There are two things you have to do in order to make a client/server
>> example work with Open MPI right now (assuming you are using
>> MPI_Comm_connect/MPI_Comm_accept).
>> First, you have to start the orted daemon in a persistent mode, e.g.
>> orted --persistent --seed --scope public
>> Second, you cannot currently use MPI_Publish_name/MPI_Lookup_name
>> across multiple jobs; this is unfortunately a known bug. (Name
>> publishing works within the same job, however.) So what you would have to
>> do is pass the port-information of the MPI_Comm_accept call somehow to
>> the other process (e.g. printing it using a printf statement in the
>> server application and pass it as an input argument to the client
>> application).
>> Hope this helps
>> Edgar
>> Eng. A.A. Isola wrote:
>>> "It's not possible to connect!!!!"
>>> Hi Devel list, crossposting as this
>>> is getting weird...
>>> I did a client/server using MPI_Publish_name /
>>> MPI_Lookup_name
>>> and it runs fine on both MPICH2 and LAM-MPI but fail
>>> on Open MPI. It's
>>> not a simple failure (ie. returning an error code)
>>> it breaks the
>>> execution line and quits. The server continue to run
>>> after the
>>> client's crash.
>>> The server also use 100% of CPU while
>>> running, what doesn't happen with LAM.
>>> The code is here:
>>> http://www.
>>> OpenMP version: 1.1.1
>>> Compiling:
>>> mpiCC -o server server.c
>>> mpiCC -o client client.c
>>> - or
>>> -
>>> mpiCC -o client client.c -DUSE_LOOKUP
>>> Running & Output:
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]