Open MPI Development Mailing List Archives

From: Ralph Castain (rhc_at_[hidden])
Date: 2006-09-07 23:19:13


Hi Pak

I can't say for certain, but I believe the problem relates to a change we
made in the summer to the default universe name. I encountered a similar
problem with the Eclipse folks at that time.

What happened was that Josh was encountering a problem relating to the
default universe name when working on orte-ps. At that time, we
restructured the default universe name to be "default-pid". This solved the
orte-ps problem.

However, it created a problem in persistent operations - namely, it became
impossible for a process to "know" the name of the persistent daemon's
universe. I'm not entirely certain that we fixed that problem.

Here's how you can check:

1. run "orted --debug --persistent --seed --scope public" in one window. You
will see a bunch of diagnostic output that eventually will stop, leaving the
orted waiting for commands.

2. run "mpirun -n 1 uptime" in another window. You should see the orted
window scroll a bunch of diagnostic output as the application runs. If you
don't, then you know that you did NOT connect to the persistent orted - and
you have found the problem.

If this is the case, the solution is actually rather trivial: just tell the
orted and mpirun the name of the universe they are to use. It would look
like this:

"orted --persistent --seed --scope public --universe foo"

"mpirun --universe foo -n 1 uptime

If you do that in the two windows (adding the "--debug" option to the orted
as before), you should see the orted window dump a bunch of diagnostic
output.
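
Putting the check and the fix together, the two windows would look
something like this ("foo" is just an arbitrary universe name):

window 1:  orted --debug --persistent --seed --scope public --universe foo
window 2:  mpirun --universe foo -n 1 uptime

As before, the orted window should scroll diagnostic output while the
uptime job runs; if it stays silent, mpirun did not connect to the
persistent orted.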

Hope that helps. Please let us know what you find out - if this is the
problem, we need to find a solution that allows default universe
connections, or else document this clearly.

Ralph

On 9/7/06 3:55 PM, "Pak Lui" <Pak.Lui_at_[hidden]> wrote:

> Hi Edgar,
>
> I tried starting the persistent orted before running the client/server
> executables without MPI_Publish_name/MPI_Lookup_name, but I am still
> getting the same kind of failure that Rolf reported earlier (in trac #252).
>
> The server prints the port and I feed in the port info to the client.
> Could you point out what we should have done to make this work?
>
> http://svn.open-mpi.org/trac/ompi/ticket/252
>
> Edgar Gabriel wrote:
>> Hi,
>>
>> sorry for the delay on your request.
>>
>> There are two things you have to do in order to make a client/server
>> example work with Open MPI right now (assuming you are using
>> MPI_Comm_connect/MPI_Comm_accept):
>>
>> First, you have to start the orted daemon in a persistent mode, e.g.
>>
>> orted --persistent --seed --scope public
>>
>> Second, you currently cannot use MPI_Publish_name/MPI_Lookup_name
>> across multiple jobs; this is unfortunately a known bug. (Name
>> publishing works within the same job, however.) So what you have to
>> do is pass the port information of the MPI_Comm_accept call to the
>> other process somehow (e.g. print it with a printf statement in the
>> server application and pass it as an input argument to the client
>> application).
>>
>> Hope this helps
>> Edgar
>>
>>
>> Eng. A.A. Isola wrote:
>>> "It's not possible to connect!!!!"
>>>
>>> Hi Devel list, crossposting as this
>>> is getting weird...
>>>
>>> I did a client/server using MPI_Publish_name /
>>> MPI_Lookup_name
>>> and it runs fine on both MPICH2 and LAM-MPI but fail
>>> on Open MPI. It's
>>> not a simple failure (ie. returning an error code)
>>> it breaks the
>>> execution line and quits. The server continue to run
>>> after the
>>> client's crash.
>>>
>>>
>>> The server also use 100% of CPU while
>>> running, what doesn't happen with LAM.
>>>
>>>
>>> The code is here:
>>> http://www.
>>> systemcall.com.br/rengolin/open-mpi/
>>>
>>>
>>> OpenMP version: 1.1.1
>>>
>>>
>>> Compiling:
>>> mpiCC -o server server.c
>>> mpiCC -o client client.c
>>> - or
>>> -
>>> mpiCC -o client client.c -DUSE_LOOKUP
>>>
>>>
>>> Running & Output:
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
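
For reference, here is a minimal sketch of the manual port exchange Edgar
describes above (this is not the code from the URL in the thread): the
server prints the port string returned by MPI_Open_port, and the client
takes that string as its first command-line argument.

/* server.c - minimal sketch: open a port, print it, accept one client */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port);   /* port now holds an implementation-specific string */
    printf("server port: %s\n", port);    /* copy this string to the client by hand */
    fflush(stdout);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);  /* blocks until a client connects */
    MPI_Comm_disconnect(&client);
    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}

/* client.c - minimal sketch: connect to the port string given on the command line */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm server;

    MPI_Init(&argc, &argv);
    if (argc < 2) {
        fprintf(stderr, "usage: client <port string printed by the server>\n");
        MPI_Finalize();
        return 1;
    }
    MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}

Compiled as in the thread (mpiCC -o server server.c; mpiCC -o client client.c)
and run against a persistent orted, each command in its own window:

orted --persistent --seed --scope public
mpirun -n 1 ./server                          (prints the port string)
mpirun -n 1 ./client "<port string printed by the server>"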