Open MPI User's Mailing List Archives

From: Rolf Vandevaart (Rolf.Vandevaart_at_[hidden])
Date: 2006-03-31 11:28:01


Hi Ralph:
Thanks for your information. You said I could ask more so I am! See
below.

Ralph Castain wrote On 03/30/06 16:51,:

> Hi Rolf
>
> I apologize for the scarce documentation - we are working on it, but
> have a ways to go. I've tried to address your questions below. Please
> feel free to ask more!
>
> Ralph
>
> Rolf Vandevaart wrote:
>
>> Greetings:
>> I am new to the Open MPI world, and I have been trying to get a better
>> understanding of the ORTE environment. At this point, I have a few
>> questions that I was hoping someone could answer.
>>
>> 1. I have heard mention of running the ORTE daemons in persistent mode,
>> however, I can find no details of how to do this. Are there arguments
>> to either orted or mpirun to make this work right?
>
> Normally, we start a persistent daemon with:
> orted --seed --persistent --scope=public
>
> This will start the daemon and "daemonize" it so it keeps running
> until told to die. The arguments worth noting are:
>
> (a) --persistent. Tells the daemon to "stay alive" until specifically
> told to "die"
>
> (b) --scope=[public, private, exclusive]. This actually pertains to
> the universe, but you'll need to provide it anyway to ensure proper
> connectivity to anything you try to run. Right now, the daemons
> default to "exclusive", which means nothing can connect to them except
> the application that spawned them - no value to anyone if started with
> the above command! "Private" would restrict contact to only you
> - I haven't tested this enough to guarantee its functionality. I
> usually run them as "public" since security isn't a big concern right
> now - all this means is that anyone who can read the session directory
> tree (which is normally "locked" to only you anyway) would be able to
> connect to the daemon.
>
> (c) --seed. Indicates that this daemon is the first one and therefore
> will host the data storage for the registry and other central services
>
> (d) --universe=userid_at_hostname:universe_name. Allows you to name your
> universe to whatever you like. We use this to allow you to have
> multiple universes co-existing but separate - I've been explaining the
> reasons for that elsewhere, but will send them to this list if
> desired. You don't have to provide this, nor do you have to provide
> all the fields (e.g., you could just say "--universe=foo" to set the
> universe name).
>
> You can provide the same options to mpirun, if you like - mpirun will
> simply start an orted and pass those parameters along, and the orted
> will merrily stay alive after the specified application completes.
>
While I understand all that has been written here in theory, I am still
struggling to get things to work.

The persistent daemon seems to be ignored when I do an mpirun. I have
watched the system calls and looked at the process tree, and the
persistent daemon does not seem to be part of the fun. So, I will be
specific about what I am doing, and maybe you can point out what I am
doing wrong.

I have a 3-node cluster: ct2, ct4, and ct5. I am launching the job
from ct2 and trying to run on ct4 and ct5, which have persistent
daemons on them. I have selected the daemon on ct4 to be the seed.

ct4> orted --seed --persistent --scope public -universe foo
ct5> orted --persistent --scope public -universe foo
ct2> mpirun --mca pls_rsh_agent rsh -np 4 -host ct4,ct5 -universe foo my_connectivity -v

While the program is running, I see this on ct4 and ct5.

ps -ef | grep orted
   rolfv 9456 1 0 11:24:26 ? 0:00 orted --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename ct4
   rolfv 9386 1 0 11:21:30 ? 0:00 orted --seed --persistent --scope public --universe foo
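
For anyone repeating this check on several nodes, here is a minimal shell
sketch that labels each orted line in "ps -ef" output by the flags visible
in the listing above. It is not part of Open MPI: the function name
classify_orted is made up for illustration, and it assumes that
--persistent and --bootproxy are enough to tell the two kinds of daemon
apart, which is only inferred from this one listing.

```shell
# Hypothetical helper (not an Open MPI tool): read `ps -ef` style lines
# on stdin and print "<pid> <kind>" for every orted process found.
# Assumption: a daemon started by hand carries --persistent, while one
# started on behalf of a job carries --bootproxy.
classify_orted() {
    awk '
        /orted/ && !/grep/ {
            kind = "unknown"
            if (/--persistent/)     kind = "persistent"
            else if (/--bootproxy/) kind = "bootproxy"
            print $2, kind          # $2 is the PID column in ps -ef
        }
    '
}

# Usage on a live node:
#   ps -ef | classify_orted
```

If both a "persistent" and a "bootproxy" entry show up on the same node
while a job is running, that matches the situation described here: a
second daemon was started alongside the persistent one.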
 
Thanks for any additional details.

*snip*

>> 3. I have a similar question about orteprobe. Is this something
>> we should know about?
>>
>
> Yes and no - there's nothing secret about it. We use it internally to
> OpenRTE to "probe" a machine and see if we have a daemon/universe
> operating on it. Basically, we launch orteprobe on the remote machine
> - it checks to see if a session directory exists on it, attempts to
> connect to any universes it finds, and then reports back on its
> findings. Based on that report, we either launch an orted on the
> remote machine (to act as our surrogate so we can launch an
> application on that cell) or connect to an existing universe on the
> remote machine (and then tell it to launch the application for us).
>
>> 4. Is there an easy way to view the data in the General Purpose
>> Registry? This may be related to my first question, in that I
>> could imagine having persistent daemons and then I would like
>> to see what is stored in the registry.
>>
>
> Well, yes and no. Ideally, that would be a command from within the
> orteconsole function, but I don't think that has been implemented yet.
> I'd be happy to do so, if that is something you would like (shouldn't
> take long at all). There is a set of "dump" functions in the registry
> API for just that purpose. I usually access them via gdb - I attach
> the debugger to the orted process, then use the dump functions to
> output the values in the registry.

What exactly do you type in for the dump functions? I saw these functions,
but could not get them to fire properly.

*snip*

Regards,
Rolf

-- 
=========================
rolf.vandevaart_at_[hidden]
781-442-3043
=========================