Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Fails to run "MPI_Comm_spawn" on remote host
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-09-15 22:39:51


We don't support the ability to add a new host during a comm_spawn call in
the 1.3 series. This is a feature that is being added for the upcoming new
feature series release (tagged 1.5).

There are two solutions to this problem in 1.3:

1. Declare all hosts at the beginning of the job (for example with mpirun's
--host or --hostfile option). You can then specify which one to use with the
"host" key.

2. you -can- add a hostfile to the job during a comm_spawn. This is done
with the "add-hostfile" key. All the hosts in the hostfile will be added to
the job. You can then specify which host(s) to use for this particular
comm_spawn with the "host" key.

All of this is documented - you should see it with a "man MPI_Comm_spawn"
command.

If you need to dynamically add a host via "host" before 1.5 is released, you
could try downloading a copy of the developer's trunk from the OMPI web site.
It is implemented there at this time - and also documented via the man page.

Ralph

On Tue, Sep 15, 2009 at 5:14 PM, Jaison Paul <jmulerik_at_[hidden]> wrote:

> Hi All,
> I am waiting on some input on my query. I just wanted to know whether I
> can run dynamic child processes using 'MPI_Comm_spawn' on remote hosts (in
> Open MPI 1.3.2). Has anyone done that successfully? Or has Open MPI not
> implemented it yet?
>
> Please help.
>
> Jaison
> http://cs.anu.edu.au/~Jaison.Mulerikkal/Home.html
>
>
>
>
> On 14/09/2009, at 8:45 AM, Jaison Paul wrote:
>
> Hi,
>
> I am trying to create a library using OpenMPI for an SOA middleware for my
> PhD research. "MPI_Comm_spawn" is the one I need to go for. I got a
> sample example working, but only on the local host. Whenever I try to run
> the spawned children on remote hosts, the parent cannot launch them and I
> get the following error message:
>
> ------------------BEGIN MPIRUN AND ERROR MSG------------------------
> mpirun --prefix /opt/mpi/ompi-1.3.2/ --mca btl_tcp_if_include eth0 -np 1
> /home/jaison/mpi/advanced_MPI/spawn/manager
> Manager code started - host headnode -- myid & world_size 0 1
> Host is: myhost
> WorkDir is: /home/jaison/mpi/advanced_MPI/spawn/lib
> --------------------------------------------------------------------------
> There are no allocated resources for the application
> /home/jaison/mpi/advanced_MPI/spawn//lib
> that match the requested mapping:
>
>
> Verify that you have mapped the allocated resources properly using the
> --host or --hostfile specification.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
> --------------------------END OF ERROR MSG-----------------------------------
>
> I use the reserved keys - 'host' & 'wdir' - to set the remote host and work
> directory using MPI_Info. Here is the code snippet:
>
> --------------------------BEGIN Code Snippet-----------------------------------
> MPI_Info hostinfo;
> MPI_Info_create(&hostinfo);
> MPI_Info_set(hostinfo, "host", "myhost");
> MPI_Info_set(hostinfo, "wdir", "/home/jaison/mpi/advanced_MPI/spawn/lib");
>
> // Checking for 'hostinfo'. The results are okay (see above)
> int test0 = MPI_Info_get(hostinfo, "host", valuelen, value, &flag);
> int test = MPI_Info_get(hostinfo, "wdir", valuelen, value1, &flag);
> printf("Host is: %s\n", value);
> printf("WorkDir is: %s\n", value1);
>
> sprintf( launched_program, "launched_program" );
>
> MPI_Comm_spawn( launched_program, MPI_ARGV_NULL , number_to_spawn,
> hostinfo, 0, MPI_COMM_SELF, &everyone,
> MPI_ERRCODES_IGNORE );
>
> --------------------------END OF Code Snippet-----------------------------------
>
> I've set the LD_LIBRARY_PATH correctly. Is "MPI_Comm_spawn" implemented in
> Open MPI (I am using version 1.3.2)? If so, where am I going wrong? Any
> input will be very much appreciated.
>
> Thanking you in advance.
>
> Jaison
> jmulerik_at_[hidden]
>
> http://cs.anu.edu.au/~Jaison.Mulerikkal/Home.html
>
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
>
> http://www.open-mpi.org/mailman/listinfo.cgi/users