Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to add nodes while running job
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-08-27 09:12:59


OMPI has no way of knowing that you will turn the node on at some future point. All it can do is try to launch the job on the provided node, which fails because the node doesn't respond.

You'll have to come up with some scheme for telling the node to turn on in anticipation of starting a job - a resource manager is typically used for that purpose.
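Without a resource manager, one possible scheme is a small wrapper that waits for every host in the hostfile to answer on the SSH port before mpirun is invoked. The following is only a rough sketch under stated assumptions: the function names (ssh_port_open, hosts_in_hostfile, wait_for_hosts) are hypothetical, the check assumes that an open TCP port 22 means the node is ready to accept the launch, and actually powering the node on (wake-on-LAN, IPMI, or similar) is site-specific and not shown.

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def hosts_in_hostfile(lines):
    """Extract host names from hostfile lines, skipping comments and blanks."""
    hosts = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop trailing comments
        if entry:
            # Assumption: the first field is the host; the rest (e.g. slots=N) is ignored here.
            hosts.append(entry.split()[0])
    return hosts


def wait_for_hosts(hosts, is_up=ssh_port_open, deadline_s=300.0, poll_s=5.0):
    """Poll until every host passes is_up; return the hosts still down at the deadline."""
    end = time.monotonic() + deadline_s
    pending = list(hosts)
    while True:
        pending = [h for h in pending if not is_up(h)]
        if not pending or time.monotonic() >= end:
            return pending
        time.sleep(poll_s)
```

A wrapper could read /tmp/hosts, trigger whatever power-on mechanism the site uses, call wait_for_hosts on the result of hosts_in_hostfile, and run the mpirun command only if the returned list is empty. This does not add nodes to a job that is already running; it only avoids the launch failure by making sure the node is up before the job starts.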

On Aug 27, 2011, at 6:58 AM, Rafael Braga wrote:

> I would like to know how to add nodes during a job execution.
> My hostfile currently lists the node 10.0.0.23, which is powered off.
> I would like to start this node during execution so that the job can use it.
> When I run the command:
>
> mpirun -np 2 -hostfile /tmp/hosts application
>
> the following message appears:
>
> ssh: connect to host 10.0.0.23 port 22: No route to host
> --------------------------------------------------------------------------
> A daemon (pid 10773) died unexpectedly with status 255 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> thanks a lot,
>
> --
> Rafael Braga