
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] How to add nodes while running job
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-08-29 21:55:14


On Aug 29, 2011, at 5:40 AM, Reuti wrote:

> Am 27.08.2011 um 16:35 schrieb Ralph Castain:
>
>>
>> On Aug 27, 2011, at 8:28 AM, Rayson Ho wrote:
>>
>>> On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> OMPI has no way of knowing that you will turn the node on at some future
>>>> point. All it can do is try to launch the job on the provided node, which
>>>> fails because the node doesn't respond.
>>>> You'll have to come up with some scheme for telling the node to turn on in
>>>> anticipation of starting a job; a resource manager is typically used for
>>>> that purpose.
>>>
>>> Hi Ralph,
>>>
>>> Are you referring to a specific resource manager/batch system? AFAIK,
>>> no common batch system supports MPI_Comm_spawn properly...
>>
>> Usually, resource managers "turn on" nodes when allocating them for use by a job; SLURM is an example that does this. This helps the cluster save energy when nodes are not in use. I believe almost all the RMs out there now support this to some degree.
>>
>> Support for MPI_Comm_spawn (i.e., dynamically allocating new nodes as required by a running MPI job and turning them on) doesn't exist at this time, to my knowledge, mostly because this MPI feature is so rarely used. I've helped several groups (integrating from the OMPI side) that were adding such support to various RMs (typically Torque), but I don't think that code has made it into a distribution yet.
>
> Can you please point me to these projects?

I'm afraid I've lost touch with them over the last few years. The work was done by several students as part of their thesis research, so I don't know which parts of it, if any, were intended for public release.

>
> I was always wondering how to phrase it in a submission request. It would need to specify something like: I need 2 cores for 2 hrs, then 1 core for 30 minutes, and finally 4 cores for 6 hrs, which already requires features of a real-time queuing system.

None of the work I participated in worked that way, as it would be quite difficult to predict accurately when a job would need additional resources.
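For comparison, the phased request Reuti describes can be written down as a list of (cores, duration) phases. The sketch below is only an illustration with hypothetical names (`phase`, `phased_core_minutes`, `peak_core_minutes`), not any scheduler's request format. It shows why such a request is attractive: summed phase by phase, the example uses 1710 core-minutes (28.5 core-hours), whereas reserving the 4-core peak for the whole 8.5 hours would hold 2040 core-minutes (34 core-hours).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical representation of one phase of a phased request:
 * a core count held for a duration in minutes. */
typedef struct {
    int cores;
    int minutes;
} phase;

/* Core-minutes consumed if each phase gets exactly its own cores. */
long phased_core_minutes(const phase *ph, size_t n)
{
    assert(ph != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += (long)ph[i].cores * ph[i].minutes;
    return total;
}

/* Core-minutes held if the job instead reserves its peak core count
 * for its entire wall-clock time (a conventional static request). */
long peak_core_minutes(const phase *ph, size_t n)
{
    assert(ph != NULL || n == 0);
    int peak = 0;
    long wall = 0;
    for (size_t i = 0; i < n; ++i) {
        if (ph[i].cores > peak)
            peak = ph[i].cores;
        wall += ph[i].minutes;
    }
    return (long)peak * wall;
}
```

With `phase job[] = { {2, 120}, {1, 30}, {4, 360} };` the two functions return 1710 and 2040 respectively.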

Instead, all of it used dynamic requests, i.e., the job doing a comm_spawn would request resources at the time of the comm_spawn call. I would pass the request to Torque and, if resources were available, immediately process them into OMPI and spawn the new job. If resources weren't available, I simply returned an error to the program so it could either (a) terminate, or (b) wait a while and try again. One of the groups got ambitious and supported non-blocking requests (a callback was generated to me with the resources once they became available). It worked pretty well, and might work even better once we get a non-blocking MPI_Comm_spawn.
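The blocking variant described here (return an error, then let the program either terminate or wait and retry) amounts to a small retry policy on the application side. The sketch below is hedged: `spawn_attempt_fn`, `spawn_with_retry` and the stub resource manager `fake_rm` are hypothetical names. In a real program the attempt hook would presumably wrap `MPI_Comm_spawn` after setting the `MPI_ERRORS_RETURN` error handler on the communicator, so that a failed spawn comes back as an error code instead of aborting the job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h> /* sleep(), POSIX */

/* Hypothetical hook: ask the resource manager for nprocs slots and
 * spawn on them. Returns true on success, false if no resources are
 * available right now. */
typedef bool (*spawn_attempt_fn)(int nprocs, void *ctx);

typedef enum { SPAWN_OK, SPAWN_GAVE_UP } spawn_result;

/* Option (b): wait delay_s seconds between attempts, up to
 * max_attempts in total. SPAWN_GAVE_UP lets the caller fall back to
 * option (a) and terminate. *attempts_used reports attempts made. */
spawn_result spawn_with_retry(spawn_attempt_fn attempt, void *ctx,
                              int nprocs, int max_attempts,
                              unsigned delay_s, int *attempts_used)
{
    assert(attempt != NULL && max_attempts > 0);
    for (int i = 1; i <= max_attempts; ++i) {
        if (attempt(nprocs, ctx)) {
            if (attempts_used)
                *attempts_used = i;
            return SPAWN_OK;
        }
        /* no point sleeping after the final failed attempt */
        if (i < max_attempts && delay_s > 0)
            sleep(delay_s);
    }
    if (attempts_used)
        *attempts_used = max_attempts;
    return SPAWN_GAVE_UP;
}

/* Stub resource manager (hypothetical): refuses the first
 * `refusals` requests, then grants every request. */
typedef struct { int refusals; } fake_rm;

static bool fake_attempt(int nprocs, void *ctx)
{
    fake_rm *rm = ctx;
    (void)nprocs; /* the stub ignores the requested size */
    if (rm->refusals > 0) {
        rm->refusals--;
        return false;
    }
    return true;
}
```

With a stub that refuses twice, `spawn_with_retry(fake_attempt, &rm, 4, 5, 0, &used)` returns `SPAWN_OK` with `used == 3`. A fixed delay keeps the sketch short; exponential backoff would be a natural alternative when many jobs compete for the same pool.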

I believe they were generally happy with the results, though I think some of them wound up having Torque "hold" a global pool of resources to satisfy such requests, just to avoid blocking the job's progress while waiting for comm_spawn resources.
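The non-blocking variant, combined with a pool of cores held in reserve, can be pictured as a FIFO of pending requests whose callbacks fire when cores are released. This is only a sketch under stated assumptions: `core_pool`, `pool_request`, `pool_release` and `record_grant` are hypothetical names, not Torque's or Open MPI's interface. A request is granted immediately when the pool can cover it and nobody is waiting; otherwise it is queued and granted later, in arrival order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback fired when cores are granted to a request. */
typedef void (*grant_cb)(int cores, void *ctx);

#define MAX_PENDING 16

typedef struct {
    int cores;
    grant_cb cb;
    void *ctx;
} pending_req;

typedef struct {
    int free_cores;                  /* cores currently unassigned */
    pending_req queue[MAX_PENDING];  /* FIFO ring buffer of waiters */
    size_t head, count;
} core_pool;

/* Grant waiting requests in FIFO order while the head one fits;
 * later requests may not jump ahead of an earlier, larger one. */
static void pool_drain(core_pool *p)
{
    while (p->count > 0) {
        pending_req r = p->queue[p->head];
        if (r.cores > p->free_cores)
            break;
        p->free_cores -= r.cores;
        p->head = (p->head + 1) % MAX_PENDING;
        p->count--;
        r.cb(r.cores, r.ctx); /* pool state is consistent before the call */
    }
}

/* Returns 1 if granted immediately (callback already invoked),
 * 0 if queued for later, -1 if the pending queue is full. */
int pool_request(core_pool *p, int cores, grant_cb cb, void *ctx)
{
    assert(p != NULL && cb != NULL && cores > 0);
    if (p->count == 0 && cores <= p->free_cores) {
        p->free_cores -= cores;
        cb(cores, ctx);
        return 1;
    }
    if (p->count == MAX_PENDING)
        return -1;
    size_t tail = (p->head + p->count) % MAX_PENDING;
    p->queue[tail] = (pending_req){ cores, cb, ctx };
    p->count++;
    return 0;
}

/* Return cores to the pool (e.g. when spawned processes exit). */
void pool_release(core_pool *p, int cores)
{
    assert(p != NULL && cores > 0);
    p->free_cores += cores;
    pool_drain(p);
}

/* Example callback: accumulate granted cores into an int. */
static void record_grant(int cores, void *ctx)
{
    *(int *)ctx += cores;
}
```

Strict FIFO ordering is a simple fairness choice; a real scheduler might instead backfill smaller requests around a large waiting one.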

>
> -- Reuti
>
>
>
>>> Rayson
>>>
>>>
>>>
>>>
>>>> On Aug 27, 2011, at 6:58 AM, Rafael Braga wrote:
>>>>
>>>> I would like to know how to add nodes while a job is running.
>>>> My hostfile currently lists node 10.0.0.23, which is powered off.
>>>> I would like to start this node during execution so that the job can use it.
>>>> When I run the command:
>>>>
>>>> mpirun -np 2 -hostfile /tmp/hosts application
>>>>
>>>> the following message appears:
>>>>
>>>> ssh: connect to host 10.0.0.23 port 22: No route to host
>>>> --------------------------------------------------------------------------
>>>> A daemon (pid 10773) died unexpectedly with status 255 while attempting
>>>> to launch so we are aborting.
>>>>
>>>> There may be more information reported by the environment (see above).
>>>>
>>>> This may be because the daemon was unable to find all the needed shared
>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>>> location of the shared libraries on the remote nodes and this will
>>>> automatically be forwarded to the remote nodes.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>> that caused that situation.
>>>> --------------------------------------------------------------------------
>>>> mpirun: clean termination accomplished
>>>>
>>>> thanks a lot,
>>>>
>>>> --
>>>> Rafael Braga
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>
>>>
>>>
>>> --
>>> Rayson
>>>
>>> ==================================================
>>> Open Grid Scheduler - The Official Open Source Grid Engine
>>> http://gridscheduler.sourceforge.net/