Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] restricting a job to a set of hosts
From: Erik Nelson (nelsonerikd_at_[hidden])
Date: 2012-07-28 01:40:32


Reuti,

> -nolocal is IMO an option where you want to execute the `mpirun` on your
> local login machine and want the MPI processes to be allocated somewhere
> in the cluster, in case you don't have any queuing system around to manage
> the resources.

Yes, this is exactly my understanding of the -nolocal option. Otherwise, by
specifying an 'image set' of processors, everything gets 'mapped' to some
subset of the processors in that set.
Again, thanks for your response.
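
For reference, the host pattern suggested further down in this thread can be sanity-checked before submitting. The sketch below is not part of the original exchange. It assumes that SGE's wildcard matching for `-l h=...` behaves like ordinary shell glob matching for a pattern this simple, and the function name `matches_request` is made up for illustration.

```shell
#!/bin/sh
# Sketch: check which host names the four alternatives from the
# qsub -l "h=..." request would cover, using shell glob matching.
# Assumption: SGE treats these bracket ranges the way the shell does.

# Returns 0 if the host name matches one of the four alternatives.
matches_request() {
    case "$1" in
        compute-5-[1-9]|compute-5-1[0-9]|compute-5-2[0-9]|compute-5-3[0-2])
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Count matches over compute-5-0 .. compute-5-40 to confirm that only
# compute-5-1 .. compute-5-32 are selected.
matched=0
i=0
while [ "$i" -le 40 ]; do
    if matches_request "compute-5-$i"; then
        matched=$((matched + 1))
    fi
    i=$((i + 1))
done

echo "hosts matched: $matched"
```

If the assumption holds, the count should equal the 32 intended nodes, with compute-5-0 and compute-5-33 onward excluded.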

On Fri, Jul 27, 2012 at 5:15 AM, Reuti <reuti_at_[hidden]> wrote:

> On 27.07.2012 at 03:21, Ralph Castain wrote:
>
> > Application processes will *only* be placed on nodes included in the
> > allocation. The -nolocal flag is intended to ensure that no application
> > processes are started on the same node as mpirun in the case where that
> > node is included in the allocation. This happens, for example, with Torque,
> > where mpirun is executed on one of the allocated nodes.
>
> But the behavior is the same in Torque and SGE. The jobscript is executed
> on one of the elected exechosts (neither the submit host nor the qmaster
> host [unless they are exechosts too]), and so that host is eligible to be
> used too. In no case should -nolocal be used.
>
> -nolocal is IMO an option where you want to execute the `mpirun` on your
> local login machine and want the MPI processes to be allocated somewhere in
> the cluster, in case you don't have any queuing system around to manage the
> resources.
>
> -- Reuti
>
> > I believe SGE doesn't do that - and so the allocation won't include the
> > submit host, in which case you don't need -nolocal.
> >
> >
> > On Jul 26, 2012, at 5:58 PM, Erik Nelson wrote:
> >
> >> I was under the impression that the -nolocal option keeps processes off
> >> the submit host (since there may be hundreds or thousands of jobs
> >> submitted at any time, and we don't want this host to be overloaded).
> >>
> >> My understanding of what you said in your last email is that, by listing
> >> the hosts, I automatically send all processes (parent and child, or
> >> master and slave if you prefer) to the specified list of hosts.
> >>
> >> Reading your email below, it looks like this was the correct
> >> understanding.
> >>
> >>
> >> On Thu, Jul 26, 2012 at 5:20 PM, Reuti <reuti_at_[hidden]> wrote:
> >> On 26.07.2012 at 23:58, Erik Nelson wrote:
> >>
> >> > Reuti,
> >> >
> >> > Thank you. Our queue is backed up, so it will take a little while
> >> > before I can try this.
> >> >
> >> > I assume that by specifying the nodes this way, I don't need to add
> >> > -nolocal (and that doing so would confuse the system). In other words,
> >> > qsub will try to put the parent node somewhere in this set.
> >> >
> >> > Is this the idea?
> >>
> >> Depends what you refer to by "parent node". I assume you mean the
> >> submit host. This is never included in any created selection of SGE
> >> unless it's an execution host too.
> >>
> >> The master host of the parallel job (i.e. the one where the jobscript
> >> with the `mpiexec` is running) will be used as a normal machine from
> >> MPI's point of view.
> >>
> >> -- Reuti
> >>
> >>
> >> > Erik
> >> >
> >> >
> >> > On Thu, Jul 26, 2012 at 4:48 PM, Reuti <reuti_at_[hidden]> wrote:
> >> > On 26.07.2012 at 23:33, Erik Nelson wrote:
> >> >
> >> > > I have a purely parallel job that runs ~100 processes. Each process
> >> > > has ~identical overhead, so the speed of the program is dominated by
> >> > > the slowest processor.
> >> > >
> >> > > For this reason, I would like to restrict the job to a specific set
> >> > > of identical (fast) processors on our cluster.
> >> > >
> >> > > I read the FAQ on -hosts and -hostfile, but it is still unclear to
> >> > > me what effect these directives will have in a queuing environment.
> >> > >
> >> > > Currently, I submit the job using the "qsub" command in the "sge"
> >> > > environment as:
> >> > >
> >> > > qsub -pe mpich 101 jobfile.job
> >> > >
> >> > > where jobfile contains the command
> >> > >
> >> > > mpirun -np 101 -nolocal ./executable
> >> >
> >> > I would leave -nolocal out here.
> >> >
> >> > $ qsub -l "h=compute-5-[1-9]|compute-5-1[0-9]|compute-5-2[0-9]|compute-5-3[0-2]" -pe mpich 101 jobfile.job
> >> >
> >> > -- Reuti
> >> >
> >> >
> >> > > I would like to restrict the job to nodes compute-5-1 to
> >> > > compute-5-32 on our machine, each containing 8 CPUs (slots). How do
> >> > > I go about this?
> >> > >
> >> > > Thanks, Erik
> >> > >
> >> > > --
> >> > > Erik Nelson
> >> > >
> >> > > Howard Hughes Medical Institute
> >> > > 6001 Forest Park Blvd., Room ND10.124
> >> > > Dallas, Texas 75235-9050
> >> > >
> >> > > p : 214 645 5981
> >> > > f : 214 645 5948
> >> > > _______________________________________________
> >> > > users mailing list
> >> > > users_at_[hidden]
> >> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> >
> >> >
> >>
> >>
>
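
As a rough check of the numbers quoted above (32 nodes, 8 slots each, and a request of 101 slots via `-pe mpich 101`), the arithmetic can be sketched as follows. This is only an illustration with made-up variable names. How SGE actually spreads the slots across nodes depends on the parallel environment's allocation rule and on what else is running, neither of which is stated in this thread.

```shell
#!/bin/sh
# Sketch: does a 101-slot request fit into the restricted host set?
nodes=32            # compute-5-1 .. compute-5-32
slots_per_node=8    # "8 cpu's (slots)" per node, as stated in the thread
requested=101       # from "qsub -pe mpich 101"

# Total capacity of the restricted set.
total=$((nodes * slots_per_node))

# Fewest nodes that could hold the request if every node were filled
# completely (ceiling division).
min_nodes=$(( (requested + slots_per_node - 1) / slots_per_node ))

echo "total slots in set: $total"
echo "minimum nodes needed: $min_nodes"

if [ "$requested" -le "$total" ]; then
    echo "request fits in the restricted set"
else
    echo "request does not fit in the restricted set"
fi
```

With these figures the set offers 256 slots, so the 101-slot request fits and needs at least 13 fully used nodes.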

-- 
Erik Nelson
Howard Hughes Medical Institute
6001 Forest Park Blvd., Room ND10.124
Dallas, Texas 75235-9050
p : 214 645 5981
f : 214 645 5948