Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Setting AUTOMAKE_JOBS
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-09-25 10:29:04


First off, let me publicly apologize to Jeff - this email thread came across
wrong. I wasn't mad or upset, but was speaking more tongue-in-cheek. Email
is a bad medium for such nuances, and I should have realized that before
attempting it.

My concern was solely that we had introduced a new behavior that would be
generally considered unacceptable by developers if some other software
package (e.g., make) did it. In general, I don't think it is a big problem,
but it can cause problems for compute-challenged boxes such as my laptop
when it is engaged in other compute-intensive tasks.

What I was trying (poorly) to propose is that this capability be exposed the
traditional way, as an option. To mimic 'make', I propose that we add a -j
option to autogen.pl. If a value is given, then that is the number of
parallel automake jobs we run. If no value is given, then we use Jeff's
heuristic to determine the number to run. Obviously, setting the
AUTOMAKE_JOBS envar will have the same effect as -j.
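
A minimal sketch of that resolution order might look like the following. It
is written in Python purely for illustration (autogen.pl itself is Perl), and
the names resolve_jobs and heuristic_jobs are hypothetical. Two details are
assumptions, not something settled in this thread: an explicit "-j N" takes
precedence over AUTOMAKE_JOBS, and with neither given a single automake job
is run. The caller passes in the PU count reported by lstopo, or None if
lstopo is not available.

```python
import argparse

MAX_JOBS = 4        # cap from the heuristic described in this thread
FALLBACK_JOBS = 2   # value used when lstopo is not found
USE_HEURISTIC = 0   # sentinel meaning "-j was given without a value"


def heuristic_jobs(pu_count=None):
    """Heuristic from the thread: PU count capped at 4, or 2 if unknown."""
    if pu_count is None:
        return FALLBACK_JOBS
    return max(1, min(pu_count, MAX_JOBS))


def resolve_jobs(argv, env, pu_count=None):
    """Return the number of parallel automake jobs to run.

    Assumed precedence (not specified in the thread):
    "-j N" > AUTOMAKE_JOBS > bare "-j" (heuristic) > 1 (serial).
    """
    parser = argparse.ArgumentParser(prog="autogen-sketch")
    # nargs="?" lets -j appear with or without a value, like make's -j.
    parser.add_argument("-j", dest="jobs", nargs="?", type=int,
                        const=USE_HEURISTIC, default=None)
    args = parser.parse_args(argv)

    if args.jobs is not None and args.jobs > 0:
        return args.jobs                    # explicit "-j N"
    if env.get("AUTOMAKE_JOBS"):
        return int(env["AUTOMAKE_JOBS"])    # same effect as "-j N"
    if args.jobs == USE_HEURISTIC:
        return heuristic_jobs(pu_count)     # bare "-j"
    return 1                                # nothing requested: stay serial
```

For example, resolve_jobs(["-j"], {}, pu_count=8) falls through to the
heuristic, while resolve_jobs([], {}) stays serial, so nobody gets a parallel
build they did not ask for.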

This gives us the best of both worlds while not surprising users. I'd be
willing to help implement it, Jeff.

HTH
Ralph

On Sat, Sep 25, 2010 at 7:03 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> Everyone always complains to me about how long the build takes, so I took a
> step to reduce that time.
>
> I have been manually setting AUTOMAKE_JOBS for a long time and run lots of
> other things at the same time as automake such as mail, ppt, etc. and
> haven't seen any noticeable difference. Admittedly, I don't tend to run
> competing compute-intensive things -- that didn't seem like a good idea to
> me.
>
> So -- fine, I removed the auto-set of AUTOMAKE_JOBS in r23802.
>
>
> On Sep 24, 2010, at 11:55 PM, George Bosilca wrote:
>
> > I would accept this behavior, on the condition that the threads are
> > running at the lowest priority. This will give us the best of the two
> > worlds, parallel build if the node is empty, and not a significant
> > disturbance if I'm still busy around the computer.
> >
> > George.
> >
> >
> > "All the books in the world contain no more information than is
> > broadcast as video in a single large American city in a single year.
> > Not all bits have equal value." -- Carl Sagan
> >
> > On Sep 24, 2010, at 23:08, "Paul H. Hargrove" <PHHargrove_at_[hidden]> wrote:
> >
> >> I don't feel as strongly about this as Ralph, but do think the new
> >> behavior violates the "principle of least surprise".
> >>
> >> -Paul
> >>
> >> Ralph Castain wrote:
> >>> Been thinking about this more today, and I actually find this new
> >>> "feature" disturbing. It bothers me that OMPI is now dictating that it
> >>> will do a parallel build without my knowledge unless I specifically tell
> >>> it not to. If it were technically possible, would we next force
> >>> "make -j4"?? How would the developer community feel if the authors of
> >>> "make" suddenly decided that it would run 4 parallel threads under the
> >>> covers unless you specifically told it not to?
> >>>
> >>> What bugs me here is that I now have to remember to set something in my
> >>> environment to tell OMPI "you don't get to hog all my processors". Maybe
> >>> others twiddle their thumbs and leave the computer alone while OMPI
> >>> builds, or maybe they rarely build - but I build frequently, and I am
> >>> always multi-tasking my time (running Word, Powerpoint, etc.). So having
> >>> OMPI default to running a parallel build is more than a little annoying -
> >>> frankly, it pisses me off.
> >>>
> >>> I really feel that this "feature" should be implemented as an option
> >>> passed to autogen instead of a hidden forced behavior. If someone wants
> >>> to run a parallel build, then by all means let them ask for it (ala
> >>> "make -j4"). But don't just -do- it.
> >>>
> >>> Grrrr....
> >>> Ralph
> >>>
> >>>
> >>> On Fri, Sep 24, 2010 at 7:28 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> >>>
> >>> I hope you'll understand if I don't run that test while on the
> >>> road...one battery yank per week is my limit :-)
> >>>
> >>>
> >>> On Fri, Sep 24, 2010 at 4:40 AM, Jeff Squyres (jsquyres)
> >>> <jsquyres_at_[hidden]> wrote:
> >>>
> >>> Also to clarify:
> >>>
> >>> - did autogen set am-jobs to 2 in your case? (it should do
> >>> that if lstopo is not found - it also limits itself to 4 at max)
> >>>
> >>> - in the same scenario, what happens if you manually set
> >>> am-jobs to 1 and run autogen? Ie do you get the same
> >>> heat/sluggishness? I have experienced vms causing this kind
> >>> of behavior just because they are running - causing CPU and
> >>> memory pressure.
> >>> Sent from my PDA. No type good.
> >>> On Sep 24, 2010, at 12:49 AM, "Ralph Castain" <rhc_at_[hidden]> wrote:
> >>>
> >>>> Sent to both for reference (see below)
> >>>>
> >>>> Just to clarify. It wasn't a deadlock situation, but rather
> >>>> that the machine was overloaded and running so hard that the
> >>>> response to keystrokes was multiple seconds. Thus, there was
> >>>> no way to shut it down from the keyboard or screen. Even a
> >>>> ctrl-c was just getting ignored for a very long time due to
> >>>> the overload.
> >>>>
> >>>> I was running vmware on my machine, and doing a heavy
> >>>> compile/build in it. On top of this, I had email, editor, and
> >>>> browsers running - and then kicked off a fresh build in a
> >>>> terminal window. With Jeff's default settings, this latter
> >>>> build thought it would be running alone on the machine, and
> >>>> promptly generated a number of threads equal to all the
> >>>> processors. Since they were already loaded, this drove the
> >>>> machine into the ground.
> >>>>
> >>>> My point is just that it is unwise to assume that the OMPI
> >>>> build can utilize all available processors. I'm sure it's
> >>>> fine for the MTT runs, especially on Jeff's machines as they
> >>>> are dedicated to that purpose - just not a good general
> >>>> assumption.
> >>>>
> >>>>
> >>>> HTH
> >>>> Ralph
> >>>>
> >>>> ====================================
> >>>> Output of "perl -V":
> >>>>
> >>>> Summary of my perl5 (revision 5 version 8 subversion 9)
> >>>> configuration:
> >>>> Platform:
> >>>> osname=darwin, osvers=10.2.0, archname=darwin-2level
> >>>> uname='darwin sjc-rcastain-87111.cisco.com 10.2.0 darwin kernel
> >>>> version 10.2.0: tue nov 3 10:37:10 pst 2009;
> >>>> root:xnu-1486.2.11~1release_i386 i386 '
> >>>> config_args='-des -D prefix=/opt/local -D
> >>>> scriptdir=/opt/local/bin -D cppflags=-I/opt/local/include -D
> >>>> ccflags=-O2 -arch x86_64 -D ldflags=-L/opt/local/lib -D
> >>>> vendorprefix=/opt/local -D man1ext=1pm -D man3ext=3pm -D
> >>>> cc=/usr/bin/gcc-4.2 -D ld=/usr/bin/gcc-4.2 -D
> >>>> man1dir=/opt/local/share/man/man1p -D
> >>>> man3dir=/opt/local/share/man/man3p -D
> >>>> siteman1dir=/opt/local/share/man/man1 -D
> >>>> siteman3dir=/opt/local/share/man/man3 -D
> >>>> vendorman1dir=/opt/local/share/man/man1 -D
> >>>> vendorman3dir=/opt/local/share/man/man3 -D
> >>>> inc_version_list=5.8.8 5.8.8/darwin-2level -U i_bind -U
> >>>> i_gdbm -U i_db'
> >>>> hint=recommended, useposix=true, d_sigaction=define
> >>>> usethreads=undef use5005threads=undef useithreads=undef
> >>>> usemultiplicity=undef
> >>>> useperlio=define d_sfio=undef uselargefiles=define
> >>>> usesocks=undef
> >>>> use64bitint=define use64bitall=define uselongdouble=undef
> >>>> usemymalloc=n, bincompat5005=undef
> >>>> Compiler:
> >>>> cc='/usr/bin/gcc-4.2', ccflags ='-O2 -arch x86_64
> >>>> -fno-common -DPERL_DARWIN -I/opt/local/include
> >>>> -no-cpp-precomp -fno-strict-aliasing -pipe
> >>>> -I/usr/local/include -I/opt/local/include',
> >>>> optimize='-O3',
> >>>> cppflags='-I/opt/local/include -no-cpp-precomp -O2 -arch
> >>>> x86_64 -fno-common -DPERL_DARWIN -I/opt/local/include
> >>>> -no-cpp-precomp -fno-strict-aliasing -pipe
> >>>> -I/usr/local/include -I/opt/local/include'
> >>>> ccversion='', gccversion='4.2.1 (Apple Inc. build 5646)
> >>>> (dot 1)', gccosandvers=''
> >>>> intsize=4, longsize=8, ptrsize=8, doublesize=8,
> >>>> byteorder=12345678
> >>>> d_longlong=define, longlongsize=8, d_longdbl=define,
> >>>> longdblsize=16
> >>>> ivtype='long', ivsize=8, nvtype='double', nvsize=8,
> >>>> Off_t='off_t', lseeksize=8
> >>>> alignbytes=8, prototype=define
> >>>> Linker and Libraries:
> >>>> ld='env MACOSX_DEPLOYMENT_TARGET=10.3 /usr/bin/gcc-4.2',
> >>>> ldflags ='-L/opt/local/lib -L/usr/local/lib'
> >>>> libpth=/usr/local/lib /opt/local/lib /usr/lib
> >>>> libs=-ldbm -ldl -lm -lutil -lc
> >>>> perllibs=-ldl -lm -lutil -lc
> >>>> libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false,
> >>>> libperl=libperl.a
> >>>> gnulibc_version=''
> >>>> Dynamic Linking:
> >>>> dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef,
> >>>> ccdlflags=' '
> >>>> cccdlflags=' ', lddlflags='-L/opt/local/lib -bundle
> >>>> -undefined dynamic_lookup -L/usr/local/lib'
> >>>>
> >>>>
> >>>> Characteristics of this binary (from libperl):
> >>>> Compile-time options: PERL_MALLOC_WRAP USE_64_BIT_ALL USE_64_BIT_INT
> >>>> USE_FAST_STDIO USE_LARGE_FILES USE_PERLIO
> >>>> Built under darwin
> >>>> Compiled at Feb 13 2010 13:19:33
> >>>> @INC:
> >>>> /opt/local/lib/perl5/site_perl/5.8.9/darwin-2level
> >>>> /opt/local/lib/perl5/site_perl/5.8.9
> >>>> /opt/local/lib/perl5/site_perl
> >>>> /opt/local/lib/perl5/vendor_perl/5.8.9/darwin-2level
> >>>> /opt/local/lib/perl5/vendor_perl/5.8.9
> >>>> /opt/local/lib/perl5/vendor_perl
> >>>> /opt/local/lib/perl5/5.8.9/darwin-2level
> >>>> /opt/local/lib/perl5/5.8.9
> >>>> .
> >>>>
> >>>> On Thu, Sep 23, 2010 at 10:26 PM, Ralf Wildenhues
> >>>> <Ralf.Wildenhues_at_[hidden]> wrote:
> >>>>
> >>>> Hello Ralph,
> >>>>
> >>>> wow, that's not good to hear. I knew the perl ithreads implementation
> >>>> wasn't all that efficient, but causing a deadlock sounds like you have
> >>>> more trouble than just perl; at least I hope so. For reference, can
> >>>> you send 'perl -V' output (if you like, to the bug-automake at gnu.org
> >>>> list).
> >>>>
> >>>> Thanks,
> >>>> Ralf
> >>>>
> >>>> * Ralph Castain wrote on Fri, Sep 24, 2010 at 03:12:16AM CEST:
> >>>>> I found one major negative to this change - it assumes that my build
> >>>>> is being done in exclusion of anything else on my computer.
> >>>>> Unfortunately, this is never true.
> >>>>>
> >>>>> So my laptop hemorrhaged itself into frozen silence, overheated to the
> >>>>> point of being burning hot, and had to have its battery yanked to stop
> >>>>> the runaway behavior. Not a really good thing.
> >>>>>
> >>>>> I would suggest you default this "heuristic" out, and let someone set
> >>>>> it to use multiple runs if-and-only-if they want it. Hate to cite the
> >>>>> lowest common denominator, but this was a very nasty surprise.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Sep 22, 2010 at 7:50 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> >>>>>
> >>>>>> Some of you may be unaware that recent versions of automake can run
> >>>>>> in parallel. That is, automake will run in parallel with a degree of
> >>>>>> (at most) $AUTOMAKE_JOBS. This can speed up the execution time of
> >>>>>> autogen.pl quite a bit on some platforms. On my cluster at cisco,
> >>>>>> here's a few quick timings of the entire autogen.pl process (of
> >>>>>> which, automake is the bottleneck):
> >>>>>>
> >>>>>> $AUTOMAKE_JOBS    Total wall time
> >>>>>> value             of autogen.pl
> >>>>>> 8                 3:01.46
> >>>>>> 4                 2:55.57
> >>>>>> 2                 3:28.09
> >>>>>> 1                 4:38.44
> >>>>>>
> >>>>>> This is an older Xeon machine with 2 sockets, each with 2 cores.
> >>>>>>
> >>>>>> There's a nice performance jump from 1 to 2, and a smaller jump from
> >>>>>> 2 to 4. 4 and 8 are close enough to not matter. YMMV.
> >>>>>>
> >>>>>> I just committed a heuristic to autogen.pl to setenv AUTOMAKE_JOBS if
> >>>>>> it is not already set
> >>>>>> (https://svn.open-mpi.org/trac/ompi/changeset/23788):
> >>>>>>
> >>>>>> - If lstopo is found in your $PATH, it runs it and counts how many PUs
> >>>>>> (processing units) you have. It'll set AUTOMAKE_JOBS to that number,
> >>>>>> or a maximum of 4 (which is admittedly a further heuristic).
> >>>>>> - If lstopo is not found, it just sets AUTOMAKE_JOBS to 2.
> >>>>>>
> >>>>>> Enjoy.
> >>>>>>
> >>>>>> --
> >>>>>> Jeff Squyres
> >>>>>> jsquyres_at_[hidden]
> >>>>>> For corporate legal information go to:
> >>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
> >>>> _______________________________________________
> >>>> devel mailing list
> >>>> devel_at_[hidden]
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>
> >> --
> >> Paul H. Hargrove PHHargrove_at_[hidden]
> >> Future Technologies Group
> >> HPC Research Department Tel: +1-510-495-2352
> >> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/