Subject: Re: [MTT users] Splitting build and run phases
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-05-01 07:46:10


On Apr 30, 2009, at 5:17 PM, Barrett, Brian W wrote:

> I have what's probably a stupid question, but I couldn't find the
> answer on the wiki.
>

The wiki has a lot of info, but it is probably incomplete. :-\

> I've currently been building OMPI and the tests then running the
> tests all in the same MTT run, all in a batch job. The problem is,
> that means I've got a bunch of nodes reserved while building OMPI,
> which I can't actually use.
>
> Is there any way to split the two phases (build and run) so that I
> can build outside of the batch job, get the reservation, and run the
> tests?
>

Yes. I actually have quite a sophisticated (if I do say so
myself ;-) ) system at Cisco. For example, I split all my
gets/installs/builds into SLURM jobs that are separate from the
corresponding test runs. That way, I can submit a whole pile of
1-node SLURM jobs to do all the gets/installs/builds, and then N-node
SLURM jobs for the test runs. Even better, I make the N-node SLURM
jobs depend on the 1-node SLURM get/install/build jobs. If a 1-node
job fails (e.g., someone commits a build error to the tree and the
MPI Install phase fails), SLURM automatically dequeues any dependent
jobs. MTT would recognize the failure and simply skip the Test Run
phases anyway, but it's nice that SLURM removes those jobs without
ever running them. :-)
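
A minimal sketch of the dependency trick, assuming a SLURM whose
sbatch supports --parsable, --dependency=afterok and
--kill-on-invalid-dep. The last one is there because on some SLURM
configurations a job whose dependency can never be satisfied just sits
in the queue instead of being dequeued. The function name and the two
batch-script arguments are hypothetical placeholders for your own
build and test-run scripts.

```bash
#!/usr/bin/env bash
# Sketch: submit a 1-node get/install/build job, then an N-node test-run
# job that SLURM starts only if the build job exits successfully.
# The batch script arguments are hypothetical placeholders.

submit_build_then_run() {
    local nodes=$1 build_script=$2 run_script=$3
    local build_id run_id

    # --parsable prints only "jobid" (or "jobid;cluster"); strip any
    # ";cluster" suffix so we are left with the bare job ID.
    build_id=$(sbatch --parsable --nodes=1 "$build_script") || return 1
    build_id=${build_id%%;*}

    # afterok: start only if the build job completed with exit code 0.
    # --kill-on-invalid-dep=yes: cancel this job if that can never happen.
    run_id=$(sbatch --parsable --nodes="$nodes" \
        --dependency=afterok:"$build_id" \
        --kill-on-invalid-dep=yes "$run_script") || return 1
    run_id=${run_id%%;*}

    echo "$build_id $run_id"
}
```

For example, "submit_build_then_run 8 build-ompi.sh run-tests.sh"
(both script names hypothetical) prints the two job IDs.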

Anyhoo... The client is quite flexible; you can limit what you run by
phase and/or section. Check out the output of "./client/mtt --help".
This part in particular:

    --[no-]mpi-get        Do the "MPI get" phase
    --[no-]mpi-install    Do the "MPI install" phase
    --[no-]mpi-phases     Alias for --mpi-get --mpi-install
    --[no-]test-get       Do the "Test get" phase
    --[no-]test-build     Do the "Test build" phase
    --[no-]test-run       Do the "Test run" phase
    --[no-]test-phases    Alias for --test-get --test-build --test-run
    --[no-]section        Do a specific section(s)

By default, the client runs everything it finds in the INI file. But
you can tell it exactly which phases to run (or not to run). For
example, say I had two MPI Get sections:

[MPI get: ompi-nightly-trunk]
[MPI get: ompi-nightly-v1.3]

You can tell the client to run just the MPI Get phases:

    ./client/mtt --file ... --scratch ... --mpi-get

Or you can tell the client to run just the "trunk" MPI Get phase:

    ./client/mtt --file ... --scratch ... --mpi-get --section trunk

--section matching is case-insensitive.

BEWARE: the --section matching applies to *all* sections.
Specifically, if you're running a reportable phase (MPI Install, Test
Build, Test Run), your --section arguments must *also* match your
Reporter section, or that section won't be included. For example:

    ./client/mtt --file ... --scratch ... --mpi-install --section gnu-standard --section reporter

In my cisco-ompi-core-testing.ini file (see ompi-tests/trunk/cisco/mtt),
this will run the following sections:

[MPI install: GNU-standard]
[Reporter: IU database]
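
One way to avoid forgetting the Reporter section is a tiny wrapper
that always adds it. This is a hypothetical helper, not part of MTT:
run_phase, MTT, INI_FILE and SCRATCH are names made up for this
sketch, and it only uses the client options shown above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of MTT): run a single MTT phase limited
# to one section, always adding "--section reporter" so the Reporter
# section still matches when a reportable phase runs.
# INI_FILE and SCRATCH must be set by the caller (placeholders).
MTT=${MTT:-./client/mtt}   # path to the MTT client

run_phase() {
    local phase=$1 section=$2   # e.g. "mpi-install" "gnu-standard"
    "$MTT" --file "$INI_FILE" --scratch "$SCRATCH" \
        "--$phase" --section reporter --section "$section"
}
```

With INI_FILE and SCRATCH set, "run_phase mpi-install gnu-standard"
runs the same command as the example above.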

I have a "nightly.pl" script (same SVN dir, see above) that launches a
set of very specific SLURM jobs to do Cisco's runs. It reads the
sections from the Cisco INI file and launches a whole series of 1-node
SLURM jobs, each with a unique scratch tree. Each job does a single
MPI Install section (corresponding to a single MPI Get section) and
then all the corresponding Test Builds by running
"run-mtt-compile.pl <get_section> <install_section>". That script
essentially does the following:

    # Run a single MPI Get phase
    ./client/mtt -p --file ... --scratch <foo> --mpi-get --section reporter --section <get_section>
    # if ^^ succeeds, run a single MPI Install phase
    ./client/mtt -p --file ... --scratch <foo> --mpi-install --section reporter --section <install_section>
    # if ^^ succeeds, run all corresponding Test Get and Test Build phases
    ./client/mtt -p --file ... --scratch <foo> --test-get --test-build
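
The build fan-out can be approximated in shell. This is only a sketch
of the idea, not a translation of nightly.pl: it assumes the section
pairs are passed as "get_section:install_section" strings, that jobs
are submitted from the top of the MTT checkout (so ./client/mtt
resolves), and that a made-up "<root>/<get>--<install>" layout is good
enough for the per-job scratch trees. It wraps the three client steps
directly instead of calling run-mtt-compile.pl.

```bash
#!/usr/bin/env bash
# Sketch (not nightly.pl itself): submit one 1-node build job per
# "get_section:install_section" pair, each with its own scratch tree.
# Prints "jobid scratch_dir" for every job it submits.

submit_builds() {
    local ini=$1 scratch_root=$2
    shift 2
    local pair get_sec inst_sec scratch mtt cmd id
    for pair in "$@"; do
        get_sec=${pair%%:*}   # text before the first ':'
        inst_sec=${pair#*:}   # text after the first ':'
        # A unique scratch tree per job so concurrent builds don't collide.
        scratch="$scratch_root/$get_sec--$inst_sec"
        mkdir -p "$scratch" || return 1
        # Common client options, shell-quoted for the sh script that
        # sbatch --wrap generates (fine for names without control chars).
        mtt=$(printf '%q ' ./client/mtt -p --file "$ini" --scratch "$scratch")
        # Same three steps as above; && stops at the first failure, so the
        # job exits nonzero and afterok-dependent jobs will not start.
        cmd="${mtt}--mpi-get --section reporter --section $(printf '%q' "$get_sec")"
        cmd+=" && ${mtt}--mpi-install --section reporter --section $(printf '%q' "$inst_sec")"
        cmd+=" && ${mtt}--test-get --test-build"
        id=$(sbatch --parsable --nodes=1 --wrap="$cmd") || return 1
        echo "${id%%;*} $scratch"
    done
}
```

Each printed job ID is what you would hand to --dependency=afterok
when submitting the matching Test Run jobs.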

I also sbatch a whole pile of corresponding N-node Test Run SLURM
jobs, each dependent upon the build job above, that essentially run
the following:

    ./client/mtt -p --file ... --scratch <foo> --test-run --section reporter --section <run_section>
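
A sketch of those dependent Test Run submissions, assuming you already
have the build job's ID and scratch tree, and that jobs are submitted
from the top of the MTT checkout. The function name, node count, INI
path and run-section names are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Sketch: submit one N-node Test Run job per run section, each depending
# on an already-submitted build job and reusing its scratch tree.
# Prints one job ID per submitted job.

submit_test_runs() {
    local build_id=$1 nodes=$2 ini=$3 scratch=$4
    shift 4
    local section cmd id
    for section in "$@"; do
        # Shell-quote every argument so --wrap gets a safe command line.
        cmd=$(printf '%q ' ./client/mtt -p --file "$ini" --scratch "$scratch" \
            --test-run --section reporter --section "$section")
        # afterok: only start once the build job has exited with status 0.
        id=$(sbatch --parsable --nodes="$nodes" \
            --dependency=afterok:"$build_id" --wrap="$cmd") || return 1
        echo "${id%%;*}"
    done
}
```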

Hope that helps.

-- 
Jeff Squyres
Cisco Systems