I have played a bit with Google Datastore, and here is a proposal for mtt DB
infra and some accompanying tools for submission and querying:
1. Scope and requirements
a. provide storage services for storing test results generated by mtt.
Storage services will be implemented over datastore.
b. provide storage services for storing benchmarking results generated by
various MPI based applications (not mtt based; for example: fluent,
openfoam, ...)
c. test or benchmarking results stored in the datastore can be grouped and
referred to as a group (for example: an mtt execution can generate many mtt
results consisting of different phases; this mtt execution will be referred
to as a session)
d. Benchmarking and test results which are generated by mtt or any other MPI
based application can be stored in the datastore and grouped by some
common criteria.
e. mtt should not depend on or directly call any of the datastore's APIs.
The mtt client (or the framework/scripts executing MPI based applications)
should generate test/benchmarking results in some internal format, which
will be processed later by external tools. These external tools will be
responsible for saving test results in the datastore. The same rules apply
to non mtt based executions of MPI-based applications (like fluent,
openfoam, ...). The scripts wrapping such executions will dump benchmarking
results in the same internal form for later processing by external tools.
f. The internal form for representing test/benchmarking results can be
XML. The external tool will receive XML files as command-line parameters,
process them, and save them to the datastore.
g. The external tools will be familiar with the datastore object model and
will provide the bridge between the test results (XML) and the actual
datastore.
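The internal XML format is not fixed by this proposal. Purely as an
illustration (the element and attribute names below are assumptions, not a
defined schema), a per-phase result file could be parsed like this:

```python
# Hypothetical per-phase XML result (tag/attribute names are illustrative
# only; the proposal does not fix the schema).
import xml.etree.ElementTree as ET

sample = """<phase name="TestBuild">
  <attribute key="compiler_name">gcc</attribute>
  <attribute key="compiler_version">4.3.2</attribute>
</phase>"""

# Parse the phase name and collect the key/value attribute pairs that the
# external tool would later map onto datastore entities.
root = ET.fromstring(sample)
attrs = {a.get("key"): a.text for a in root.findall("attribute")}
print(root.get("name"), attrs)
```

Whatever the final schema, a flat key/value structure like this keeps the
bridge tool simple, since each phase is just a bag of attributes.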
2. Flow and use-cases
a. The mtt client will dump all test related information into an XML file.
A file will be created for every phase executed by mtt. (Today, many
summary txt and html files are generated for every test phase; it is
fairly easy to add xml generation of the same information.)
b. mtt_save_to_db.py - a script which will walk the mtt scratch dir, find
all xml files generated for every mtt phase, parse them, and save them to
the datastore, preserving the relations between test results, i.e. all test
results will be grouped by mtt general info: mpi version, name, date, ....
c. the same script can scan, parse, and save xml files generated by the
wrapper scripts for non mtt based executions (fluent, ...)
d. mtt_query_db.py - a script providing basic query capabilities over the
proposed datastore object model. Most users will probably prefer writing
custom GQL (SQL-like) select queries for fetching results.
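The scanning step of mtt_save_to_db.py (item b above) could be sketched as
follows. The datastore save is stubbed out, and the directory layout and
XML tag names are assumptions for illustration:

```python
# Sketch of the mtt_save_to_db.py scanning step.  The scratch-dir layout
# and the <attribute key="..."> tag shape are assumptions, not a spec.
import glob
import os
import xml.etree.ElementTree as ET

def collect_phase_results(scratch_dir):
    """Find every per-phase XML file under the mtt scratch dir and
    return each file's attributes as a dict, keyed by file path.
    The caller would then push these dicts into the datastore."""
    results = {}
    pattern = os.path.join(scratch_dir, "**", "*.xml")
    for path in glob.glob(pattern, recursive=True):
        root = ET.parse(path).getroot()
        results[path] = {a.get("key"): a.text
                         for a in root.findall("attribute")}
    return results
```

Keeping the scan/parse step separate from the actual datastore write makes
it easy to reuse the same script for the non-mtt wrapper scripts of item c.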
3. Important notes:
a. A single mtt client execution generates many result files; every
generated file represents a test phase. Such a file contains test results
and can be characterized as a set of attributes with their values. Every
test phase has its own attributes, which differ from phase to phase. For
example: the TestBuild phase has the keys "compiler_name,
compiler_version", while the MPIInstall phase has the attributes:
prefix_dir, arch, ....
Hence, most of the datastore objects representing MTT phases are derived
from the "db.Expando" model, which allows dynamic attributes on its
instances.
Attached is an archive with a simple test of using the datastore for mtt.
Please see the models.py file with the proposed object model and comments.
You can run the attached example in the google datastore dev environment.
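To illustrate why db.Expando fits here, the snippet below shows the idea of
per-instance dynamic attributes in plain Python. This is a stand-in for
illustration only, NOT the App Engine API (the class and attribute names
are invented for the example; see models.py in the archive for the actual
proposed model):

```python
# Plain-Python illustration of the db.Expando idea: each phase entity
# carries its own, dynamic set of attributes.  This is NOT the App Engine
# API; it only demonstrates why a fixed-schema model would not fit.
class PhaseResult:
    def __init__(self, phase, **attrs):
        self.phase = phase
        for key, value in attrs.items():
            setattr(self, key, value)   # dynamic, per-phase attributes

# Different phases end up with different attribute sets:
build = PhaseResult("TestBuild",
                    compiler_name="gcc", compiler_version="4.3.2")
install = PhaseResult("MPIInstall",
                      prefix_dir="/opt/ompi", arch="x86_64")
```

With db.Expando, the datastore likewise stores whatever properties are set
on an entity, so no single fixed schema has to cover every mtt phase.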
On Tue, Mar 24, 2009 at 12:17 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Mar 23, 2009, at 9:05 AM, Ethan Mallove wrote:
>> Resource | Unit | Unit cost
>> Outgoing Bandwidth | gigabytes | $0.12
>> Incoming Bandwidth | gigabytes | $0.10
>> CPU Time | CPU hours | $0.10
>> Stored Data | gigabytes per month | $0.15
>> Recipients Emailed | recipients | $0.0001
>> Would we itemize the MTT bill on a per user basis? E.g., orgs that
>> use MTT more, would have to pay more?
> Let's assume stored data == incoming bandwidth, because we never throw
> anything away. And let's go with the SWAG of 100GB. We may or may not be
> able to gzip the data uploading to the server. So if anything, we *might*
> be able to decrease the incoming data and have higher level of stored data.
> I anticipate our outgoing data to be significantly less, particularly if we
> can gzip the outgoing data (which I think we can). You're right, CPU time
> is a mystery -- we won't know what it will be until we start running some
> queries to see what happens.
> 100GB * $0.10 = $10
> 100GB * $0.15 = $15
> total = $25 for the first month
> So let's SWAG at $25/mo for a year = $300. This number will be wrong for
> several reasons, but it at least gives us a ballpark. For $300/year, I
> think we (the OMPI project) can find a way to pay for this fairly easily.
> Jeff Squyres
> Cisco Systems