I have played a bit with google datastore and here is a proposal for mtt DB infra and some accompanying tools for submission and querying:
1. Scope and requirements
a. provide storage services for storing test results generated by mtt. Storage services will be implemented over datastore.
b. provide storage services for storing benchmarking results generated by various mpi based applications (not mtt based, for example: fluent, openfoam)
c. test or benchmarking results stored in the datastore can be grouped and referred as a group (for example: mtt execution can generate many mtt results consisting of different phases. This mtt execution will be referred as a session)
d. Benchmarking and test results which are generated by mtt or any other mpi based applications, can be stored in the datastore and grouped by some logical criteria.
e. The mtt should not depend or call directly any datastore`s provided APIs. The mtt client (or framework/scripts executing mpi based applications) should generate test/benchmarking results in some internal format, which will be processed later by external tools. These external tools will be responsible for saving test results in the datastore. Same rules should be applied for non mtt based executions of mpi-based applications (line fluent, openfoam,...). The scripts which are wrapping such executions will dump benchmarking results in some internal form for later processing by external tools.
f. The internal form for representation of test/benchmarking results can be XML. The external tool will receive (as cmd line params) XML files, process them and save to the datastore.
d. The external tools will be familiar with datastore object model and will provide bridge between test results (XML) and actual datastore.
2. Flow and use-cases
a. The mtt client will dump all test related information into XML file. The file will be created for every phase executed by mtt. (today there are many summary txt and html files generated for every test phase, it is pretty easy to add xml generation of the same information)
b. mtt_save_to_db.py - script which will go over mtt scratch dir, find all xml files generated for every mtt phase, parse it and save to datastore, preserving test results relations,i.e. all test results will be grouped by mtt general info: mpi version, name, date, ....
c. same script can scan, parse and save from xml files generated by wrapper scripts for non mtt based executions (fluent, ..)
d. mtt_query_db.py script will be provided with basic query capabilities over proposed datastore object model. Most users will prefer writing custom sql-like select queries for fetching results.
3. Important notes:
a. The single mtt client execution generates many result files, every generated file represents test phase. This file contains test results and can be characterized as a set of attributes with its values. Every test phase has its own attributes which are differ for different phases. For example: attributes for TestBuild phase has keys "compiler_name, compiler_version", the MPIInstall phase has attributes: prefix_dir, arch, ....
Hence, most of the datastore objects representing phases of MTT are derived from "db.Expando" model, which allows having dynamic attributes for its derived sub-classes.
The attached is archive with a simple test for using datastore for mtt. Please see models.py file with proposed object model and comment.
You can run the attached example in the google datastore dev environment. (http://code.google.com/appengine/downloads.html
On Tue, Mar 24, 2009 at 12:17 AM, Jeff Squyres <email@example.com>
Let's assume stored data == incoming bandwidth, because we never throw anything away. And let's go with the SWAG of 100GB. We may or may not be able to gzip the data uploading to the server. So if anything, we *might* be able to decrease the incoming data and have higher level of stored data.
On Mar 23, 2009, at 9:05 AM, Ethan Mallove wrote:
Resource | Unit | Unit cost
Outgoing Bandwidth | gigabytes | $0.12
Incoming Bandwidth | gigabytes | $0.10
CPU Time | CPU hours | $0.10
Stored Data | gigabytes per month | $0.15
Recipients Emailed | recipients | $0.0001
Would we itemize the MTT bill on a per user basis? E.g., orgs that
use MTT more, would have to pay more?
I anticipate our outgoing data to be significantly less, particularly if we can gzip the outgoing data (which I think we can). You're right, CPU time is a mystery -- we won't know what it will be until we start running some queries to see what happens.
100GB * $0.10 = $10
100GB * $0.15 = $15
total = $25 for the first month
So let's SWAG at $25/mo for a year = $300. This number will be wrong for several reasons, but it at least gives us a ballpark. For $300/year, I think we (the OMPI project) can find a way to pay for this fairly easily.