Please comment on the proposed object model and flows. We will have 1-2 people
working on this over 2-3 weeks. Before then I would like to finalize the
scope and flows.
On Mon, Apr 6, 2009 at 4:54 PM, Mike Dubman <mike.ompi_at_[hidden]> wrote:
> Hello Guys,
> I have played a bit with google datastore and here is a proposal for mtt DB
> infra and some accompanying tools for submission and querying:
> 1. Scope and requirements
> a. provide storage services for storing test results generated by mtt.
> Storage services will be implemented over the datastore.
> b. provide storage services for storing benchmarking results generated by
> various mpi-based applications (not mtt based; for example: fluent, ...).
> c. test or benchmarking results stored in the datastore can be grouped and
> referred to as a group (for example: a single mtt execution can generate many
> results across different phases; this mtt execution will be referred to
> as a session).
> d. Benchmarking and test results generated by mtt or any other
> mpi-based application can be stored in the datastore and grouped by some
> logical criterion.
> e. The mtt should not depend on or directly call any datastore-provided
> APIs. The mtt client (or the framework/scripts executing mpi-based applications)
> should generate test/benchmarking results in some internal format, which
> will be processed later by external tools. These external tools will be
> responsible for saving test results in the datastore. The same rules apply
> to non-mtt executions of mpi-based applications (like fluent,
> openfoam, ...). The scripts wrapping such executions will dump
> benchmarking results in the same internal form for later processing by external
> tools.
> f. The internal form for representing test/benchmarking results can be
> XML. The external tool will receive XML files (as cmd-line params), process
> them, and save them to the datastore.
> g. The external tools will be familiar with the datastore object model and will
> provide a bridge between the test results (XML) and the actual datastore.
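The proposal leaves the XML schema open. As a minimal sketch of what a per-phase result file and its parsing by an external tool could look like, assuming illustrative element names (`phase`, `attr`) and attribute keys that are not fixed anywhere in the proposal:

```python
# Sketch of a possible per-phase result file and how an external tool
# might parse it into a plain dict before saving to the datastore.
# Element and attribute names here are assumptions for illustration only.
import xml.etree.ElementTree as ET

SAMPLE = """\
<phase name="TestBuild" session="2009-04-06-ompi">
  <attr key="compiler_name">gcc</attr>
  <attr key="compiler_version">4.3.2</attr>
  <attr key="result">pass</attr>
</phase>
"""

def parse_phase(xml_text):
    """Turn one phase XML file into a dict of dynamic attributes."""
    root = ET.fromstring(xml_text)
    record = {"phase": root.get("name"), "session": root.get("session")}
    for attr in root.findall("attr"):
        record[attr.get("key")] = attr.text
    return record

record = parse_phase(SAMPLE)
```

A flat key/value layout like this maps directly onto a datastore entity with dynamic properties, which is the point of the Expando-based object model mentioned later.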
> 2. Flow and use-cases
> a. The mtt client will dump all test-related information into an XML file. A
> file will be created for every phase executed by mtt. (Today many
> summary txt and html files are generated for every test phase; it is pretty easy
> to add xml generation of the same information.)
> b. mtt_save_to_db.py - a script which will go over the mtt scratch dir, find all
> xml files generated for every mtt phase, parse them, and save them to the datastore,
> preserving test result relations, i.e. all test results will be grouped by
> general mtt info: mpi version, name, date, ...
> c. The same script can scan, parse, and save xml files generated by wrapper
> scripts for non-mtt executions (fluent, ...).
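The scratch-dir walk that mtt_save_to_db.py would perform can be sketched as follows. The grouping of files into sessions by parent directory name is an assumption here, since the scratch-dir layout is not specified in the proposal:

```python
# Sketch of the directory walk mtt_save_to_db.py might perform: find
# every per-phase XML file under the mtt scratch dir and group the file
# paths by session (taken here from the parent directory name -- an
# assumption, since the actual layout was not specified).
import os
import tempfile
from collections import defaultdict

def find_phase_files(scratch_dir):
    """Map session name -> list of phase XML files under scratch_dir."""
    sessions = defaultdict(list)
    for dirpath, _dirs, files in os.walk(scratch_dir):
        for name in sorted(files):
            if name.endswith(".xml"):
                session = os.path.basename(dirpath)
                sessions[session].append(os.path.join(dirpath, name))
    return dict(sessions)

# Build a toy scratch dir just to demonstrate the walk.
scratch = tempfile.mkdtemp()
os.makedirs(os.path.join(scratch, "session-1"))
for phase in ("MPIInstall", "TestBuild"):
    open(os.path.join(scratch, "session-1", phase + ".xml"), "w").close()

found = find_phase_files(scratch)
```

Each discovered file would then be parsed and written to the datastore with its session key, so that all phases of one mtt run stay related.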
> d. mtt_query_db.py - a script which will provide basic query capabilities
> over the proposed datastore object model. Most users will likely prefer writing
> custom sql-like select queries for fetching results.
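A real implementation of mtt_query_db.py would issue GQL against the datastore; to keep this sketch self-contained, the intended sql-like selection semantics are mimicked here over plain dict records, with field names that are illustrative assumptions:

```python
# In-memory sketch of the kind of query mtt_query_db.py could expose.
# In the datastore this would be a GQL query; here the same selection is
# mimicked over plain dicts so the semantics are clear. Field names are
# illustrative assumptions, not a fixed schema.
def select(records, **criteria):
    """Return records whose fields match every keyword criterion."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

runs = [
    {"phase": "TestRun", "mpi_version": "1.3", "result": "pass"},
    {"phase": "TestRun", "mpi_version": "1.2", "result": "fail"},
]
passing = select(runs, mpi_version="1.3")
```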
> 3. Important notes:
> a. A single mtt client execution generates many result files; every
> generated file represents a test phase. Such a file contains test results and
> can be characterized as a set of attributes with values. Every test
> phase has its own attributes, which differ between phases. For
> example: the TestBuild phase has attributes "compiler_name,
> compiler_version", while the MPIInstall phase has attributes "prefix_dir, arch, ...".
> Hence, most of the datastore objects representing mtt phases are
> derived from the "db.Expando" model, which allows dynamic attributes on
> its derived sub-classes.
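Outside App Engine, the property of db.Expando being relied on here (per-entity dynamic attributes rather than a fixed schema) can be shown with a minimal stand-in class; this is a toy illustration, not App Engine code:

```python
# db.Expando lets each entity carry its own set of properties, which is
# why different mtt phases (TestBuild, MPIInstall, ...) can share one
# base model. Toy stand-in: attributes attach per instance, not per schema.
class ExpandoLike:
    """Minimal stand-in for google.appengine.ext.db.Expando semantics."""
    def __init__(self, **props):
        for key, value in props.items():
            setattr(self, key, value)

# Two "phase" entities of the same class with entirely different fields.
build = ExpandoLike(compiler_name="gcc", compiler_version="4.3.2")
install = ExpandoLike(prefix_dir="/opt/ompi", arch="x86_64")
```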
> Attached is an archive with a simple test of using the datastore for mtt.
> Please see the models.py file with the proposed object model.
> You can run the attached example in the google datastore dev environment.
> Please comment.
> On Tue, Mar 24, 2009 at 12:17 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> On Mar 23, 2009, at 9:05 AM, Ethan Mallove wrote:
>>> Resource | Unit | Unit cost
>>> Outgoing Bandwidth | gigabytes | $0.12
>>> Incoming Bandwidth | gigabytes | $0.10
>>> CPU Time | CPU hours | $0.10
>>> Stored Data | gigabytes per month | $0.15
>>> Recipients Emailed | recipients | $0.0001
>>> Would we itemize the MTT bill on a per user basis? E.g., orgs that
>>> use MTT more, would have to pay more?
>> Let's assume stored data == incoming bandwidth, because we never throw
>> anything away. And let's go with the SWAG of 100GB. We may or may not be
>> able to gzip the data uploaded to the server. So if anything, we *might*
>> be able to decrease the incoming data and end up with a higher level of stored data.
>> I anticipate our outgoing data to be significantly less, particularly if
>> we can gzip the outgoing data (which I think we can). You're right, CPU
>> time is a mystery -- we won't know what it will be until we start running
>> some queries to see what happens.
>> 100GB * $0.10 = $10
>> 100GB * $0.15 = $15
>> total = $25 for the first month
>> So let's SWAG at $25/mo for a year = $300. This number will be wrong for
>> several reasons, but it at least gives us a ballpark. For $300/year, I
>> think we (the OMPI project) can find a way to pay for this fairly easily.
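The SWAG arithmetic above can be replayed in a few lines, using the unit prices from Ethan's table and the 100GB estimate:

```python
# Re-derivation of the back-of-the-envelope monthly cost quoted above.
gb = 100
incoming = gb * 0.10   # incoming bandwidth: $0.10/GB
stored = gb * 0.15     # stored data: $0.15/GB per month
monthly = incoming + stored
yearly = monthly * 12
```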
>> Jeff Squyres
>> Cisco Systems
>> mtt-devel mailing list