On Tue, Apr 14, 2009 at 5:04 PM, Jeff Squyres <email@example.com> wrote:
On Apr 13, 2009, at 2:08 PM, Mike Dubman wrote:
Sorry for joining the discussion late... I was on travel last week and that always makes me waaay behind on my INBOX. :-(
Ah, good point (python/java not perl). But I think that lib/MTT/Reporter/GoogleDataStore.pm could still be a good thing -- we have invested a lot of time/effort into getting our particular mtt clients setup just the way we want them, setting up INI files, submitting to batch schedulers, etc.
On Mon, Apr 13, 2009 at 5:44 PM, Ethan Mallove <firstname.lastname@example.org> wrote:
Will this translate to something like
lib/MTT/Reporter/GoogleDatabase.pm? If we are to move away from the
current MTT Postgres database, we want to be able to submit results to
both the current MTT database and the new Google database during the
transition period. Having a GoogleDatabase.pm would make this easier.
I think we should keep both storage options: the current postgres database and the datastore. The mtt changes needed to support the datastore will be minor.
Because the google appengine API (as well as the datastore API) is available for python or java only, we will create external scripts to manipulate datastore objects:
A GoogleDataStore.pm reporter could well fork/exec a python/java executable to do the actual communication/storing of the data, right...? More below.
Completely agree. Once we have external python/java/cobol scripts to manipulate GDS objects, we should wrap them in perl and call them from MTT, the same way it works today for submitting to postgres.
Could be pretty easy to have a Reporter/GDS.pm (I keep making that filename shorter, don't I? :-) ) that simply invokes the mtt-results-submit-to-datastore.py script on the xml that it dumped for that particular test.
The mtt will dump test results in xml format. Then, we provide two python (or java?) scripts:
mtt-results-submit-to-datastore.py - script will be called at the end of mtt run and will read xml files, create objects and save to datastore
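A minimal sketch of the parsing half of such a submit script, assuming a made-up XML layout (the actual format MTT would dump is not shown in this thread). The element names, attribute names, and sample values are hypothetical, and the step that saves to the datastore is left out:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout and values, used only for illustration.
SAMPLE_XML = """<mtt_result phase="Test Run">
  <field name="mpi_name">ompi-nightly-trunk</field>
  <field name="test_result">1</field>
</mtt_result>"""


def parse_result(xml_text):
    """Turn one dumped result into a flat dict ready to be stored."""
    root = ET.fromstring(xml_text)
    record = {"phase": root.get("phase")}
    # Every <field name="..."> child becomes one key/value pair.
    for field in root.findall("field"):
        record[field.get("name")] = field.text
    return record


record = parse_result(SAMPLE_XML)
print(record)
```

A flat dict like this maps naturally onto an object with dynamic attributes, so the same function could serve both a Reporter-driven submit and a standalone one.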
Specifically: I do like having partial results submitted while my MTT tests are running. Cisco's testing cycle is about 24 hours, but groups of tests are finishing all the time, so it's good to see those results without having to wait the full 24 hours before anything shows up. I guess that's my only comment on the idea of having a script that traverses the MTT scratch to find / submit everything -- I'd prefer if we kept the same Reporter idea and used an underlying .py script to submit results as they become ready.
Is this do-able?
Sounds good. We should introduce a guid (like a pid) for each mtt session, and all mtt results generated by that session will refer to it. Later we can use this guid to submit partial results as they become ready and connect them to the appropriate mtt session object (see models.py).
mtt-results-query.py - sample script to query the datastore and generate some simple visual/tabular reports. It will serve as a tutorial on how to access mtt data from scripts for reporting.
Later, we add another script to replace the php web frontend. It will be hosted on google appengine machines and will provide a web viewer for mtt results (the same way index.php does today).
I think what Ethan was asking was: can't MTT run Fluent and then use the normal Reporter mechanism to report the results into whatever back-end data store we have? (postgres or GDS)
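One way the session guid idea could look, as a sketch only. Using Python's uuid module for the guid is an assumption (the thread only says "guid (like pid)"), and the class and field names here are hypothetical, not taken from models.py:

```python
import uuid


class SessionResults:
    """Collects partial results that all belong to one mtt session."""

    def __init__(self):
        # One random identifier per mtt run (assumed: a UUID4 hex string).
        self.session_id = uuid.uuid4().hex
        self.results = []

    def add_partial(self, phase, data):
        """Tag a result with the session id so it can be linked later."""
        record = dict(data, phase=phase, session_id=self.session_id)
        self.results.append(record)
        return record


session = SessionResults()
first = session.add_partial("MPI Install", {"result": "pass"})
second = session.add_partial("Test Run", {"result": "pass"})
print(first["session_id"] == second["session_id"])
```

Because each record carries the id, results can be submitted one at a time as phases finish and still be grouped under the same session on the datastore side.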
> b. mtt_save_to_db.py - script which will go over mtt scratch dir, find
> all xml files generated for every mtt phase, parse it and save to
> datastore, preserving test results relations,i.e. all test results will
> be grouped by mtt general info: mpi version, name, date, ....
> c. same script can scan, parse and save from xml files generated by
> wrapper scripts for non mtt based executions (fluent, ..)
I'm confused here. Can't MTT be outfitted to report results of a
I think we can enhance mtt to be not only an mpi testing platform, but also an mpi benchmarking platform. We can use the datastore to keep mpi-based benchmarking results the same way mtt does for testing results. (No changes to mtt are required for that; it is just a side effect of using a datastore that can keep data of any type.)
ahhh, okie, i see.
Correct me if I'm wrong, but the current mtt implementation allows the following way of executing an mpi test:
/path/to/mpirun <mpirun options> <test>
Many mpi-based applications have embedded MPI libraries and a non-standard way of starting them: one has to set an env variable pointing to the desired mpi installation or pass it as a cmd line argument, for example:
fluent <cmd line args>
pamworld -np 2 -mpidir=/path/to/openmpi/dir ....
I'm not sure if it is possible to express those execution semantics in an mtt ini file. Please suggest.
So far, it seems that such executions can be handled externally from mtt while using the same object model.
I can see the value of both sides -- a) using the MTT client as the gateway to *all* data storage, or b) making MTT but one (possibly of many) tools that can write into the GDS. a) certainly is more attractive towards having a common data format back in GDS such that a single web tool is capable of reporting from the data and being able to make coherent sense out of the data (vs. 3rd party tools that put data back in GDS that may not be in exactly the same format / layout and therefore our web reporter may not be able to make sense out of the data and report on it).
I think that having a Reporter/GDS.pm that system()'s the back-end python script gives the best of both worlds -- the MTT client can [continue to] submit results in the normal way, but there's also a standalone script that can submit results from external tool runs (e.g., manually running Fluent, parsing the results, and submitting to our GDS). And hopefully the back-end python script will enforce a specific structure to the data that is submitted so that all tools -- MTT and any 3rd party tools -- adhere to the same format and the reporter can therefore report on it coherently.
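A rough sketch of how a back-end script might enforce a common structure before anything is stored. The list of required fields is an assumption made for illustration (loosely based on the "mpi version, name" grouping mentioned earlier in the thread); the real list would come from models.py:

```python
# Hypothetical required fields; not taken from the actual models.py.
REQUIRED_FIELDS = ("phase", "mpi_name", "mpi_version")


def validate_submission(record):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append("missing required field: " + name)
    return problems


good = {"phase": "Test Run", "mpi_name": "ompi", "mpi_version": "1.3"}
bad = {"phase": "Test Run"}
print(validate_submission(good))
print(validate_submission(bad))
```

Running the same check for MTT submissions and for 3rd party tools is what would keep the stored data uniform enough for a single reporter.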
Agree. (a) is the preferred form; (b) can be used for tools that cannot be called from mtt.
For the attachment...
I can "sorta read" python, but I'm not familiar with its intricacies and its internal APIs.
- models.py: looks good. I don't know if *all* the fields we have are listed here; it looks fairly short to me. Did you attempt to include all of the fields we submit through the various phases in Reporter, or did you intentionally leave some out? (I honestly haven't checked; it just "feels short" to me compared to our SQL schema).
I listed only some of the fields in every object representing a specific test result source (called a phase in mtt language). This is because every test result source object is derived from the db.Expando class provided by the appengine python SDK. This gives us great flexibility, like adding dynamic attributes to any object, for example:
obj = MttBuildPhaseResult()
obj.my_favorite_dynamic_key = "hello"
obj.my_another_dynamic_key = 7
So, we can have all phase attributes in the phase object without defining them in the *sql schema way*. We can also query the object model by these dynamic keys.
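To illustrate the idea without the SDK installed, here is a plain-Python stand-in. It is not the appengine db.Expando class and does not use the datastore query API; ExpandoLike and filter_by are hypothetical helpers that only mimic "attributes not declared up front" and "select by a dynamic key":

```python
class ExpandoLike:
    """Stand-in for an expando-style model: accepts any attribute."""

    def __init__(self, **attrs):
        for key, value in attrs.items():
            setattr(self, key, value)


class MttBuildPhaseResult(ExpandoLike):
    pass


def filter_by(objects, key, value):
    """Select objects whose (possibly dynamic) attribute equals value."""
    return [o for o in objects if getattr(o, key, None) == value]


a = MttBuildPhaseResult()
a.my_favorite_dynamic_key = "hello"
b = MttBuildPhaseResult(my_favorite_dynamic_key="bye", my_another_dynamic_key=7)

matches = filter_by([a, b], "my_another_dynamic_key", 7)
print(len(matches))
```

Objects that never set a given key are simply skipped by the filter, which is the behaviour one would want when different phases carry different attributes.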
--> meta question: is it in the zen of GDS to not have too many index fields like you would in SQL? I.e., if you want to do an operation on GDS that you
As far as I can see now, gds creates indexes automatically and also provides an API to define indexes manually.
would typically use an SQL index field for, is the idea that you would do a map/reduce to select the data instead of an index field?
yep. seems correct.
- start_datastore.sh: hmm. This script seems to imply that the datastore is *local*! Don't we have to HTTP submit the results to Google? More specifically: what is dev_appserver.py? Is that, perchance, just a local proxy agent that will end up submitting our data to $datastore_path, which actually resides at Google? Do we have to use a specific google username/URL to submit (and query) results?
You need to download Google's sdk (dev_appserver.py is a part of it). In order to develop for gds you run your code inside the sdk locally, and when you feel comfortable with it, you upload it to the google cluster. In order to run the attached example, you need to download the sdk, put it in the following dir hierarchy:
and run start_datastore.sh, which will run a local instance of GDS on your machine. Then, in another shell, you need to run vbench-dev.py, which simulates an mtt client accessing GDS, storing some objects according to the proposed models and then running some sql-like queries to fetch and manipulate results.
- there's no comments in vbench-dev.py -- can you explain what's going on in there? Can you explain how we would use these scripts?
This is an mtt simulator. It implements the google appengine API to receive HTTP requests and call the appropriate callbacks (there is a map of specific urls to callbacks).
The main callback (which intercepts http GET requests to a specific URL) runs the test code, which creates objects defined in models.py, groups many test results into an MTTSession, and then runs some queries to access the previously created objects.
The real mtt client will use a URL pointing to the MTT python code running on Google's cluster, and use nearly the same code to create/query/manipulate objects defined in models.py.
- it *looks* like these scripts are for storing data out in the GDS. Have you looked at the querying side? Do we know that data stored in the form you listed in models.py is easily retrievable in the ways that we want? E.g., can you mock up queries that resemble the queries we currently have in our web-based query system today, just to show that storing the data in this way will actually allow us to do the kinds of queries that we want to do?
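The url-to-callback arrangement described above can be sketched in plain Python. This is not the appengine webapp API; the "/run" path, the handler name, and the return shape are all hypothetical and only show the dispatch idea:

```python
def run_test_code(params):
    """Hypothetical main callback: would create and query model objects."""
    count = int(params.get("count", 0))
    return "200 OK", "created %d objects" % count


# Map of specific urls to callbacks (paths are made up for this sketch).
ROUTES = {"/run": run_test_code}


def dispatch(method, path, params=None):
    """Call the callback registered for a GET request to the given path."""
    if method != "GET":
        return "405 Method Not Allowed", ""
    handler = ROUTES.get(path)
    if handler is None:
        return "404 Not Found", ""
    return handler(params or {})


print(dispatch("GET", "/run", {"count": "3"}))
```

In the hosted version the framework would own the dispatch step, and only the callbacks and the url map would be application code.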
In short: I think I'm missing much of the back-story / rationale of how the scripts in your tarball work / are to be used.
BTW -- if it's useful to have a teleconference about this kind of stuff, I can host a WebEx meeting. WebEx has local dialins around the world, including Israel...
sure, what about next week?