MTT Devel Mailing List Archives

Subject: Re: [MTT devel] GSOC application
From: Mike Dubman (mike.ompi_at_[hidden])
Date: 2009-04-15 08:57:07


On Tue, Apr 14, 2009 at 11:50 PM, Ethan Mallove <ethan.mallove_at_[hidden]> wrote:

> On Tue, Apr/14/2009 09:27:14PM, Mike Dubman wrote:
> > On Tue, Apr 14, 2009 at 5:04 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> >
> > On Apr 13, 2009, at 2:08 PM, Mike Dubman wrote:
> >
> > Hello Ethan,
> >
> > Sorry for joining the discussion late... I was on travel last week and
> > that always makes me waaay behind on my INBOX. :-(
> >
> > On Mon, Apr 13, 2009 at 5:44 PM, Ethan Mallove <ethan.mallove_at_[hidden]> wrote:
> >
> > Will this translate to something like
> > lib/MTT/Reporter/GoogleDatabase.pm? If we are to move away from the
> > current MTT Postgres database, we want to be able to submit results to
> > both the current MTT database and the new Google database during the
> > transition period. Having a GoogleDatabase.pm would make this easier.
> >
> > I think we should keep both storage options: the current Postgres
> > database and the datastore. The MTT changes needed to support the
> > datastore will be minor. Because the Google App Engine API (as well as
> > the datastore API) is available only for Python and Java, we will
> > create external scripts to manipulate datastore objects:
> >
> > Ah, good point (python/java not perl). But I think that
> > lib/MTT/Reporter/GoogleDataStore.pm could still be a good thing -- we
> > have invested a lot of time/effort into getting our particular mtt
> > clients setup just the way we want them, setting up INI files,
> > submitting to batch schedulers, etc.
> >
> > A GoogleDataStore.pm reporter could well fork/exec a python/java
> > executable to do the actual communication/storing of the data, right...?
> > More below.
> >
> > Completely agree. Once we have external python/java/cobol scripts to
> > manipulate GDS objects, we should wrap them in Perl and call them from
> > MTT the same way it works today for submitting to Postgres.
> >
> > MTT will dump test results in XML format. Then we provide two
> > Python (or Java?) scripts:
> >
> > mtt-results-submit-to-datastore.py - called at the end of an MTT run;
> > it reads the XML files, creates objects, and saves them to the
> > datastore.
> >
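A minimal sketch of what the parsing half of mtt-results-submit-to-datastore.py might look like. The XML layout, the field names, and the `submit` stand-in are all assumptions for illustration; the real MTT dump format is not shown in this thread, and a real script would write to the datastore instead of a list.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout and values; MTT's real dump format may differ.
SAMPLE_XML = """
<mtt_result phase="Test Run">
  <field name="mpi_name">example-mpi</field>
  <field name="mpi_version">0.0.0</field>
  <field name="test_name">hello_c</field>
  <field name="result">pass</field>
</mtt_result>
"""


def parse_result(xml_text):
    """Turn one result document into a flat dict of field name -> value."""
    root = ET.fromstring(xml_text.strip())
    record = {"phase": root.get("phase")}
    for field in root.findall("field"):
        record[field.get("name")] = (field.text or "").strip()
    return record


def submit(record, store):
    """Stand-in for the datastore write; here it only appends to a list."""
    store.append(record)
    return len(store)


store = []
submit(parse_result(SAMPLE_XML), store)
print(store[0]["phase"], store[0]["test_name"], store[0]["result"])
```

Keeping the parser separate from the write step would let the same script handle XML produced by non-MTT wrapper scripts.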
> > Could be pretty easy to have a Reporter/GDS.pm (I keep making that
> > filename shorter, don't I? :-) ) that simply invokes the
> > mtt-results-submit-to-datastore.py script on the xml that it dumped for
> > that particular test.
> >
> > Specifically: I do like having partial results submitted while my MTT
> > tests are running. Cisco's testing cycle is about 24 hours, but groups
> > of tests are finishing all the time, so it's good to see those results
> > without having to wait the full 24 hours before anything shows up. I
> > guess that's my only comment on the idea of having a script that
> > traverses the MTT scratch to find / submit everything -- I'd prefer if
> > we kept the same Reporter idea and used an underlying .py script to
> > submit results as they become ready.
> >
> > Is this do-able?
> >
> > Sounds good. We should introduce a GUID (like a pid) for each MTT
> > session, and all results generated by that session will refer to this
> > GUID. Later we use this GUID to submit partial results as they become
> > ready and connect them to the appropriate MTT session object (see
> > models.py).
> >
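The session GUID idea could be sketched roughly as follows. Using `uuid.uuid4()` for the identifier and an in-memory `SessionStore` class in place of the datastore are both assumptions made for illustration; neither name comes from models.py.

```python
import uuid


def new_session_id():
    """Return a random identifier for one MTT client run (uuid4 is an assumption)."""
    return uuid.uuid4().hex


class SessionStore:
    """In-memory stand-in for the datastore, grouping results by session id."""

    def __init__(self):
        self.sessions = {}

    def submit(self, session_id, result):
        # Partial results can arrive at any time; each is filed under its session.
        self.sessions.setdefault(session_id, []).append(result)

    def results_for(self, session_id):
        return list(self.sessions.get(session_id, []))


store = SessionStore()
sid = new_session_id()
store.submit(sid, {"phase": "MPI Install", "result": "pass"})
store.submit(sid, {"phase": "Test Run", "result": "pass"})
print(len(store.results_for(sid)))
```

Because every result carries the session id, results submitted hours apart can still be joined back to the same run.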
> > mtt-results-query.py - sample script to query the datastore and
> > generate some simple visual/tabular reports. It will serve as a
> > tutorial on how to access MTT data from scripts for reporting.
> >
> > Later, we add another script to replace the PHP web frontend. It will
> > be hosted on Google App Engine machines and will provide a web viewer
> > for MTT results (the same way index.php does today).
> >
> > Sounds good.
> >
> > > b. mtt_save_to_db.py - script which will go over mtt scratch dir, find
> > > all xml files generated for every mtt phase, parse it and save to
> > > datastore, preserving test results relations, i.e. all test results will
> > > be grouped by mtt general info: mpi version, name, date, ....
> > >
> > > c. same script can scan, parse and save from xml files generated by
> > > wrapper scripts for non mtt based executions (fluent, ..)
> >
> > I'm confused here. Can't MTT be outfitted to report results of a
> > Fluent run?
> >
> > I think we can enhance MTT to be not only an MPI testing platform but
> > also an MPI benchmarking platform. We can use the datastore to keep
> > MPI-based benchmarking results in the same manner as MTT does for
> > testing results. (No changes to MTT are required for that; it is just
> > a side effect of using the datastore to keep data of any type.)
> >
> > I think what Ethan was asking was: can't MTT run Fluent and then use the
> > normal Reporter mechanism to report the results into whatever back-end
> > data store we have? (postgres or GDS)
> >
> > ahhh, okie, i see.
> >
> > Correct me if I'm wrong: the current MTT implementation allows the
> > following way of executing an MPI test:
> > /path/to/mpirun <mpirun options> <test>
> >
> > Many MPI-based applications have embedded MPI libraries and a
> > non-standard way to start them: one must set an env variable to point
> > to the desired MPI installation or pass it as a cmd line argument, for
> > example:
> >
> > for fluent:
> >
> > export OPENMPI_ROOT=/path/to/openmpi
> > fluent <cmd line args>
> >
>
> We'd probably want a special "MPI details" INI section to run Fluent, e.g.,
>
> [MPI Details: Fluent]
> exec = fluent @fluent_args@
> ...
>
> > for pamcrash:
> > pamworld -np 2 -mpidir=/path/to/openmpi/dir ....
>
> Ditto for pamcrash.
>
> >
> > I'm not sure if it is possible to express those execution semantics in
> > the MTT INI file. Please suggest.
> > So far, it seems that such executions can be handled externally from
> > MTT but using the same object model.
>
> MTT supports the following INI parameters:
>
> * setenv
> * prepend_path
> * env_module
> * env_importer
>

aha, great

>
> >
> >
> > I can see the value of both sides -- a) using the MTT client as the
> > gateway to *all* data storage, or b) making MTT but one (possibly of
> > many) tools that can write into the GDS. a) certainly is more
> > attractive towards having a common data format back in GDS such that a
> > single web tool is capable of reporting from the data and being able to
> > make coherent sense out of the data (vs. 3rd party tools that put data
> > back in GDS that may not be in exactly the same format / layout and
> > therefore our web reporter may not be able to make sense out of the data
> > and report on it).
> >
> > I think that having a Reporter/GDS.pm that system()'s the back-end
> > python script gives the best of both worlds -- the MTT client can
> > [continue to] submit results in the normal way, but there's also a
> > standalone script that can submit results from external tool runs (e.g.,
> > manually running Fluent, parsing the results, and submitting to our
> > GDS). And hopefully the back-end python script will enforce a specific
> > structure to the data that is submitted so that all tools -- MTT and any
> > 3rd party tools -- adhere to the same format and the reporter can
> > therefore report on it coherently.
> >
> > Agreed. (a) is the preferred form. (b) can be used for tools that
> > cannot be called from MTT.
> >
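One way the back-end script might enforce a common structure is a small validation step run before anything is stored. This is only a sketch: the `REQUIRED_FIELDS` set and the field names in it are hypothetical, not taken from models.py or the SQL schema.

```python
# Hypothetical required fields; the real list would come from models.py.
REQUIRED_FIELDS = {"phase", "mpi_name", "mpi_version", "result"}


def validate(record):
    """Reject a record that lacks any required field; return it unchanged otherwise."""
    missing = sorted(REQUIRED_FIELDS - set(record))
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return record


good = {"phase": "Test Run", "mpi_name": "example-mpi",
        "mpi_version": "0.0.0", "result": "pass", "extra_key": 7}
validate(good)  # extra keys are allowed, only missing ones are rejected

try:
    validate({"phase": "Test Run"})
except ValueError as err:
    error_text = str(err)
    print(error_text)
```

Both the MTT reporter and any third-party wrapper would go through the same check, so the web reporter can rely on those fields being present.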
> > For the attachment...
> >
> > I can "sorta read" python, but I'm not familiar with its intricacies and
> > its internal APIs.
> >
> > - models.py: looks good. I don't know if *all* the fields we have are
> > listed here; it looks fairly short to me. Did you attempt to include
> > all of the fields we submit through the various phases in Reporter,
> > or did you intentionally leave some out? (I honestly haven't
> > checked; it just "feels short" to me compared to our SQL schema).
> >
> > I listed only some of the fields in every object representing a
> > specific test result source (called a phase in MTT language). This is
> > because every test result source object is derived from the db.Expando
> > class provided by the App Engine Python SDK. This gives us great
> > flexibility, such as adding dynamic attributes to any object, for
> > example:
> >
> > obj = MttBuildPhaseResult()
> > obj.my_favorite_dynamic_key = "hello"
> > obj.my_another_dynamic_key = 7
> >
> > So, we can have all phase attributes in the phase object without
> > defining them in the *sql schema way*. We can also query the object
> > model by these dynamic keys.
> >
>
> It looks like models.py doesn't have the daisy chain of inheritance
> that the SQL schema requires.
>
> http://svn.open-mpi.org/trac/mtt/browser/trunk/docs/sql-schema-v3.pdf
>
> Shouldn't RunTestPhase back-reference the MPIInstallPhase,
> TestBuildPhase, and TestSession phase? E.g., we might need to look at
> the configure arguments that are keyed to a given test run.
>
> -Ethan
>

You are right, I will add it to the model. Every phase object will have a
reference to other relevant phase objects, i.e.

RunTestPhase -> MPIInstallPhase
RunTestPhase -> TestBuildPhase
*Phase -> TestSession

sounds good? Will go over sql schema and try to track additional relations.
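
A rough sketch of those relations, using plain dataclasses as a stand-in for the datastore model classes. The field names are hypothetical; in the App Engine db API each link would presumably be declared as a db.ReferenceProperty rather than a plain attribute.

```python
from dataclasses import dataclass


@dataclass
class TestSession:
    session_id: str


@dataclass
class MPIInstallPhase:
    session: TestSession            # *Phase -> TestSession
    configure_arguments: str = ""


@dataclass
class TestBuildPhase:
    session: TestSession            # *Phase -> TestSession


@dataclass
class RunTestPhase:
    session: TestSession            # *Phase -> TestSession
    mpi_install: MPIInstallPhase    # RunTestPhase -> MPIInstallPhase
    test_build: TestBuildPhase      # RunTestPhase -> TestBuildPhase
    result: str = "unknown"


session = TestSession(session_id="example-session")
install = MPIInstallPhase(session, configure_arguments="--enable-debug")
build = TestBuildPhase(session)
run = RunTestPhase(session, install, build, result="pass")

# Follow the back-reference from a test run to its configure arguments.
print(run.mpi_install.configure_arguments)
```

With the references in place, the configure arguments keyed to a given test run are one attribute hop away, which is the lookup Ethan describes.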

>
>
> >
> > --> meta question: is it in the zen of GDS to not have too many index
> > fields like you would in SQL? I.e., if you want to do an operation on
> > GDS that you
> >
> > As far as it seems now, GDS creates indexes automatically and also
> > provides an API to define indexes manually.
> >
> > would typically use an SQL index field for, is the idea that you would
> > do a map/reduce to select the data instead of an index field?
> >
> > yep. seems correct.
> >
> >
> > - start_datastore.sh: hmm. This script seems to imply that the
> > datastore is *local*! Don't we have to HTTP submit the results to
> > Google? More specifically: what is dev_appserver.py? Is that,
> > perchance, just a local proxy agent that will end up submitting our data
> > to $datastore_path, which actually resides at Google? Do we have to use
> > a specific google username/URL to submit (and query) results?
> >
> > You need to download Google's SDK (dev_appserver.py is a part of it).
> > To develop for GDS you run your code inside the SDK locally, and when
> > you feel comfortable with it, you upload it to the Google cluster. To
> > run the attached example, you need to download the SDK and put it in
> > the following dir hierarchy:
> >
> > somedir/sdk
> > somedir/vbench-dev
> >
> > and run start_datastore.sh, which will run a local instance of GDS on
> > your machine. Then in another shell you need to run vbench-dev.py,
> > which simulates an MTT client accessing GDS, storing some objects
> > according to the proposed models and then running some SQL-like
> > queries to fetch and manipulate results.
> >
> > see
> > http://code.google.com/appengine/docs/python/gettingstarted/devenvironment.html
> >
> > - there's no comments in vbench-dev.py -- can you explain what's going
> > on in there? Can you explain how we would use these scripts?
> >
> > This is an MTT simulator. It uses the Google App Engine API to receive
> > HTTP requests and call the appropriate callbacks (there is a map of
> > specific URLs to callbacks).
> >
> > The main callback (which intercepts HTTP GET requests to a specific
> > URL) runs the test code, which creates objects defined in models.py,
> > groups many test results into an MTTSession, and then runs some
> > queries to access the previously created objects.
> >
> > The real MTT client will use a URL pointing to the MTT Python code
> > running at Google's cluster, and use nearly the same code to
> > create/query/manipulate objects defined in models.py.
> >
> > - it *looks* like these scripts are for storing data out in the GDS.
> > Have you looked at the querying side? Do we know that storing data in
> > the form you listed in models.py are easily retrievable in the ways that
> > we want? E.g., can you mock up queries that resemble the queries we
> > currently have in our web-based query system today, just to show that
> > storing the data in this way will actually allow us to do the kinds of
> > queries that we want to do?
> >
> > I think vbench-dev.py shows some querying capabilities for stored
> > objects; there are many ways to query objects by object class and
> > attributes. See
> > http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html
> > for more querying examples we can use.
> >
> > In short: I think I'm missing much of the back-story / rationale of how
> > the scripts in your tarball work / are to be used.
> >
> > BTW -- if it's useful to have a teleconference about this kind of stuff,
> > I can host a WebEx meeting. WebEx has local dialins around the world,
> > including Israel...
> >
> > sure, what about next week?
> >
> > regards
> >
> > Mike
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
>
> > _______________________________________________
> > mtt-devel mailing list
> > mtt-devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel