MTT Devel Mailing List Archives

Subject: Re: [MTT devel] GSOC application
From: Mike Dubman (mike.ompi_at_[hidden])
Date: 2009-04-15 09:14:42


On Wed, Apr 15, 2009 at 3:51 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> On Apr 14, 2009, at 2:27 PM, Mike Dubman wrote:
>
> Ah, good point (python/java not perl). But I think that
>> lib/MTT/Reporter/GoogleDataStore.pm could still be a good thing -- we have
>> invested a lot of time/effort into getting our particular mtt clients setup
>> just the way we want them, setting up INI files, submitting to batch
>> schedulers, etc.
>>
>> A GoogleDataStore.pm reporter could well fork/exec a python/java
>> executable to do the actual communication/storing of the data, right...?
>> More below.
>>
>> Completely agree. Once we have external python/java/cobol scripts to
>> manipulate GDS objects, we should wrap them in perl and call them from MTT
>> the same way it works today for submitting to postgres.
>>
>
> So say we all! :-)
>
> (did they show Battlestar Galactica in Israel? :-) )
>
> Sounds good. We should introduce some guid (like a pid) for each mtt session,
>> so that all mtt results generated by that session refer to this guid.
>> Later we can use this guid to submit partial results as they become ready
>> and connect them to the appropriate mtt session object (see models.py).
>>
>
> I *believe* we have 2 values like this in the MTT client already:
>
> - an ID that represents a single MTT client run
> - an ID that represents a single MTT mpi install->test build->test run tree
>
>
> I think what Ethan was asking was: can't MTT run Fluent and then use the
>> normal Reporter mechanism to report the results into whatever back-end data
>> store we have? (postgres or GDS)
>>
>> ahhh, okie, i see.
>>
>> Correct me if I'm wrong: the current mtt implementation allows the
>> following way of executing an mpi test:
>> /path/to/mpirun <mpirun options> <test>
>>
>
> Yes and no; it's controlled by the mpi details section, right? You can put
> whatever you want in there.
>
> Many mpi-based applications have embedded MPI libraries and a non-standard
>> way to start them: one has to set an env variable to point to the desired
>> mpi installation or pass it as a cmd line argument, for example:
>>
>> for fluent:
>>
>> export OPENMPI_ROOT=/path/to/openmpi
>> fluent <cmd line args>
>>
>>
>> for pamcrash:
>> pamworld -np 2 -mpidir=/path/to/openmpi/dir ....
>>
>> I'm not sure if it is possible to express those execution semantics in the
>> mtt ini file. Please suggest.
>> So far, it seems that such executions can be handled externally from mtt
>> but using the same object model.
>>
>
> Understood. I think you *could* get MTT to run these with specialized mpi
> details sections. But it may or may not be worth it.
>
> For the attachment...
>>
>> I can "sorta read" python, but I'm not familiar with its intricacies and
>> its internal APIs.
>>
>> - models.py: looks good. I don't know if *all* the fields we have are
>> listed here; it looks fairly short to me. Did you attempt to include all of
>> the fields we submit through the various phases in Reporter, or
>> did you intentionally leave some out? (I honestly haven't checked; it just
>> "feels short" to me compared to our SQL schema).
>>
>> I listed only some of the fields in every object representing a specific
>> test result source (called a phase in mtt language).
>>
>
> Ok. So that's only a sample -- just showing an example, not necessarily
> trying to be complete. Per Ethan's comments, there are a bunch of other
> fields that we have and/or we might just be able to "tie them together" in
> GDS. I.e., our data is hierarchical -- it worked well enough in SQL because
> you could just have one record about a test build refer to another record
> about the corresponding mpi install. And so on. Can we do something
> similar in GDS?
>

Yep. Actually, in GDS it should be much easier to have a hierarchy, because it
is OO storage. We just need to map all the object relations and put them in
models.py - GDS will do the rest :)
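
As a rough illustration of what "mapping object relations" could look like,
here is a plain-python sketch that runs without the SDK. The class and field
names are made up for the example and are not taken from models.py; in the
datastore API the parent links would presumably be db.ReferenceProperty fields
rather than plain attributes.

```python
from dataclasses import dataclass


@dataclass
class MttMpiInstallResult:
    # Hypothetical fields, for illustration only.
    mpi_name: str
    mpi_version: str


@dataclass
class MttTestBuildResult:
    suite_name: str
    mpi_install: MttMpiInstallResult  # link to the parent mpi install


@dataclass
class MttTestRunResult:
    test_name: str
    result: str
    test_build: MttTestBuildResult  # link to the parent test build


# Build a small install -> build -> run tree.
install = MttMpiInstallResult("ompi-nightly", "1.4a1")
build = MttTestBuildResult("ibm", install)
runs = [
    MttTestRunResult("hello_c", "pass", build),
    MttTestRunResult("ring_c", "fail", build),
]

# Walk the hierarchy upwards: which MPI versions had failing test runs?
failed_versions = [
    run.test_build.mpi_install.mpi_version
    for run in runs
    if run.result == "fail"
]
```

The point is only that a test run can reach its build and its mpi install by
following object references, which is what the SQL schema does today with
record-to-record references.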

>
>
> This is because every test result source object is derived from the
>> db.Expando class provided by the python SDK. This gives us great
>> flexibility, like adding dynamic attributes to any object, for example:
>>
>> obj = MttBuildPhaseResult()
>> obj.my_favorite_dynamic_key = "hello"
>> obj.my_another_dynamic_key = 7
>>
>> So, we can have all phase attributes in the phase object without defining
>> it in the *sql schema way*. Also we can query object model by these dynamic
>> keys.
>>
>
> Hmm. Ok, so you're saying that we define a "phase object" (for each phase)
> with all the fields that we expect to have, but if we need to, we can create
> fields on the fly, and google will just "do the right thing" and associate
> *all* the data (the "expected" fields and the "dynamic" fields) together?

Yep, correct. We can define only the static attributes (those we know for sure
should be present in every object of a given type) and leave the phase-specific
attributes dynamic.
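
To make the static/dynamic split concrete, here is a small stand-in written in
plain python (it does not use db.Expando or the SDK, so it only mimics the
idea). The static field names and the query() helper are assumptions made up
for this sketch; only MttBuildPhaseResult and the two dynamic keys come from
the example above.

```python
class MttBuildPhaseResult:
    """Plain-python stand-in for an Expando-style model (not the real API)."""

    # Hypothetical static attributes that every object of this type must have.
    STATIC_FIELDS = ("phase", "result", "duration_sec")

    def __init__(self, **static):
        missing = [f for f in self.STATIC_FIELDS if f not in static]
        if missing:
            raise ValueError("missing static attributes: " + ", ".join(missing))
        for name, value in static.items():
            setattr(self, name, value)

    def dynamic_fields(self):
        # Everything set on the object that is not a declared static field.
        return {k: v for k, v in vars(self).items()
                if k not in self.STATIC_FIELDS}


def query(objects, **criteria):
    """Return objects whose attributes (static or dynamic) match all criteria."""
    return [
        obj for obj in objects
        if all(hasattr(obj, k) and getattr(obj, k) == v
               for k, v in criteria.items())
    ]


a = MttBuildPhaseResult(phase="test_build", result="pass", duration_sec=42)
a.my_favorite_dynamic_key = "hello"   # dynamic, set on the fly

b = MttBuildPhaseResult(phase="test_build", result="fail", duration_sec=17)
b.my_another_dynamic_key = 7          # a different dynamic key

by_dynamic_key = query([a, b], my_favorite_dynamic_key="hello")
by_static_key = query([a, b], phase="test_build")
```

Objects that never received a given dynamic key are simply not matched by a
query on that key, which is roughly the behaviour we would rely on in GDS.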

>
>
> --> meta question: is it in the zen of GDS to not have too many index
>> fields like you would in SQL?
>>
>> as far as it seems now, gds creates indexes automatically and also
>> provides an API to define indexes manually.
>>
>> I.e., if you want to do an operation on GDS that you would typically use
>> an SQL index field for, is the idea that you would do a map/reduce to
>> select the data instead of an index field?
>>
>> yep. seems correct.
>>
>
> K.
>
> - start_datastore.sh: hmm. This script seems to imply that the datastore
>> is *local*! Don't we have to HTTP submit the results to Google? More
>> specifically: what is dev_appserver.py? Is that, perchance, just a local
>> proxy agent that will end up submitting our data to $datastore_path, which
>> actually resides at Google? Do we have to use a specific google
>> username/URL to submit (and query) results?
>>
>>
>> You need to download google's sdk (dev_appserver.py is a part of it). To
>> develop for gds you run your code inside the sdk locally, and when you
>> feel comfortable with it, you upload it to the google cluster. To run the
>> attached example, you need to download the sdk and put it in the following
>> dir hierarchy:
>>
>> somedir/sdk
>> somedir/vbench-dev
>>
>> and run start_datastore.sh, which will run a local instance of GDS on your
>> machine. Then in another shell you need to run vbench-dev.py, which simulates
>> an mtt client accessing GDS, storing some objects according to the proposed
>> models and then running some sql-like queries to fetch and manipulate
>> results.
>>
>> see
>> http://code.google.com/appengine/docs/python/gettingstarted/devenvironment.html
>>
>
> Ah, I see. Makes sense.
>
> - there's no comments in vbench-dev.py -- can you explain what's going on
>> in there? Can you explain how we would use these scripts?
>>
>> This is an mtt simulator; it implements the google appengine API to receive
>> HTTP requests and call the appropriate callbacks (there is a map of specific
>> urls to callbacks).
>>
>> The main callback (which intercepts http GET requests to a specific URL)
>> runs the test code, which creates objects defined in models.py, groups many
>> test results into an MTTSession, and then runs some queries to access the
>> previously created objects.
>>
>> The real mtt client will use a URL pointing to the MTT python code running on
>> google's cluster, and use nearly the same code to create/query/manipulate
>> objects defined in models.py.
>>
>

The GDS API lets you specify onGET() and onPOST() callbacks which will be
called by appengine on user requests.
I used that API in the attached sample just as the shortest way to play with
GDS (create, save, query objects).
I think that a GDS-enabled mtt client will use something like this:

GOOGLE_PROVIDED_URL_FOR_MTT = http://mtt.google.com/db
db = new DB(GOOGLE_PROVIDED_URL_FOR_MTT)

mttSessions = db.query("select * from TestSessions")
foreach session from mttSessions {
     // do something, e.g. find the session's phase objects or whatever
}

So, we will not need explicit access to the GDS via http GET/POST. We will
use the GDS remote api, which wraps all of its calls in post/get
semantics.

We will need GET/POST access only to implement mtt results viewer applet
which will be hosted @google.

>
> Ok. But this code should really be intercepting PUT (or POST) requests,
> not GET, right?
>
> I ask because the MTT client currently POST's the data to send it via HTTP
> to the remote server.
>
> - it *looks* like these scripts are for storing data out in the GDS. Have
>> you looked at the querying side? Do we know that data stored in the form
>> you listed in models.py is easily retrievable in the ways that we want?
>> E.g., can you mock up queries that resemble the queries we currently have
>> in our web-based query system today, just to show that storing the data in
>> this way will actually allow us to do the kinds of queries that we want to
>> do?
>>
>> I think vbench-dev.py shows some querying capabilities for stored objects;
>> there are many ways to query objects by object CLASS and attributes.
>>
>> See
>> http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html for
>> more querying examples we can use.
>>
>
> Ok.
>
> My only point is that we might want to think a little about the queries we
> want to do when designing the interfaces to stuff all the data into the GDS
> -- it may be helpful to have *some* structure to the data that goes into GDS
> if it helps the queries that we ultimately want to do.
>
> Do you want to try making queries for the data that you're shoving into GDS
> that simulate some of the same queries that we can perform today? This will
> just help validate a) that we can move current functionality up to GDS, and
> b) we can easily make up some new queries that we *can't* easily do on
> postgres today -- it might be fun/useful to see if GDS can handle such
> queries.

Agreed. The 1st milestone is to have a script that submits results to GDS
using the remote GDS API, and also provides some basic query capabilities for
text reporting (using the same remote API).

>
>
> Maybe the first goal -- once you guys get a good understanding of
> using GDS -- should be to have an MTT Reporter that we can all start using to
> start stuffing data into GDS. Once we have a bit of data out there, you can
> start trying to query the data and see what kinds of capabilities the query
> side has. Since we have basically limitless ability to generate data to
> submit into GDS :-), if we screw up the first few model definitions and end
> up wiping the data and starting over during this development process, it's
> no big deal -- just wait one day and the GDS will be populated again with
> new data from our MTT runs. :-)
>
> What do you think?
>

Actually, the attached example allows easy creation and querying of objects
in GDS. We will rewrite it to use the GDS remote API and play with it.

>
> In short: I think I'm missing much of the back-story / rationale of how the
>> scripts in your tarball work / are to be used.
>>
>> BTW -- if it's useful to have a teleconference about this kind of stuff, I
>> can host a WebEx meeting. WebEx has local dialins around the world,
>> including Israel...
>>
>>
>> sure, what about next week?
>>
>
> I have a Doodle account -- let's try that to do the scheduling:
>
> http://doodle.com/gzpgaun2ef4szt29
>
> Ethan, Josh, and I are all in US Eastern timezone (I don't know if Josh
> will participate), so that might make scheduling *slightly* easier. I
> started timeslots at 8am US Eastern and stopped at 2pm US Eastern -- that's
> already pretty late in Israel. I also didn't list Friday, since that's the
> weekend in Israel.

Can we do it in your morning? (our afternoon) :)

>
>
> --
> Jeff Squyres
> Cisco Systems
>
>