From: Ethan Mallove (ethan.mallove_at_[hidden])
Date: 2006-10-27 12:42:58


On Fri, Oct/27/2006 10:31:44AM, Josh Hursey wrote:
>
> On Oct 27, 2006, at 7:39 AM, Jeff Squyres wrote:
>
> > On Oct 25, 2006, at 10:37 AM, Josh Hursey wrote:
> >
> >> The discussion started with the bug characteristics of v1.2 versus
> >> the trunk.
> >
> > Gotcha.
> >
> >> It seemed from the call that IU was the only institution that can
> >> assess this via MTT, as no one else spoke up. Since people were
> >> interested in seeing things that were breaking, I suggested that I
> >> start forwarding the IU internal MTT reports (run nightly and
> >> weekly) to testing_at_open-mpi.org. This was met by Brian
> >> insisting that it would result in "thousands" of emails to the
> >> development list. I clarified that it is only 3 - 4 messages a day
> >> from IU. However, if all other institutions did this then it would
> >> be a bunch of email (where 'a bunch' would still be less than
> >> 'thousands'). That's how we got to the 'we need a single summary
> >> presented to the group' comment. It should be noted that we brought
> >> up IU sending to the 'testing_at_[hidden]' list as a band-aid until
> >> MTT could do it better.
> >
> > How about sending them to me and Ethan?
>
> Sure I can add you both to the list if you like.
>
> >
> >> This single summary can be an email or a webpage that people can
> >> check. Rich said that he would prefer a webpage, and no one else
> >> really had a comment. That got us talking about the current summary
> >> page that MTT generates. Tim M mentioned that with the current
> >> website it is difficult to figure out how to get the answers you
> >> need. I agree; usability-wise, it is hard for someone to go to the
> >> summary page and answer the question "So what failed from IU last
> >> night, and how does that differ from yesterday -- e.g., what
> >> regressed and progressed yesterday at IU?". The website is flexible
> >> enough to do it, but having a couple of basic summary pages would
> >> be nice for basic users. What those should look like we can discuss
> >> further.
> >
> > Agreed; we aren't super-fond of the current web page, either. Do you
> > guys want to have a teleconf to go over the current status of MTT,
> > where you want it to go, etc.? I consider IU's input here quite
> > important, since you're the ones pushing the boundaries, flexing
> > MTT's muscles, etc.
>
> In my previous email I suggested a couple of questions that I would
> like a webpage to answer. A teleconf might be good to talk about some
> of the various items that IU is trying to do around MTT.
>
> >
> >> The IU group really likes the emails that we currently generate. A
> >> plain-text summary of the previous run. I posted copies on the MTT
> >> bug tracker here:
> >> http://svn.open-mpi.org/trac/mtt/ticket/61
> >> Currently we have not put the work in to aggregate the runs, so for
> >> each INI file that we run we get 1 email to the IU group. This is
> >> fine for the moment, but as we add the rest of the clusters and
> >> dimensions in the testing matrix we will need MTT to aggregate the
> >> results for us and generate such an email.
> >
> > Ok.
> >
> > We created another ticket yesterday to make a new MTT Reporter (our
> > internal plugins) that duplicates this output format. It actually
> > shouldn't be that hard -- we don't have to do parsing to get the
> > numbers that you're reporting; we have access to the actual data. So
> > it's mostly caching the data, calculating the totals that you're
> > calculating, and printing in your output format.
> >
> > Ethan has some other short tasks to do before he gets to this, but
> > it's near the top of the priority list. You can see the current
> > workflow on the wiki (this is a living document; it keeps changing as
> > requirements, etc. change):
> >
> > http://svn.open-mpi.org/trac/mtt/wiki/TaskPlan
> >
>
> Awesome, thanks! :)
>
> >> So I think the general feel of the discussion is that we need the
> >> following from MTT:
> >> - A 'basic' summary page providing answers to some general
> >> frequently asked queries. The current interface is too advanced for
> >> the current users.
> >
> > We have the summary.php page, but I personally have never found it
> > too useful. :-)
> >
> > We're getting towards a full revamp of reporter.php (got some other
> > tasks to complete first, but we're definitely starting to think about
> > it) -- got any ideas / input? Our "haven't thought about it much
> > yet" idea is to be more menu/Q-A driven with a few common queries
> > easily available (rather than a huge, complicated single screen).
>
> See previous email for some general ideas. Tim M might have a few
> more that he would like to see since he is the one at IU that is
> watching the nightly results the closest.
>
> >
> >> - A summary email [preferably in plain text], similar to the one
> >> that IU generated, showing an aggregation of the previous night's
> >> results for (a) all reporters and (b) my institution [so I can
> >> track them down and file bugs].
> >
> > For the moment, we don't have the dynamic capability for you to login
> > to the web page, create a report, and say "mail this to me nightly".
> > However, Ethan can make up custom reports on the server quite easily
> > -- if you want some IU-specific reports, just file a ticket and
> > Ethan can Make It So.
> >
>
> Cool. We'll talk it over and see what we would like.
>
>
> >> - 1 email a day on the previous night's testing results.
> >
> > That's what we intended for the mails that are coming today, but it
> > seemed not to be sufficient -- we ended up with 4 nightly mails: one
> > for each relevant phase's failures, and a 4th showing the stderr of
> > MPI installs.
> >
> >> Some relevant bugs currently in existence:
> >> http://svn.open-mpi.org/trac/mtt/ticket/92
> >> http://svn.open-mpi.org/trac/mtt/ticket/61
> >> http://svn.open-mpi.org/trac/mtt/ticket/94
> >>
> >>
> >> The other concern is that, given the frequency of testing, someone
> >> needs to make sure the bug tracker is updated as bugs appear from
> >> the testing. I think the group is unclear about how this is done.
> >> Meaning: when MTT identifies a test as failed, who is responsible
> >> for putting the bug in the bug tracker?
> >
> > At the moment, I've been manually examining the mails every day and
> > firing off e-mails to those responsible. However, due to travel last
> > week and this week, I've gotten quite behind. :-(
>
> I wonder if there is a way to do something more automated. Probably
> too advanced for MTT 2.0 or 3.0, but something to think about. Maybe
> tie it in with the bug tracker, e.g., send a "Bug Master Engineer" an
> aggregated list of failures that can be easily put into Trac. Dunno...
> just an idea to help take the burden off of you.
>

I think for starters, we want to at least tie in Trac links
anywhere an OMPI rev number is referenced on the webpage.
Then, given a changeset, we can figure out contact info ... ?
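
Roughly what I have in mind (a minimal sketch in Perl just to
illustrate; the real reporter pages are PHP, and the changeset
URL pattern below is an assumption based on the ticket URLs in
this thread):

    # Wrap OMPI revision strings like "r12345" in links to the
    # (assumed) OMPI Trac changeset page.
    sub linkify_revisions {
        my ($text) = @_;
        $text =~ s{\br(\d+)\b}
                  {<a href="http://svn.open-mpi.org/trac/ompi/changeset/$1">r$1</a>}g;
        return $text;
    }

(The r-number pattern and the /trac/ompi/changeset path would
need to be checked against the real reporter output.)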

> >
> >> The obvious solution is the institution that identified the bug.
> >> [Warning: my opinion] But that becomes unwieldy for IU since we
> >> have a large testing matrix and would need to commit someone to
> >> doing this every day (and it may take all day to properly track a
> >> set of bugs). Also, this kind of punishes an institution for
> >> testing more instead of providing an incentive to test.
> >
> > True. I don't know the proper answer to this, either -- I know the
> > "Jeff look at e-mail" solution doesn't scale well.
> >
> >> ------ Page Break -- Context switch ------
> >>
> >> In case you all want to know what we are doing here at IU, I
> >> attached our planned MTT testing matrix to this email. Currently we
> >> have BigRed and Odin running the complete matrix less the BLACS
> >> tests. Wotan and Thor will come online as we get more resources to
> >> support them.
> >>
> >> In order to do such a complex testing matrix we use various .INI
> >> files. And since some of the dimensions in the matrix are large,
> >> we break some of the tests into a couple of .INI files that are
> >> submitted concurrently so they run in a reasonable time.
> >>
> >> <MTT-testing-matrix.txt>
> >
> > Awesome.
> >
> > I would like to schedule some phone time with you guys and Ethan and
> > me to talk about what's working, what's not working, etc. One
> > obvious question I have is: is the INI config file format suitable?
> > Do we need to do something more complex that would allow
> > consolidation of your various configurations? ...etc.
>
> Tim M and I spent the better part of two days revamping our current
> setup to do some more 'advanced' things (Parallel builds, etc...). We
> are putting all of these scripts in ompi-tests/iu/mtt in case anyone
> wants to see how we are doing it and use that as an example for doing
> something similar.
>
> Basically our problems are:
> - Testing results come in at various times as they complete; we
> would really like a 'status report' at 8 am every day, finished or not.

For now, could you load up this webpage every day at 8 am?

http://tinyurl.com/ydt777

> - Due to the combinatorial nature of MTT, the work lends itself to
> some obvious parallelism. Can we harness that to reduce the time to
> complete the testing cycle?

I brought that up at the last MTT "Developers Conference" :) TET
does it. Sounds like a 3.0 (or 4.0) thing.

> - We will soon have 4 clusters [wotan, bigred, odin, thor], each
> running 3 branches [trunk, v1.2, v1.1] in 2 different builds [64-bit
> gcc, 32-bit gcc] every night! That's 4 x 3 x 2 = 24 sets of nightly
> tests, and we have biweekly tests in there as well :o. That means a
> lot of INI files that basically say the same thing.
>
> What we are trying to do:
> - Generalize the INI files with default sets that can be plugged in.

FWIW, I've been able to generalize INI sections by splitting
them out into separate INI files. E.g., [Reporter], [Mpi
get: trunk], [Test Run: trivial], etc. are always the same,
but I have four different [MPI install] sections (for each
combo of 32/64-bit and Sparc/i386). I can then cat specific
INI files (some contain single INI sections) into client/mtt
with the '-' option for whichever configuration I'm testing.
Also, I wonder if --[no]-section and command-line INI param
overrides would help? (see
http://svn.open-mpi.org/trac/mtt/attachment/ticket/61/do_mtt.pl).
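
To make that concrete, a hypothetical layout (the file names
and the install-section variant below are made up just to
illustrate the mix-and-match; the cat-to-stdin part is what I
actually do):

    # common.ini -- pieces that never change
    [Reporter]
    ...
    [MPI get: trunk]
    ...

    # install-i386-64.ini -- one of the four [MPI install]
    # variants (32/64-bit x Sparc/i386)
    [MPI install: i386 64-bit]
    ...

    # Pick a configuration by concatenating fragments and
    # letting client/mtt read the result on stdin via '-':
    cat common.ini install-i386-64.ini run-trivial.ini \
        | client/mtt -

(Other client/mtt arguments omitted; the point is only the
mixing and matching of INI fragments.)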

> - Make the scripts more general so they can be used easily across
> all clusters
> - Reduce the number of emails from the nightly runs to at most 2
> per cluster [Progress, and Final] -- we are not using SLURM, LL, and
> a hostlist in our runs.
> - Increase the parallelism per stage as much as possible, in as
> general a way as possible.
> - 8 am (or 10 am) status report from our script to check on the run
> as it goes.
>
> We already have a list of refinements that we would like to add to
> this new script setup, but those are a bit more advanced (e.g., using
> a manager/worker model to use allocations as they become available,
> using a queue to order the tests by importance, etc.).
>
> One thing that would be nice for MTT to do, though it would initially
> be institution specific, is a custom aggregation trigger on the MTT
> server. The problem is that we currently get 2 emails from each
> cluster every night (not counting the weekly runs), so that will be 8
> emails a day, which can be a bit hard to parse. If we put the
> aggregation code close to the server (or just had a way for us to
> query the DB from the IU side via ODBC) then we could have the
> aggregation function generate 2 emails that include results from all
> clusters. So: 2 giant emails instead of 8 smaller emails. Just an
> idea, but if you gave me the information I need to send queries to
> the MTT database, I could mock something up and we could all
> experiment with it to see if we can generalize a bit. Obviously the
> 'guest' DB account that this aggregation function uses would only
> have read access, since we don't want it modifying the DB.
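
For what it's worth, here is a rough sketch of what such a
read-only aggregation query might look like from your side
(Perl + DBI over ODBC; the DSN, credentials, and table/column
names below are made up for illustration -- not the real MTT
schema):

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use DBI;

    # Connect with a hypothetical read-only "guest" account.
    my $dbh = DBI->connect("dbi:ODBC:mtt", "guest", "guest",
                           { RaiseError => 1 });

    # Count last night's test-run failures per cluster so one
    # summary mail can cover all clusters at once.
    my $sth = $dbh->prepare(q{
        SELECT cluster, COUNT(*) AS failures
        FROM   test_run_results
        WHERE  result = 'fail'
          AND  start_time >= CURRENT_DATE - 1
        GROUP BY cluster
        ORDER BY cluster
    });
    $sth->execute();
    while (my ($cluster, $failures) = $sth->fetchrow_array()) {
        printf "%-10s %d failures\n", $cluster, $failures;
    }
    $dbh->disconnect();

Something like that, run twice a day, could fold the per-cluster
results into the 2 aggregate mails you describe.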
>
> So generally, yeah, I think we would like to have a teleconf to talk
> about our experiences with MTT and what we have done around it to
> fit our needs. We realize that we are pushing it a bit further than
> others, so we are fine with doing a home-brewed solution for a while
> until MTT is able to replicate the functionality.

When does everyone want to talk?

-Ethan

>
> Thanks!
> Josh
>
> >
> > --
> > Jeff Squyres
> > Server Virtualization Business Unit
> > Cisco Systems
> >
> > _______________________________________________
> > mtt-users mailing list
> > mtt-users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>
> ----
> Josh Hursey
> jjhursey_at_[hidden]
> http://www.open-mpi.org/
>
> _______________________________________________
> mtt-users mailing list
> mtt-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users