From: Josh Hursey (jjhursey_at_[hidden])
Date: 2006-10-27 10:06:38


On Oct 25, 2006, at 1:30 PM, Ethan Mallove wrote:

> On Wed, Oct/25/2006 10:37:31AM, Josh Hursey wrote:
>> The discussion started with the bug characteristics of v1.2 versus
>> the trunk.
>>
>> It seemed from the call that IU was the only institution that can
>> assess this via MTT, as no one else spoke up. Since people were
>> interested in seeing what is breaking, I suggested that I start
>> forwarding the IU internal MTT reports (run nightly and weekly) to
>> testing_at_open-mpi.org. This was met by Brian insisting that it
>> would result in "thousands" of emails to the development list. I
>> clarified that it is only 3-4 messages a day from IU. However, if
>> all the other institutions do this then it would be a bunch of email
>> (where 'a bunch' would still be less than 'thousands'). That's how we
>> got to the 'we need a single summary presented to the group' comment.
>> It should be noted that we brought up IU sending to the
>> testing_at_open-mpi.org list as a band-aid until MTT could do it
>> better.
>>
>> This single summary can be an email or a webpage that people can
>> check. Rich said that he would prefer a webpage, and no one else
>> really had a comment. That got us talking about the current summary
>> page that MTT generates. Tim M mentioned that it is difficult to
>> figure out how to get the answers you need from the current website.
>> I agree; it is hard [usability-wise] for someone to go to the summary
>> page and answer the question "So what failed from IU last night, and
>> how does that differ from yesterday -- e.g., what regressed and
>> progressed yesterday at IU?". The website is flexible enough to do
>> it, but having a couple of basic summary pages would be nice for
>> basic users. What those should look like we can discuss further.
>>
>> The IU group really likes the emails that we currently generate: a
>> plain-text summary of the previous run. I posted copies on the MTT
>> bug tracker here:
>> http://svn.open-mpi.org/trac/mtt/ticket/61
>> Currently we have not put in the work to aggregate the runs, so for
>> each ini file that we run we get one email to the IU group. This is
>> fine for the moment, but as we add the rest of the clusters and
>> dimensions in the testing matrix we will need MTT to aggregate the
>> results for us and generate such an email.
>>
>> So I think the general feeling of the discussion is that we need the
>> following from MTT:
>> - A 'basic' summary page providing answers to some general,
>> frequently asked queries. The current interface is too advanced for
>> the current users.
>
> Sounds like summary.php does not suffice.
>
> http://www.open-mpi.org/mtt/summary.php

Yeah. I think the feeling is that the DB query page is too complex,
and the summary page is too all-inclusive, making it difficult to find
the answers to the questions that the IU group asks of MTT every day.

>
> One thought is that we've already iterated on the summary,
> and my understanding was that the summary wasn't intended to
> satisfy any one particular user's needs. By MTT 2.0, I think
> Reporter should be flexible and usable enough to do that.

I figured this was a future work item for MTT. Until then we have
been scripting around it with a 'home brewed' version, which is what
I sent you all.

> Do
> we need more summary pages? If various folks need more
> specific reports than what summary.php provides, I'd be
> happy to do the cut-n-paste into the nightly emailer to give
> them those reports, while we get custom email alerts going
> (in MTT 2.0 or whenever).
>

Maybe if we come up with some specific questions that most users of
MTT would ask, we could have a couple more targeted summary pages that
are shorter, simpler, and easier to parse.
We can go around and collect questions, but a few that I can think of
are (for a specific institution and/or machine):
  - What test dimensions ran last night? [trunk + 32 bit + Intel,
trunk + 64 bit + IBM, ...]
  - What failed last night at this institution or on this machine?
  - [A bit more complex] How do the failures from last night compare
to the previous report for this testing dimension?
Failures from:
  Last Night    | Previous Entry (likely yesterday)
 ---------------+----------------------------------
  ibm/spawn     | -- Passed --
  -- Passed --  | trivial/hello_c
  ibm/gather    | ibm/gather
  ...           | ...

That would tell us the obvious regressions, and whether any of the MTT
scripts failed the night before -- all institution or machine specific.
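
To make that concrete, here is a rough Perl sketch of the comparison I
have in mind. The two hard-coded failure lists just stand in for
whatever the reporter would actually pull out of the database for one
institution + testing dimension, so nothing below reflects real MTT
internals:

  use strict;
  use warnings;

  # Failures from the two runs being compared; the contents here are
  # placeholders for whatever comes out of the database.
  my %last_night = map { $_ => 1 } qw(ibm/spawn ibm/gather);
  my %previous   = map { $_ => 1 } qw(trivial/hello_c ibm/gather);

  # Regressed: failed last night but not in the previous entry
  my @regressed  = sort grep { !$previous{$_} }   keys %last_night;
  # Progressed: failed in the previous entry but not last night
  my @progressed = sort grep { !$last_night{$_} } keys %previous;
  # Still failing in both runs
  my @persistent = sort grep { $last_night{$_} }  keys %previous;

  print "Regressed:  @regressed\n";
  print "Progressed: @progressed\n";
  print "Persistent: @persistent\n";

It is really just a set difference on the two failure lists; everything
else is presentation.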

>
>> - A summary email [preferably in plain text], similar to the one
>> that IU generated, showing an aggregation of the previous night's
>> results for (a) all reporters and (b) my institution [so I can track
>> them down and file bugs].
>
>
> Could some or all of print-results.pl (which was attached to
> #61) be inserted into lib/MTT/Reporter/Email.pm, so that all
> mtt users can use what you have (if they need something
> similar to your email reports)? At first glance, it looks
> like a lot of print-results.pl is generalized enough for
> that.

Sure, feel free to use any of those scripts as you like. I just want
to say that MTT is running great; there are a few features that we
need for our setup that we had to script around, but the core is
still very much MTT.
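
For whoever picks that up: most of what print-results.pl does is just
group results by test suite and count passes/failures, roughly like
the sketch below. This is plain Perl with a made-up result-record
layout for illustration; it is not the actual Reporter API or data
structure, just the shape of the aggregation:

  use strict;
  use warnings;

  # Group results by suite, count pass/fail, and list the failures.
  # The suite/test/result record layout is invented for this example.
  sub format_summary {
      my (@results) = @_;
      my %by_suite;
      for my $r (@results) {
          $by_suite{ $r->{suite} }{ $r->{result} }++;
          push @{ $by_suite{ $r->{suite} }{failed_tests} }, $r->{test}
              if $r->{result} eq 'fail';
      }

      my $out = sprintf("%-12s %6s %6s\n", 'Suite', 'Pass', 'Fail');
      for my $suite (sort keys %by_suite) {
          my $s = $by_suite{$suite};
          $out .= sprintf("%-12s %6d %6d\n", $suite,
                          $s->{pass} || 0, $s->{fail} || 0);
          $out .= "    failed: $_\n" for @{ $s->{failed_tests} || [] };
      }
      return $out;
  }

  # Example: three results, two failures in the ibm suite
  print format_summary(
      { suite => 'trivial', test => 'hello_c', result => 'pass' },
      { suite => 'ibm',     test => 'spawn',   result => 'fail' },
      { suite => 'ibm',     test => 'gather',  result => 'fail' },
  );

The per-suite grouping and the failed-test list are the only parts
that really matter; the rest is formatting for the email body.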

>
>
>> - 1 email a day on the previous night's testing results.
>>
>> Some relevant bugs currently in existence:
>> http://svn.open-mpi.org/trac/mtt/ticket/92
>> http://svn.open-mpi.org/trac/mtt/ticket/61
>> http://svn.open-mpi.org/trac/mtt/ticket/94
>
>
> http://svn.open-mpi.org/trac/mtt/ticket/70 (show "new"
> failures) is also relevant for getting quick and automated
> info on what regressed/progressed last night.

Cool, thanks. :)

-- Josh

>
> -Ethan
>
>
>>
>>
>> The other concern is that, given the frequency of testing, someone
>> needs to make sure the bug tracker is updated as bugs appear from the
>> testing. I think the group is unclear about how this is done.
>> Meaning: when MTT identifies a test as failed, who is responsible for
>> putting the bug in the bug tracker?
>> The obvious solution is the institution that identified the bug.
>> [Warning: my opinion] But that becomes unwieldy for IU since we have
>> a large testing matrix, and we would need to commit someone to doing
>> this every day (and it may take all day to properly track a set of
>> bugs). Also, this kind of punishes an institution for testing more
>> instead of providing an incentive to test.
>>
>> ------ Page Break -- Context switch ------
>>
>> In case you all want to know what we are doing here at IU, I attached
>> to this email our planned MTT testing matrix. Currently BigRed and
>> Odin are running the complete matrix less the BLACS tests. Wotan and
>> Thor will come online as we get more resources to support them.
>>
>> In order to run such a complex testing matrix we use various .ini
>> files. And since some of the dimensions in the matrix are large, we
>> break some of the tests into a couple of .ini files that are
>> submitted concurrently so that they run in a reasonable time.
>>
>>       | BigRed   | Odin     | Thor  | Wotan
>> ------+----------+----------+-------+-------
>> Sun   | N        | N        | IMB   | BLACS
>> Mon   | N BLACS  | N        | N     | N
>> Tues  | N        | N IMB*   | N     | N
>> Wed   | N IMB*   | N        | N     | N
>> Thur  | N        | N BLACS  | N     | N
>> Fri   | N        | N        | N     | N
>> Sat   | N Intel* | N Intel* | BLACS | IMB
>>
>> N = Nightly run
>> * = Large runs
>> All runs start at 2 am on the day listed.
>>
>> =====================
>> BigRed
>> =====================
>> Nightly
>> -------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * MX, LoadLeveler, No debug, gcc 3.x
>> - Test Suites
>> * Trivial
>> * IBM suite
>> - Processing Elements/tasks/cores/...
>> * # < 8 hours
>> * 7 nodes/28 tasks [to start with]
>> - Runtime Parameters
>> * PML ob1/BTL mx,sm,self
>> * PML cm /MTL mx
>>
>> Weekly: Monday 2am Submission
>> -------------------------------------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * MX, LoadLeveler, No debug, gcc 3.x
>> - Test Suites
>> * BLACS
>> - Processing Elements/tasks/cores/...
>> * # < 1 days
>> * 32 nodes/128 tasks [to start with]
>> - Runtime Parameters
>> * PML ob1/BTL mx,sm,self
>> * PML cm /MTL mx
>>
>> Weekly: Wednesday 2am Submission
>> -------------------------------------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * MX, LoadLeveler, No debug, gcc 3.x
>> - Test Suites
>> * IMB
>> - Processing Elements/tasks/cores/...
>> * # < 1 days
>> * 32 nodes/128 tasks [to start with]
>> - Runtime Parameters
>> * PML ob1/BTL mx,sm,self
>> * PML cm /MTL mx
>>
>> Weekly: Saturday 2am Submission
>> ----------------------------------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * MX, LoadLeveler, No debug, gcc 3.x
>> * MX, LoadLeveler, No debug, gcc 4.x
>> - Trivial only
>> * MX, LoadLeveler, No debug, IBM compiler
>> - Trivial only
>> - Test Suites
>> * Intel
>> - Processing Elements/tasks/cores/...
>> * # < 1 days
>> * 32 nodes/128 tasks [to start with]
>> - Runtime Parameters
>> * PML ob1/BTL mx,sm,self
>> * PML cm /MTL mx
>>
>> =====================
>> Odin (128 dual processor machines)
>> =====================
>> Nightly
>> -------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * No debug, gcc 3.x
>> - Test Suites
>> * Trivial
>> * IBM suite
>> * Intel
>> - Processing Elements/tasks/cores/...
>> * # < 8 hours
>> * 8 nodes/16 tasks [to start with]
>> - Runtime Parameters
>> * PML ob1/BTL tcp,sm,self
>>
>> Weekly: Tuesday 2am Submission
>> -------------------------------------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * No debug, gcc 3.x
>> - Test Suites
>> * IMB
>> - Processing Elements/tasks/cores/...
>> * # < 1 day
>> * 32 nodes/64 tasks
>> - Runtime Parameters
>> * PML ob1/BTL tcp,sm,self
>>
>> Weekly: Thursday 2am Submission
>> -------------------------------------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * No debug, gcc 3.x
>> - Test Suites
>> * BLACS
>> - Processing Elements/tasks/cores/...
>> * # < 1 day
>> * 32 nodes/64 tasks
>> - Runtime Parameters
>> * PML ob1/BTL tcp,sm,self
>>
>> Weekly: Saturday 2am Submission
>> ----------------------------------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * No debug, gcc 3.x
>> - Test Suites
>> * Intel
>> - Processing Elements/tasks/cores/...
>> * # < 1 day
>> * 32 nodes/64 tasks
>> - Runtime Parameters
>> * PML ob1/BTL tcp,sm,self
>>
>> =====================
>> Thor (8 dual processor nodes)
>> =====================
>> Nightly
>> -------
>> - Branches: trunk, v1.2
>> - Configurations: All 32 bit builds
>> * No debug, gcc 3.x
>> * No debug, ICC
>> - Test Suites
>> * Trivial
>> * IBM suite
>> * Intel
>> - Processing Elements/tasks/cores/...
>> * # < 8 hours
>> * 4 nodes/8 tasks
>> - Runtime Parameters
>> * PML ob1/BTL mx,mvapi,tcp,sm,self
>>
>> Weekly: Saturday 2am Submission
>> -------------------------------------
>> - Branches: trunk, v1.2
>> - Configurations: All 32 bit builds
>> * No debug, gcc 3.x
>> * No debug, ICC
>> - Test Suites
>> * BLACS
>> - Processing Elements/tasks/cores/...
>> * # < 1 day
>> * 4 nodes/8 tasks
>> - Runtime Parameters
>> * PML ob1/BTL mx,mvapi,tcp,sm,self
>>
>> Weekly: Sunday 2am Submission
>> ----------------------------------
>> - Branches: trunk, v1.2
>> - Configurations: All 32 bit builds
>> * No debug, gcc 3.x
>> * No debug, ICC
>> - Test Suites
>> * IMB
>> - Processing Elements/tasks/cores/...
>> * # < 1 day
>> * 4 nodes/8 tasks
>> - Runtime Parameters
>> * PML ob1/BTL mx,mvapi,tcp,sm,self
>>
>> =====================
>> Wotan (16 dual processor machines)
>> =====================
>> Nightly (Not Sat or Sun)
>> -------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * No debug, gcc 3.x
>> - Test Suites
>> * Trivial
>> * IBM suite
>> * Intel
>> - Processing Elements/tasks/cores/...
>> * # < 8 hours
>> * 8 nodes/16 tasks
>> - Runtime Parameters
>> * PML ob1/BTL mvapi,tcp,sm,self
>>
>> Weekly: Saturday 2am Submission
>> -------------------------------------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * No debug, gcc 3.x
>> - Test Suites
>> * IMB
>> - Processing Elements/tasks/cores/...
>> * # < 1 day
>> * 16 nodes/32 tasks
>> - Runtime Parameters
>> * PML ob1/BTL mvapi,tcp,sm,self
>>
>> Weekly: Sunday 2am Submission
>> ----------------------------------
>> - Branches: trunk, v1.2
>> - Configurations: All 64 and 32 bit builds
>> * No debug, gcc 3.x
>> - Test Suites
>> * BLACS
>> - Processing Elements/tasks/cores/...
>> * # < 1 day
>> * 16 nodes/32 tasks
>> - Runtime Parameters
>> * PML ob1/BTL mvapi,tcp,sm,self
>>
>
>>
>> Questions? Thoughts?
>>
>> -- Josh
>>
>> On Oct 25, 2006, at 8:37 AM, Jeff Squyres wrote:
>>
>>> Looking over Len's minutes from yesterday, I see that there was a
>>> bunch of discussion about MTT on the OMPI teleconf yesterday, but
>>> neither Ethan nor I were there to be a part of it. :-\
>>>
>>> I couldn't make much sense from Len's minutes:
>>>
>>> -----
>>> - having some trouble with MTT config, so will try to look more
>>> closely at some of these failures
>>> - instead of e-mails sending them to the testing at MTT list
>>> - plenty of internal IU e-mail, better to have one summary e-mail
>>> each day
>>> - cannot send a summary
>>> - send to mtt list and digest it
>>> - or you can just file bugs
>>> - can't use mtt web site to get the info
>>> -----
>>>
>>> What is IU requesting? Who can't use the MTT web site to get info?
>>> What info are you trying to get / why can't you get it?
>>>
>>> Should we have a teleconf about MTT stuff?
>>>
>>> I'm on travel and unavailable all today, but have time tomorrow
>>> (Thurs).
>>>
>>> --
>>> Jeff Squyres
>>> Server Virtualization Business Unit
>>> Cisco Systems
>>>

----
Josh Hursey
jjhursey_at_[hidden]
http://www.open-mpi.org/