On Nov 3, 2010, at 9:10 PM, Jeff Squyres wrote:
> Ethan / Josh --
> The HDF guys are interested in potentially using MTT.
I just forwarded a message to the mtt-devel list about some work at IU to use MTT to test the CIFTS FTB project. So maybe development between these two efforts can be mutually beneficial.
> They have some questions about the database. Can you guys take a whack at answering them? (be sure to keep the CC, as Elena/Quincey aren't on the list)
> On Nov 3, 2010, at 1:29 PM, Quincey Koziol wrote:
>> Lots of interest here about MTT, thanks again for taking time to demo it and talk to us!
> Glad to help.
>> One lasting concern was the slowness of the report queries - what's the controlling parameter there? Is it the number of tests, the size of the output, the number of configurations of each test, etc?
> All of the above. On a good night, Cisco dumps in 250k test runs to the database. That's just a boatload of data. End result: the database is *HUGE*. Running queries just takes time.
> If the database weren't so huge, the queries wouldn't take nearly as long. The size of the database is basically how much data you put into it -- so it's really a function of everything you mentioned, i.e., increasing any one of those items increases the size of the database. Our database is *huge* -- the DB guys tell me that it's lots and lots of little data (with blobs of stdout/stderr here and there) that make it "huge", in SQL terms.
> Josh did some great work a few summers back that basically "fixed" the speed of the queries to a set speed by effectively dividing up all the data into month-long chunks in the database. The back-end of the web reporter only queries the relevant month chunks in the database (I think this is a postgres-specific SQL feature).
> Additionally, we have the DB server on a fairly underpowered machine that is shared with a whole pile of other server duties (www.open-mpi.org, mailman, etc.). This also contributes to the slowness.
Yeah, this pretty much sums it up. The current Open MPI MTT database is 141 GB and contains data as far back as Nov. 2006. Part of the response time is the MTT Reporter converting the raw database output into pretty HTML (it is currently written in PHP). At the bottom of each MTT Reporter page you will see some stats on where the Reporter spent most of its time.
The total time the Reporter took to return the result is reported as:
Total script execution time: 24 second(s)
The time spent on just the database query is reported as:
Total SQL execution time: 19 second(s)
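To make those two numbers concrete, here is a small sketch of the split. The SQL share is the part that hardware and partitioning affect; the remainder is mostly the Reporter's own PHP work. The function name is made up for illustration, and the figures are the single sample quoted above, not an average.

```python
# Figures copied from the Reporter footer quoted above (one sample page load).
TOTAL_SCRIPT_S = 24  # "Total script execution time"
TOTAL_SQL_S = 19     # "Total SQL execution time"


def split_reporter_time(total_s: float, sql_s: float) -> tuple[float, float]:
    """Return (seconds spent outside the SQL query, fraction of time spent in SQL).

    Hypothetical helper for illustration; not part of the MTT Reporter.
    """
    if not 0 <= sql_s <= total_s:
        raise ValueError("SQL time must lie between 0 and the total time")
    return total_s - sql_s, sql_s / total_s


non_sql_s, sql_fraction = split_reporter_time(TOTAL_SCRIPT_S, TOTAL_SQL_S)
print(f"Non-SQL (mostly PHP rendering): {non_sql_s} s, SQL share: {sql_fraction:.0%}")
```

On this sample, roughly four fifths of the wait is the database query itself, which is why the server and data volume dominate the discussion below.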
We also generate an overall contribution graph, also linked at the bottom of the page, which gives you a feel for the amount of data coming in every day/week/month.
Jeff mentioned the partition tables work that I did a couple of summers ago. The partition tables help quite a lot by splitting the data into week-long chunks, so queries over shorter date ranges run faster than queries over longer ones: they only need to scan the few partitions that cover the range rather than all of the data. The database interface that the MTT Reporter uses is abstracted away from the partition tables; it is really just the DBA (I guess that is me these days) who has to worry about setting them up, which is usually a 5-minute task once a year. Most queries to MTT ask for date ranges like 'past 24 hours' or 'past 3 days', so breaking up the results by week saves some time.
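To illustrate why short date ranges are cheaper, here is a minimal sketch of how a query's date range maps onto weekly partitions. It assumes one partition per ISO week and a hypothetical naming scheme (`test_run_<year>_w<week>`); this is not MTT's actual schema, and in the real setup it is the database, not the Reporter, that decides which partitions to scan.

```python
from datetime import date, timedelta


def week_partitions(start: date, end: date) -> list[str]:
    """Return the hypothetical weekly partition names a query over [start, end] touches.

    Assumes one partition per ISO week, named like 'test_run_2010_w43'.
    The naming scheme is illustrative only.
    """
    if end < start:
        raise ValueError("end must not precede start")
    names = []
    # Step back to the Monday of the week containing `start`.
    day = start - timedelta(days=start.weekday())
    while day <= end:
        # For a Monday, the ISO year/week is the year/week of its whole week.
        year, week, _ = day.isocalendar()
        names.append(f"test_run_{year}_w{week:02d}")
        day += timedelta(days=7)
    return names


if __name__ == "__main__":
    today = date(2010, 10, 25)
    for label, days in [("past 24 hours", 1), ("past 3 days", 3), ("past month", 30)]:
        parts = week_partitions(today - timedelta(days=days - 1), today)
        print(f"{label}: {len(parts)} partition(s) -> {parts}")
```

A one-day range touches a single week, a three-day range can straddle a week boundary and touch two, and a month touches five or six, each a small slice of roughly four years of weekly partitions.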
One thing to also notice is that the first query through the MTT Reporter is usually the slowest. After that first query, the MTT database (PostgreSQL in this case) is able to cache some of the query information, which should make subsequent queries a little faster.
But the performance is certainly not where I would like it, and there are still a few ways to make it better. I think if we moved to a newer server that is not as heavily shared, we would see a performance boost. Adding more RAM to the system, and potentially a faster disk array, would certainly improve performance as well. I think there are still a few things I can do to the database schema to improve common queries; better normalization of incoming data would certainly help. There are likely also some places in the current MTT Reporter where performance could be improved on the sorting/rendering side.
The text blobs (variable-length string fields in the database) for stderr/stdout should not be contributing to the problem. Most modern databases, PostgreSQL in particular, can optimize these fields so that, with regard to the SQL query, they perform about the same as references to small fixed-length strings.
So, in short, most of the slowness is due to (1) a shared server environment hosting a number of active projects, and (2) the volume of existing data. There are some places where we could improve things, but we haven't had the cycles to investigate them much yet.
>> For example, each HDF5 build includes on the order of 100 test executables, and we run 50 or so configurations each night. How would that compare with the OpenMPI test results database?
> Good question. I'm CC'ing the mtt-devel list to see if Josh or Ethan could comment on this more intelligently than me -- they did almost all of the database work, not me.
> I'm *guessing* that it won't come anywhere close to the size of the Open MPI database (we haven't trimmed the data in the OMPI database since we started gathering data in the database several years ago).
A page that might give you a feel for the volume and type of data being submitted is the 'stats' page: www.open-mpi.org/mtt/stats
We don't publicly link to this page since it is not really useful for anyone except MTT maintainers.
I have a script that maintains stats on the database, which we can use as a metric. The stats live in a special table in the database that is updated roughly every night. It is a nice way to get insight into the distribution of testing (for instance, about 90% of Open MPI testing is on Linux, 8% on Solaris, and 1% each on OS X and Cygwin).
For example, on Oct. 25, 2010 (put '2010-10-25 - 2010-10-25' in the Date Range) there were:
691 MPI Install variations
658 Test Builds
78,539 Test Run results
437 Performance results
Since MTT can tell whether there is a 'new' tarball to test, some organizations (like Cisco) only run MTT when there is a new tarball, while others (like IU) run every night even if it is against an old tarball.
So the database currently holds about 186 million test records. The weekly contribution normally ranges from 0.5 to 1.25 million tests submitted, depending on how many 'new' tarballs are created that week.
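To put the HDF5 numbers from Quincey's question next to the Open MPI figures above, here is a back-of-the-envelope sketch. It assumes each of the ~100 test executables produces one test-run record per configuration, that all ~50 configurations run every night, and that nothing is ever trimmed; those are simplifications, not measurements.

```python
# HDF5 numbers from Quincey's question (both approximate).
HDF5_TEST_EXECUTABLES = 100  # "on the order of 100 test executables" per build
HDF5_CONFIGS_PER_NIGHT = 50  # "50 or so configurations each night"

# Open MPI numbers quoted in this message.
OMPI_RUNS_ONE_DAY = 78_539        # Test Run results on Oct. 25, 2010
OMPI_TOTAL_RECORDS = 186_000_000  # approximate size of the current database

# Assumption: one test-run record per executable per configuration per night.
hdf5_nightly = HDF5_TEST_EXECUTABLES * HDF5_CONFIGS_PER_NIGHT
hdf5_weekly = hdf5_nightly * 7

# How many HDF5 nights of data equal that one Open MPI day.
ratio_vs_ompi_day = OMPI_RUNS_ONE_DAY / hdf5_nightly

# Years for HDF5 to accumulate as many records as the Open MPI database holds.
years_to_match = OMPI_TOTAL_RECORDS / hdf5_weekly / 52

print(f"HDF5: {hdf5_nightly:,} runs/night, {hdf5_weekly:,} runs/week")
print(f"Oct. 25, 2010 for Open MPI is about {ratio_vs_ompi_day:.1f}x an HDF5 night")
print(f"Matching 186M records at that rate would take about {years_to_match:.0f} years")
```

Under these assumptions HDF5 would submit a few tens of thousands of test runs a week, well below the 0.5 to 1.25 million tests per week that Open MPI sees.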
Hopefully my comments help more than they confuse. If it would be useful to chat on the phone sometime, I'm sure we could set something up.
> Jeff Squyres
Postdoctoral Research Associate
Oak Ridge National Laboratory