MTT Devel Mailing List Archives

Subject: Re: [MTT devel] Reporter Slowness
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2008-01-31 12:00:07


Ok so the script is done. It took a bit longer than I had expected,
but once it finished things sped back up ('24 hours' of data in 6
sec). There are a few more maintenance operations I want to run that
will help a bit more, but I'll push those to this weekend.

Thanks for your patience, and let me know if it feels sluggish again.
So as of this email things should be back to normal.

Cheers,
Josh

On Jan 30, 2008, at 5:09 PM, Josh Hursey wrote:

> I've started the script running.
>
> Below is a short version, and a trilogy of the gory details. I wanted
> to write up the details so if it ever happens again to us (or someone
> else) they can see what we did to fix it.
>
>
> The Short Version:
> ------------------
> The Slowness(tm) was caused by the recent shifting of data in the
> database to resolve the partition table problems seen earlier this
> month.
>
> The bad news is that it will take about 14 hours to finish.
>
> The good news is that I confirmed that this will fix the performance
> problem that we are seeing. In the small run this technique reduced the
> '24 hour' query execution time from ~40 sec back down to ~8 sec.
>
> This may slow down client submits this evening, but should not prevent
> them from being able to submit. The 'DELETE' operations do not require
> an exclusive lock, so the 'INSERT' operations should proceed fine
> concurrently. The 'INSERT' operations will be blocked while
> the 'VACUUM FULL' operation is in progress, since it *does* require an
> exclusive lock. They will proceed normally once this lock is
> released, so clients that submit during these windows of time (about
> 20 min or so) will see a temporary slowdown.
>
>
>
> The Details: Part 1: What I did earlier this week:
> (more than you wanted to know for posterity purposes)
> --------------------------------------------------
> The original problem was that the master partition tables accidentally
> started storing data because I forgot to load the 2008 partition
> tables into the database before the first of the year. :( So we loaded
> the partition tables, but we still needed to move the misplaced data.
>
> To move the misplaced data we have to duplicate the row (so it is
> stored properly this time), but we also need to take care in assigning
> row IDs to the duplicate rows. We cannot give the dup'ed rows the same
> ID or we will be unable to differentiate the original and the dup'ed
> row. So I created a dummy table for mpi_install/test_build/test_run to
> translate between the orig row ID and the dup'ed row ID. I used the
> nextval on the sequence to populate the values for the dup'ed rows in
> the dummy table.
>
> Now that I had a translation, I joined the dummy table with its
> corresponding master table (e.g. "mpi_install join mpi_install_dummy
> on mpi_install.mpi_install_id = mpi_install_dummy.orig_id"), and
> instead of selecting the original ID from the dummy table I selected
> the new dup'ed ID. I inserted this selection back into the
> mpi_install table. (Cool little trick that PostgreSQL lets you get
> away with sometimes.)
>
> Once I had duplicated all of the affected rows, I updated all
> references to the original ID in the test_build/test_run tables to
> point to the duplicated ID. This removed all internal references to
> the original ID and replaced them with the duplicated ID, so we
> retain the integrity of the data.
>
> Once I had verified that no table referenced the original rows, I
> deleted those rows from the mpi_install/test_build/test_run tables.
>
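
A minimal sketch of the Part 1 steps, for anyone repeating them. This
is not the script that was run: it uses Python's sqlite3 module as a
stand-in for PostgreSQL, a toy two-table schema, and MAX(id) arithmetic
in place of nextval on the sequence. The column name new_id and the
helper names remap_ids and make_demo_db are hypothetical; only
mpi_install, test_build, mpi_install_dummy and orig_id come from the
description above.

```python
import sqlite3

def remap_ids(conn):
    """Duplicate rows under fresh IDs, repoint references, delete originals."""
    cur = conn.cursor()
    # Stand-in for nextval(): continue numbering after the current maximum ID.
    (max_id,) = cur.execute(
        "SELECT COALESCE(MAX(mpi_install_id), 0) FROM mpi_install").fetchone()
    orig_ids = [row[0] for row in cur.execute(
        "SELECT mpi_install_id FROM mpi_install ORDER BY mpi_install_id")]

    # 1. Translation table: original ID -> newly allocated ID.
    cur.execute("CREATE TABLE mpi_install_dummy "
                "(orig_id INTEGER PRIMARY KEY, new_id INTEGER UNIQUE)")
    cur.executemany("INSERT INTO mpi_install_dummy VALUES (?, ?)",
                    [(o, max_id + n) for n, o in enumerate(orig_ids, start=1)])

    # 2. Re-insert each row, selecting the new ID instead of the original.
    cur.execute("INSERT INTO mpi_install (mpi_install_id, name) "
                "SELECT d.new_id, m.name FROM mpi_install m "
                "JOIN mpi_install_dummy d ON m.mpi_install_id = d.orig_id")

    # 3. Point every reference at the duplicated row.
    cur.execute("UPDATE test_build SET mpi_install_id = "
                "(SELECT new_id FROM mpi_install_dummy "
                " WHERE orig_id = test_build.mpi_install_id) "
                "WHERE mpi_install_id IN (SELECT orig_id FROM mpi_install_dummy)")

    # 4. Verify nothing references the originals, then delete them.
    (dangling,) = cur.execute(
        "SELECT COUNT(*) FROM test_build WHERE mpi_install_id IN "
        "(SELECT orig_id FROM mpi_install_dummy)").fetchone()
    if dangling:
        raise RuntimeError("original rows are still referenced")
    cur.execute("DELETE FROM mpi_install WHERE mpi_install_id IN "
                "(SELECT orig_id FROM mpi_install_dummy)")
    conn.commit()

def make_demo_db():
    """Build a tiny in-memory database to try the steps on."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mpi_install (mpi_install_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE test_build (test_build_id INTEGER PRIMARY KEY,
                                 mpi_install_id INTEGER);
        INSERT INTO mpi_install VALUES (1, 'a'), (2, 'b');
        INSERT INTO test_build VALUES (10, 1), (11, 2), (12, 2);
    """)
    return conn
```

On the demo data, calling remap_ids(make_demo_db()) re-creates rows 1
and 2 as rows 3 and 4, and the test_build rows follow them.
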
>
>
> The Details: Part 2: What I forgot to do:
> -----------------------------------------
> When rows are deleted from PostgreSQL the disk space used continues to
> be reserved for this table, and is not reclaimed unless you 'VACUUM
> FULL' this table. PostgreSQL does this for many good reasons which are
> described in their documentation. However, in the case of the master
> partition tables we want them to release all of their disk space since
> we should never be storing data in this particular table.
>
> I did a 'VACUUM FULL' on the mpi_install and test_build tables
> originally, but did not do it on the test_run table since this
> operation requires an exclusive lock on the table and can take a long
> time to finish. Further, I only completed about 1% of the deletions for
> test_run before I stopped the operation, choosing to wait for the
> weekend since it would take a long time to complete.
>
> Deleting only part of the test_run master table (which contained
> about 1.2 million rows) caused the queries on this table to slow
> down considerably. The Query Planner estimated the cost of the
> '24 hour' query at 322,924, and it completed in about 40 seconds. I ran
> 'VACUUM FULL test_run', which only vacuums the master table, and then
> re-ran the query. This time the Query Planner estimated the cost
> at 151,430, and it completed in about 8 seconds.
>
>
>
> The Details: Part 3: What I am doing now:
> -----------------------------------------
> Currently I am deleting the rest of the old rows from test_run. There
> are approx. 1.2 million rows, and this should complete in about 13
> hours.
>
> After every 100 K deletions I'm running a 'VACUUM FULL' on test_run.
> My hope is that doing it this way, instead of just once at the end
> of all 1.2 M deletions, will cause each one to take less time. I hope
> this will limit the interruptions seen by the MTT clients submitting
> results this evening.
>
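
The delete-then-vacuum loop could be sketched as follows. Again this is
an illustration under stated assumptions rather than the script that is
running: sqlite3 stands in for PostgreSQL, the test_run_id column name
and the delete_in_batches and make_demo_db helpers are hypothetical, and
SQLite's plain VACUUM is used where the real run issues 'VACUUM FULL' on
test_run. With 1.2 million rows and a batch size of 100 K, the loop
would vacuum 12 times.

```python
import sqlite3

def delete_in_batches(conn, ids, batch_size, vacuum_sql="VACUUM"):
    """Delete the given rows batch by batch, vacuuming after each batch.

    Returns the number of vacuum passes that were run.
    """
    vacuums = 0
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        conn.executemany("DELETE FROM test_run WHERE test_run_id = ?",
                         [(row_id,) for row_id in batch])
        # The vacuum cannot run inside an open transaction, so commit first.
        conn.commit()
        conn.execute(vacuum_sql)
        vacuums += 1
    return vacuums

def make_demo_db():
    """Build a ten-row in-memory test_run table to try the loop on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_run (test_run_id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO test_run VALUES (?)",
                     [(i,) for i in range(1, 11)])
    conn.commit()
    return conn
```

Committing after each batch keeps every delete transaction short, and
each vacuum only has to clean up after one batch of deletions.
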
> I'll send email once the script is complete, and things seem back to
> normal.
>
> Cheers,
> Josh
>
> On Jan 30, 2008, at 4:12 PM, Jeff Squyres wrote:
>
>> I'd go ahead and do it now.
>>
>> On Jan 30, 2008, at 4:04 PM, Josh Hursey wrote:
>>
>>> It seems the reporter has gotten slower :( Now it is working in the
>>> range of 40 - 50 seconds for the 24 hour query which is not
>>> reasonable. This should be much lower.
>>>
>>> Looking at the explain of the query I have some ideas on how to make
>>> things better, but this will slow things down for a while as I do
>>> this work (maybe a day or two, can't say for sure).
>>>
>>> The question is should I wait until Friday COB to start this or
>>> should I do it immediately?
>>>
>>> Let me know,
>>> Josh
>>> _______________________________________________
>>> mtt-devel mailing list
>>> mtt-devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems