From: Andrew Friedley (afriedle_at_[hidden])
Date: 2006-06-29 11:35:44


Jeff Squyres (jsquyres) wrote:
> The apache upload limit is there for a good reason (hogging of
> resources), so instead of simply increasing it, perhaps we should be a
> bit smarter about how we send data back. Indeed, every web server is
> going to have some finite limit. And realistically, we should really
> try to send [much] less data than the upload limit, anyway.
>
> Right now, perfbase.php is basically a pass-through to the back-end
> perfbase command line application. Whatever we receive via HTTP is sent
> back to perfbase for processing. I'm wondering if we should insert a
> thin processing layer *before* perfbase is invoked. Something that's
> relatively simple, but could save us on bandwidth and apache processing
> (which is always good; even though IU is not lacking in bandwidth,
> milliways is also our production mail and web server, so cutting down
> its work is always a Good Thing).
>
> I'm thinking something like the following:
>
> - the client takes its total data that it needs to submit for one
> perfbase action, compresses it (since text compresses extremely well),
> and breaks it up into multiple uploads if necessary. It serially
> uploads each portion of the compressed file.
>
> - the server php becomes even dumber than it is now: it simply accepts
> the upload and puts it into a temporary storage place on the server (we
> can put limits on this to ensure we don't run out of disk space, etc.)
> and returns a handle to the client (e.g., the filename).
>
> - the client gets the handle for each portion of the upload (say it
> uploaded 3 parts) and then transmits a final "action" upload that
> indicates what to do with the uploaded parts. Something along the lines
> of "combine <handle A>, <handle B>, and <handle C> and make them one
> submission to perfbase."
>
> - the server php saves this as a special "action" file.
>
> - a cron job periodically sweeps the storage space on the server looking
> for action files. It processes them as it sees them (e.g.,
> uncompressing and collating <A>, <B>, and <C>, submitting them to
> perfbase, and removing all temporary files).
>
> - for good measure, we should also sweep the storage space to remove
> old/stale temporary files (e.g., if a client never submitted a
> corresponding action file).
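
To make the flow above concrete, here is a rough sketch in Python. It is not what the client or perfbase.php actually do. Everything in it is hypothetical: the chunk size, the stale-file age, the JSON action-file format, and the names `split_upload` and `Spool`. It runs in one process, with a local directory standing in for the server's temporary storage and a callback standing in for the perfbase invocation:

```python
import gzip
import json
import os
import time
import uuid

CHUNK_SIZE = 64 * 1024      # hypothetical per-upload size, kept well under the apache limit
STALE_SECONDS = 24 * 3600   # hypothetical age after which orphaned parts are deleted


def split_upload(text, chunk_size=CHUNK_SIZE):
    """Client: compress the whole submission, then cut the compressed bytes into parts."""
    blob = gzip.compress(text.encode("utf-8"))
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]


class Spool:
    """Server: a dumb temporary store; each stored part's filename is its handle."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, name):
        return os.path.join(self.directory, name)

    def store_part(self, data):
        handle = uuid.uuid4().hex + ".part"
        with open(self._path(handle), "wb") as f:
            f.write(data)
        return handle

    def store_action(self, handles):
        """Save the ordered list of handles that make up one perfbase submission."""
        name = uuid.uuid4().hex + ".action"
        tmp = self._path(name + ".tmp")
        with open(tmp, "w") as f:
            json.dump({"combine": handles}, f)
        # Rename into place so the sweep never sees a half-written action file.
        os.replace(tmp, self._path(name))
        return name

    def sweep(self, submit, now=None):
        """Cron job: process every action file, then remove stale orphaned parts."""
        now = time.time() if now is None else now
        for name in sorted(os.listdir(self.directory)):
            if not name.endswith(".action"):
                continue
            with open(self._path(name)) as f:
                handles = json.load(f)["combine"]
            # The parts are slices of one gzip stream: rejoin in order, then decompress.
            chunks = []
            for handle in handles:
                with open(self._path(handle), "rb") as f:
                    chunks.append(f.read())
            submit(gzip.decompress(b"".join(chunks)).decode("utf-8"))
            for handle in handles:
                os.remove(self._path(handle))
            os.remove(self._path(name))
        # Parts never claimed by an action file are deleted once they get old.
        for name in os.listdir(self.directory):
            path = self._path(name)
            if name.endswith(".part") and now - os.path.getmtime(path) > STALE_SECONDS:
                os.remove(path)
```

Because the parts are cut from the compressed stream rather than from the text, the sweep only has to concatenate them in handle order before decompressing. Writing the action file under a temporary name and renaming it keeps the cron job from picking up a partial one.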
>
> This solves several problems:
>
> 1. We can compress data sent to the server, which could save a *lot* of
> bandwidth.
>
> 2. Since the server can apply some intelligence before submitting data
> to perfbase, the client can also send *abbreviated* data. Specifically,
> the client can send *one* copy of all the platform, architecture,
> compiler, mpirun params, etc., and then all the test results that were
> generated with that (remember that we currently send all that header
> data with *every* test result). The server side can then reconstruct
> this into the format that perfbase needs (e.g., adding the same header
> to every data portion before submitting to perfbase). I'm guessing that
> this will actually be a massive savings in bandwidth.
>
> 3. We can send arbitrarily large upload files (although since text
> compresses so well, this might not be much of an issue), regardless of
> the apache limit.
>
> 4. All the intelligence of perfbase processing moves out of apache
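
For #2, a rough sketch of the server-side reconstruction, assuming a hypothetical JSON payload with one `header` object and a list of `results`. The field names, the helpers `pack`, `expand`, and `to_text`, and the `key: value` text layout are all made up for illustration; perfbase's real input format may differ:

```python
import json


def pack(header, results):
    """Client: send the shared header fields once, followed by the bare results."""
    return json.dumps({"header": header, "results": results})


def expand(payload):
    """Server: re-attach the shared header to every result before submission."""
    doc = json.loads(payload)
    header = doc["header"]
    records = []
    for result in doc["results"]:
        record = dict(header)   # copy, so each record gets its own header fields
        record.update(result)   # per-result fields win on any key collision
        records.append(record)
    return records


def to_text(record):
    """Hypothetical 'key: value' rendering; perfbase's real input format may differ."""
    return "".join(f"{key}: {value}\n" for key, value in sorted(record.items()))
```

The savings grow with the number of results per submission, since the platform, compiler, and mpirun fields cross the wire once instead of once per result.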
>
> This doesn't seem too complicated to write, either.
>
> Additionally, the bandwidth savings from #2 may make #1 unnecessary (at
> least initially -- sending plain, uncompressed text will probably make
> debugging at least slightly easier). More specifically, we can always
> add compression later if we want/need to.
>
> Comments?
>

This was pretty much the original plan for the server side. I don't
remember why we didn't do it; complications, maybe? I seem to remember
Brian hating all over it too :)

Seems like a good solution to me, though I'm not volunteering to code
this right now.

Something to think about is security and validity. Authentication is
mostly handled by .htaccess now, so we have something there. But should
we also be validating the data going into perfbase in some way? Not
sure; just a random idea.
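
One concrete check, if we go with the handle scheme: the client echoes the handles back in the action file, so the server should accept only handles that look exactly like the ones it hands out. Otherwise a crafted action file could point at something like `../../etc/passwd`. A minimal sketch, assuming handles are 32 lowercase hex digits plus `.part` (a hypothetical format) and a hypothetical helper `resolve_handle`:

```python
import os
import re

# Hypothetical handle format: 32 lowercase hex digits followed by ".part".
HANDLE_RE = re.compile(r"[0-9a-f]{32}\.part")


def resolve_handle(spool_dir, handle):
    """Map a client-supplied handle to a path in spool_dir, refusing anything unexpected."""
    # fullmatch rejects path separators, "..", and trailing junk such as newlines.
    if not HANDLE_RE.fullmatch(handle):
        raise ValueError(f"bad handle: {handle!r}")
    path = os.path.join(spool_dir, handle)
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path
```

This only covers the handles; checking the actual test data before it reaches perfbase would be a separate step.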

One thing that might be useful: apache often compresses outgoing web
pages on the fly when the browser asks for it (in fact, I think browsers
request this by default nowadays, but I'm not sure). I imagine the perl
LWP stuff supports this on the fly as well. If the same works in the
other direction, for HTTP POST bodies, it would make things a lot easier,
though it also means apache is doing the (de)compression work.
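
For the POST direction, the client would send a gzip-compressed body with a `Content-Encoding: gzip` header. A sketch using Python's `urllib.request`, just to show the shape of the request (the URL is a placeholder and `build_gzip_post` is a hypothetical helper; the perl client would need the equivalent through LWP). On the server side, I believe apache's mod_deflate can inflate such request bodies through its input filter (`SetInputFilter DEFLATE`), but that is worth checking against our apache setup before relying on it:

```python
import gzip
import urllib.request


def build_gzip_post(url, text):
    """Build (but do not send) a POST request whose body is gzip-compressed text."""
    body = gzip.compress(text.encode("utf-8"))
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Tells the server the body must be gunzipped before use.
            # urllib stores header names via str.capitalize(), i.e. "Content-encoding".
            "Content-Encoding": "gzip",
        },
    )
```

Passing the result to `urllib.request.urlopen()` would actually send it, e.g. to a placeholder URL such as `http://localhost/perfbase.php`.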

Andrew