From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-06-29 08:23:11


The apache upload limit is there for a good reason (to prevent hogging
of resources), so instead of simply increasing it, perhaps we should be a
bit smarter about how we send data back. Indeed, every web server is
going to have some finite limit. And realistically, we should try to
send [much] less data than the upload limit anyway.

Right now, perfbase.php is basically a pass-through to the back-end
perfbase command line application. Whatever we receive via HTTP is sent
back to perfbase for processing. I'm wondering if we should insert a
thin processing layer *before* perfbase is invoked. Something that's
relatively simple, but could save us on bandwidth and apache processing
(which is always good; even though IU is not lacking in bandwidth,
milliways is also our production mail and web server, so cutting down
its work is always a Good Thing).

I'm thinking something like the following:

- the client takes its total data that it needs to submit for one
perfbase action, compresses it (since text compresses extremely well),
and breaks it up into multiple uploads if necessary. It serially
uploads each portion of the compressed file.

- the server php becomes even dumber than it is now: it simply accepts
the upload and puts it into a temporary storage place on the server (we
can put limits on this to ensure we don't run out of disk space, etc.)
and returns a handle to the client (e.g., the filename).

- the client gets the handle for each portion of the upload (say it
uploaded 3 parts) and then transmits a final "action" upload that
indicates what to do with the uploaded parts. Something along the lines
of "combine <handle A>, <handle B>, and <handle C> and make them one
submission to perfbase."

- the server php saves this as a special "action" file.

- a cron job periodically sweeps the storage space on the server looking
for action files. It processes them as it sees them (e.g.,
uncompressing and collating <A>, <B>, and <C>, submitting them to
perfbase, and removing all temporary files).

- for good measure, we should also sweep the storage space to remove
old/stale temporary files (e.g., if a client never submitted a
corresponding action file).

This solves several problems:

1. We can compress data sent to the server, which could save a *lot* of
bandwidth.

2. Since the server can apply some intelligence before submitting data
to perfbase, the client can also send *abbreviated* data. Specifically,
the client can send *one* copy of all the platform, architecture,
compiler, mpirun params, etc., and then all the test results that were
generated with that (remember that we currently send all that header
data with *every* test result). The server side can then reconstruct
this into the format that perfbase needs (e.g., adding the same header
to every data portion before submitting to perfbase). I'm guessing that
this will actually be a massive savings in bandwidth.

3. We can send arbitrarily large upload files (although since text
compresses so well, this might not be much of an issue), regardless of
the apache limit.

4. All the intelligence of perfbase processing moves out of apache.
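
As an illustration of #2, the server-side reconstruction could be as
small as the following Python sketch. The field names used in the
example (platform, compiler, test_name, result) are hypothetical, and
the real format that perfbase needs is not shown here; the point is
only that the shared header is sent once and re-attached to every test
result before submission.

```python
def expand_submission(
    header: dict[str, str],
    results: list[dict[str, str]],
) -> list[dict[str, str]]:
    """Re-attach the shared header fields to every individual test result.

    If a result carries a key that also appears in the header, the
    result's own value wins.
    """
    return [{**header, **result} for result in results]


# Hypothetical abbreviated submission: one header, many results.
header = {"platform": "linux-x86_64", "compiler": "gcc"}
results = [
    {"test_name": "ring", "result": "pass"},
    {"test_name": "allreduce", "result": "fail"},
]
records = expand_submission(header, results)
```

Each entry in records then carries the full header again, which is
roughly what the client sends over the wire today for every test result.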

This doesn't seem too complicated to write, either.

Additionally, the bandwidth savings from #2 may make #1 unnecessary (at
least initially -- sending plain, uncompressed text will probably make
debugging at least slightly easier). More specifically, we can always
add compression later if we want/need to.

Comments?

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems