Subject: Re: [MTT devel] MTT GDS -- one more...
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-02-12 11:44:07

On Feb 12, 2010, at 11:35 AM, Andrew Senin wrote:

> I worked with Igor on the GDS framework (although Igor knows more of the
> technical details than I do). Let me add my two cents to the discussion.


> > 1. It looks like the main benefits of using the Google App Engine --
> > specifically for MTT -- are that we can use the GDS and/or we can host an
> > application on their web servers. Is that correct?
>
> I think so. Also, GDS should be faster than a relational DB on large
> amounts of data.

Cool. The speed is also a good/important point for us -- our current SQL server is kinda creaking under the load. Josh spent quite a bit of time optimizing the database that we have now (you should have seen how slow it used to be!), so moving to a faster platform is desirable.

> > 2. In reading through the Google Appengine docs, the GDS stuff looks like
> > we mainly can access the data through GQL. I don't see any mention of doing
> > map/reduce kinds of computations (Ethan and I were talking on the phone
> > today about MTT Appengine possibilities). I'm new to all this stuff, so
> > it's quite possible that a) I missed it, or b) I just don't understand what
> > I'm seeing/reading yet. Or does GQL do map/reduce on the back end to do its
> > magic? Is GQL the main/only way we have to access GDS?
>
> As far as Igor and I know, there is no way of doing Map/Reduce with GDS. And
> GQL (or filters, which are practically a synonym) is the main and only way to
> access GDS data.

Ok, good. Just wanted to make sure we understood that point properly and weren't missing anything.

> > 3. Is there a reason that doesn't use the python API to directly
> > talk to GDS? I.e., what is the rationale for using a web app on appengine?
> > Is the web app doing stuff that we can't do at the client? Ditto for
> > and (these questions are partially fueled by my
> > curiosity and concern about why we're using so much CPU at Google)
>
> There are a few reasons for doing it. The first is speed. When we post new
> data, we first check whether a copy of the corresponding MpiInfo,
> ClustreInfo and other *Info classes already exists. If we did that directly
> from the client scripts, the delays would be higher (depending on Internet
> connection speed). The price is additional CPU cycles on Google's servers.

FWIW, I don't think I'm concerned about the speed of submitting. MTT runs can go for hours. If it takes 2 seconds to submit or 20, I'm not concerned about it -- a few round-trip latencies + some GQL lookups are still a very small fraction of the overall MTT run time. If CPU is going to be an issue, I wouldn't mind doing some of these lookups from the client (and potentially even caching some of the IDs on the client -- like we do on the SQL submission reporter), and then just submitting those IDs in the "main submit".
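
To make the client-side caching idea concrete, here is a rough sketch in plain Python. It is only an illustration under stated assumptions: `InfoIdCache`, `lookup_fn`, `build_submission`, the payload keys, and the kind names passed as strings are all hypothetical and are not taken from the MTT code or the App Engine API. `lookup_fn` stands in for whatever single round trip would resolve an *Info record to its ID.

```python
import hashlib
import json


class InfoIdCache:
    """Hypothetical client-side cache: *Info field sets -> datastore IDs."""

    def __init__(self, lookup_fn):
        # lookup_fn(kind, fields) -> id; assumed to cost one server round trip.
        self._lookup_fn = lookup_fn
        self._ids = {}
        self.remote_lookups = 0

    @staticmethod
    def _key(kind, fields):
        # Canonical JSON so that field order does not change the cache key.
        blob = json.dumps(fields, sort_keys=True).encode("utf-8")
        return kind, hashlib.sha1(blob).hexdigest()

    def get_id(self, kind, fields):
        key = self._key(kind, fields)
        if key not in self._ids:
            # Cache miss: ask the server once and remember the answer.
            self.remote_lookups += 1
            self._ids[key] = self._lookup_fn(kind, fields)
        return self._ids[key]


def build_submission(cache, mpi_fields, cluster_fields, results):
    """Build a "main submit" payload that carries IDs, not full *Info records."""
    return {
        # Kind names here are illustrative assumptions.
        "mpi_info_id": cache.get_id("MpiInfo", mpi_fields),
        "cluster_info_id": cache.get_id("ClusterInfo", cluster_fields),
        "results": results,
    }
```

With something like this, repeated submissions from the same MTT run would pay for each lookup only once, and the server-side work per submit would shrink to storing the results against already-known IDs.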

> The second and more important reason is that when such logic lives on the
> server, we (rather than the GDS clients) are responsible for maintaining
> the correct structure of links between objects. If such logic were
> implemented on the client side, a user could (by mistake or on purpose)
> break the links between objects.

Ah yes, this is a very good reason.
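
As a rough illustration of that point (a sketch only, using an in-memory stand-in rather than the real GDS, with hypothetical names `Store`, `find_or_create_info`, and `submit`): if the server resolves the *Info records itself, the client never supplies raw object IDs, so it has no way to point a result at the wrong object.

```python
class Store:
    """In-memory stand-in for the server-side datastore logic (hypothetical)."""

    def __init__(self):
        self._info_ids = {}   # (kind, sorted field items) -> id
        self._results = []
        self._next_id = 1

    def find_or_create_info(self, kind, fields):
        # Reuse an existing *Info record when one with identical fields exists.
        # Assumes field values are hashable (strings, numbers, ...).
        key = (kind, tuple(sorted(fields.items())))
        if key not in self._info_ids:
            self._info_ids[key] = self._next_id
            self._next_id += 1
        return self._info_ids[key]

    def submit(self, mpi_fields, cluster_fields, result):
        # The server builds the links itself; clients send field values only.
        record = {
            "mpi_info": self.find_or_create_info("MpiInfo", mpi_fields),
            "cluster_info": self.find_or_create_info("ClusterInfo", cluster_fields),
            "result": result,
        }
        self._results.append(record)
        return record
```

The trade-off is the one discussed above: the deduplicating lookups run on the server and cost CPU there, in exchange for links that clients cannot corrupt.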

I would also imagine that without the web interface, we would be limited to talking to the GDS under a single username/password (i.e., the owner of the appspot), which is also undesirable.

Thanks for the info!

Jeff Squyres
