Yeah, I think this sounds like a good way to move forward with this
work. The database schema is pretty complex, so if you need help on
the database side of things, let me know.
To get started, would it be useful to have a meeting over the phone/
telepresence to design the datastore layout? This gives us an
opportunity to start from a blank slate with regard to the
datastore, so it may be useful to brainstorm a bit beforehand.
The Google Apps account is under my personal Google account, so I'm
reluctant to use it. I think the reason it took so long for me was
that it was in limited beta when I originally signed up. I think
the approval time is much shorter now (maybe a day?), and we can make
an openmpi or mtt account that we can use.
With regard to Hadoop, I don't think that IU has a set of machines
that would work, but I can ask around. We could always try Hadoop on
a single machine if people wanted to play around with data querying/
I don't have a strong preference either way, but Google Apps may
provide us with a lower overhead solution for the long run even
though it costs $$.
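
For anyone who wants to play with the map/reduce style of querying
before we settle the hosting question, here is a minimal sketch in
plain Python. It uses neither Hadoop nor the Google datastore API, and
the record fields (org, phase, passed) are made up for illustration;
they are not the MTT schema.

```python
from collections import defaultdict

# Hypothetical result records; field names are illustrative only and
# are NOT the real MTT database schema.
RESULTS = [
    {"org": "cisco", "phase": "test_run", "passed": True},
    {"org": "cisco", "phase": "test_run", "passed": False},
    {"org": "iu", "phase": "mpi_install", "passed": True},
    {"org": "iu", "phase": "test_run", "passed": True},
]


def map_result(record):
    """Map step: emit one ((org, phase), (passes, fails)) pair per record."""
    key = (record["org"], record["phase"])
    value = (1, 0) if record["passed"] else (0, 1)
    yield key, value


def reduce_counts(key, values):
    """Reduce step: sum the pass/fail counts collected for one key."""
    passed = sum(v[0] for v in values)
    failed = sum(v[1] for v in values)
    return key, {"pass": passed, "fail": failed}


def run_mapreduce(records, mapper, reducer):
    """Group the mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


summary = run_mapreduce(RESULTS, map_result, reduce_counts)
for (org, phase), counts in sorted(summary.items()):
    print(org, phase, counts)
```

The mapper and reducer are ordinary functions here, so the same split
should carry over to whichever backend we pick, though the real APIs
will differ.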
On Mar 19, 2009, at 11:06 AM, Jeff Squyres wrote:
> On Mar 19, 2009, at 10:51 AM, Mike Dubman wrote:
>> I think we can switch to desired framework (datastore+mapreduce)
>> gradually in the background:
>> Here is a short battle plan:
>> 1. create datastore (Google's or similar)
>> 2. design datastore layout (what to keep, how to keep, objects &
>> 3. create cmd line tool to submit results into datastore
>> 4. integrate (3) into mtt
>> 5. Milestone: we have a tool to submit run results into two DBs
>> (current & datastore)
> Agreed -- this is very do-able.
>> 6. Create mpi-aware cmd line tool to query submitted results. Tool
>> should allow query and fetch selected results.
>> 7. Milestone: we have cmd line tool to query performance results.
>> This tool can be used by community to play with custom scripts for
>> fetching results and generating custom reports.
>> 8. here we can collect 3rd party/contributed scripts to create
>> various visual reports based on perf results.
>> what do you think?
>> I think we can provide some dark forces here to perform most of
>> the steps.
> Awesome! I can say that if this stuff becomes available, Cisco
> will start "double submitting" -- submit to the currently-official
> postgres db (i.e., same as today), and to the new/experimental
>> Will it be possible to host datastore on openmpi.org and open
>> access to it?
> I think we have 2 options here:
> 1. Google's datastore/app engine. That requires signing up for a
> Google Apps account with Google Engine access. Josh has one of
> these (anyone can get a Google Apps account; as I understand it,
> you have to apply for Google Engine access and approval can take a
> looooong time -- Josh just got approved after nearly a year). Josh
> -- could we use your account, perchance? (I'm not sure if this is
> Josh's main/personal Google account, or a generic account he created)
> 2. Hadoop. This is the open source project that is modeled off
> Google's papers that they published about map/reduce. We'd have to
> host the hadoop data store somewhere (e.g., IU), but it benefits
> from having multiple machines to store data, such as a data farm.
> I do not believe that IU has such a resource.
> There are definite similarities between the two choices, but I
> believe the APIs are different -- so we have to code for one or the
> other.
> I think I would prefer #1 in order to take care of the hosting
> issue. If we get past the proof-of-concept stage, I'm guessing
> it'll be pretty easy to get the funding to get a real Google Apps
> account (it's $50/user/year -- darn cheap).
> Jeff Squyres
> Cisco Systems
> mtt-devel mailing list