
MTT Devel Mailing List Archives


Subject: Re: [MTT devel] GSOC application
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-03-19 11:06:39

On Mar 19, 2009, at 10:51 AM, Mike Dubman wrote:

> I think we can switch to the desired framework (datastore + mapreduce)
> gradually in the background.
> Here is a short battle plan:
> 1. create a datastore (Google's or similar)
> 2. design the datastore layout (what to keep, how to keep it, objects &
> attributes)
> 3. create a cmd line tool to submit results into the datastore
> 4. integrate (3) into MTT
> 5. Milestone: we have a tool to submit run results into two DBs
> (the current one & the datastore)

Agreed -- this is very do-able.
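
To make step 3 concrete, here is a minimal sketch of what such a
submission tool could look like. The endpoint URL and the field names
are hypothetical placeholders -- the real layout is exactly what step 2
would have to decide:

```python
# Sketch of a cmd line submitter for MTT run results (step 3 above).
# DATASTORE_URL and the required field names are illustrative only.
import json
import urllib.request

DATASTORE_URL = "http://example.org/mtt/submit"  # placeholder endpoint


def build_payload(run):
    """Serialize one MTT run result as JSON, checking required fields."""
    required = ("mpi_name", "mpi_version", "test_suite", "result")
    missing = [k for k in required if k not in run]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return json.dumps(run).encode("utf-8")


def submit(run):
    """POST one result to the datastore; returns the HTTP status code."""
    req = urllib.request.Request(
        DATASTORE_URL,
        data=build_payload(run),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Build (but don't send) a sample payload to show the shape.
    payload = build_payload({
        "mpi_name": "Open MPI",
        "mpi_version": "1.3.1",
        "test_suite": "trivial",
        "result": "pass",
    })
    print(payload.decode("utf-8"))
```

Keeping the serialization separate from the HTTP call is what makes the
"double submitting" idea below cheap: the same payload builder can feed
both back ends.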

> 6. Create an MPI-aware cmd line tool to query submitted results. The
> tool should allow querying and fetching selected results.
> 7. Milestone: we have a cmd line tool to query performance results.
> This tool can be used by the community to play with custom scripts for
> fetching results and generating custom reports.
> 8. Here we can collect 3rd-party/contributed scripts to create
> various visual reports based on perf results.
> What do you think?
> I think we can provide some dark forces here to perform most of the
> steps.

Awesome! I can say that if this stuff becomes available, Cisco will
start "double submitting" -- submitting to both the currently-official
postgres DB (i.e., same as today) and the new/experimental datastore.
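
The query tool from steps 6-7 could be equally thin. A sketch, assuming
the fetched results arrive as a list of JSON records (field names again
hypothetical) and filtering happens client-side:

```python
# Sketch of the step-6 query tool: fetch results, filter by attribute.
# Record field names are illustrative placeholders.
import json


def matches(record, filters):
    """True if every filter key/value pair matches the record."""
    return all(record.get(k) == v for k, v in filters.items())


def query(records, **filters):
    """Return the subset of records matching all given filters."""
    return [r for r in records if matches(r, filters)]


if __name__ == "__main__":
    sample = [
        {"mpi_version": "1.3.1", "test_suite": "trivial", "result": "pass"},
        {"mpi_version": "1.3.0", "test_suite": "trivial", "result": "fail"},
    ]
    print(json.dumps(query(sample, result="pass"), indent=2))
```

A tool shaped like this is what step 8 assumes: contributed scripts can
import `query` and pipe the filtered records into whatever report
generator they like.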

> Will it be possible to host the datastore somewhere and open access
> to it?

I think we have 2 options here:

1. Google's datastore/app engine. That requires signing up for a
Google Apps account with Google Engine access. Josh has one of these
(anyone can get a Google Apps account; as I understand it, you have to
apply for Google Engine access and approval can take a looooong time
-- Josh just got approved after nearly a year). Josh -- could we use
your account, perchance? (I'm not sure if this is Josh's main/
personal Google account, or a generic account he created)

2. Hadoop. This is the open source project modeled on the papers
Google published about MapReduce. We'd have to host the Hadoop data
store somewhere (e.g., IU), but it benefits from having multiple
machines to store data, such as a data farm. I do not believe that IU
has such a resource.

There are definite similarities between the two choices, but I believe
the APIs are different -- so we'd have to code for one or the other.

I think I would prefer #1 in order to take care of the hosting issue.
If we get past the proof-of-concept stage, I'm guessing it'll be
pretty easy to get the funding to get a real Google Apps account (it's
$50/user/year -- darn cheap).

Jeff Squyres
Cisco Systems