This is very interesting. I've been working on getting one of our
clustering programs (
http://grids.ucs.indiana.edu/ptliupages/publications/DAVS_IEEE.pdf) to work
with the Open MPI Java bindings, and we obtained very good speedup and scalability
when running on HPC clusters with InfiniBand. We are working on a report with
performance results and will make it available here soon.
This is again interesting, as we have developed a series of MapReduce
applications for analyzing gene sequences (
which could benefit from having MPI support. Also, as you have mentioned,
we run all these MapReduce jobs on HPC clusters.
I am very eager to try item 4 and wonder if you could kindly provide some
pointers on how to get it working.
On Mon, Feb 24, 2014 at 10:30 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> On Feb 23, 2014, at 10:42 AM, Saliya Ekanayake <esaliya_at_[hidden]> wrote:
> > This is to get some info on the subject and not directly a question on
> > I've read Jeff's blog post on integrating OpenMPI with Hadoop (
> > http://blogs.cisco.com/performance/resurrecting-mpi-and-java/) and wanted
> > to check if this is related to the Jira at
> Somewhat. A little history might help. I was asked a couple of years ago
> to work on integrating MPI support with Hadoop. At that time, the thought
> of those asking for my help was that we would enable YARN to support MPI,
> which was captured in 2911. However, after working on it for a few months,
> it became apparent to me that this was a mistake. YARN's architecture makes
> support of MPI very difficult (but achievable - I did it with OMPI, and
> someone else has now done it with MPICH), and the result exhibits horrible
> scaling and relatively poor performance by HPC standards. So if you want to
> run a very small MPI job under YARN, you can do it with a custom
> application manager and JNI wrappers around every MPI call - just don't
> expect great performance.
> What I did instead was to pivot direction and focus on porting Hadoop to
> the HPC environment. The thought here was that, if we could get the Hadoop
> classes working with a regular HPC environment, then all the HPC world's
> tools and programming models become available. This is what we have done,
> and it comes in four parts:
> 1. Java MPI bindings that are very close to C-level performance. These are
> being released in the 1.7 series of OMPI and are unique to OMPI at this
> time. Jose Roman and Oscar Vega continue to close the performance gap.
> 2. Integration with HPC resource managers such as Slurm and Moab. Intel has
> taken the lead there and announced that support at SC13; it is in beta test now.
> 3. Integration with HPC file systems such as Lustre. Intel again took the
> lead here and has a Lustre adaptor in beta test.
> 4. Equivalent of an application manager to stage map-reduce executions. I
> updated OMPI's "mpirun" to handle that - available in the current 1.7
> release series. It fully understands "staged" execution and also notifies
> the associated processes when MPI is feasible (i.e., all the procs in
> comm_world are running).
> We continue to improve the Hadoop support - Cisco and I are collaborating
> on a new "dynamic MPI" capability that will allow the procs to interact
> without imposing the barrier at MPI_Init, for example. So I expect that
> this summer will demonstrate a pretty robust capability in that area.
> After all, there is no reason you shouldn't be able to run Hadoop on an
> HPC cluster :-)
> > Also, is there a place I can get more info on this effort?
> > Thank you,
> > Saliya Ekanayake esaliya_at_[hidden]
> > Cell 812-391-4914 Home 812-961-6383
> users mailing list
Saliya Ekanayake esaliya_at_[hidden]
Cell 812-391-4914 Home 812-961-6383