Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI + Hadoop
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-02-24 12:23:51


On Feb 24, 2014, at 7:55 AM, Saliya Ekanayake <esaliya_at_[hidden]> wrote:

> This is very interesting. I've been working on getting one of our clustering programs (http://grids.ucs.indiana.edu/ptliupages/publications/DAVS_IEEE.pdf) to work with the Open MPI Java bindings, and we obtained very good speedup and scalability when running on HPC clusters with InfiniBand. We are working on a report with performance results and will make it available here soon.

Great! Will look forward to seeing it.

>
> This is again interesting, as we have developed a series of MapReduce applications for analyzing gene sequences (http://grids.ucs.indiana.edu/ptliupages/publications/DACIDR_camera_ready_v0.3.pdf) that could benefit from MPI support. Also, as you mentioned, we run all these MapReduce jobs on HPC clusters.

The folks at TACC are doing the Intel beta on a mouse genome, and will also be publishing their results comparing Hadoop performance under YARN/HDFS vs Slurm/Lustre.

>
> I am very eager to try 4.) and wonder if you could kindly provide some pointers on how to get it working.

The current release contains the initial "staged" execution support, but not the dynamic extension I described. To use staged execution, all you have to do is:

(a) express your mapper and reducer stages as separate app_contexts on the command line; and

(b) add --staged to the command line to request staged execution.

So it looks something like this:

mpirun --staged -n 10 ./mapper : -n 4 ./reducer

Depending on the allocation, mpirun will stage execution of the mappers and reducers, connecting the stdout of the first to the stdin of the second. There is also support for localized file systems (see the orte/mca/dfs framework) that allows you to transparently access/move data across the network, and of course mpirun supports pre-positioning of files via the --preload-files option.
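To make the stdout-to-stdin connection concrete, here is a minimal sketch of a mapper and reducer written as plain stream filters. It is only an illustration under stated assumptions: the word-count task, the file name wordcount.py, the "map"/"reduce" argument, and the tab-separated record format are all hypothetical and are not part of Open MPI or of the staged execution support itself.

```python
#!/usr/bin/env python3
"""Hypothetical word-count stages written as stdin/stdout filters."""
import sys
from collections import Counter


def map_lines(lines):
    """Mapper stage: emit one 'word<TAB>1' record per word read."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(records):
    """Reducer stage: sum the counts per word and emit them sorted by word."""
    totals = Counter()
    for record in records:
        word, _, count = record.rstrip("\n").partition("\t")
        if word:
            # A record without a count field is treated as a count of 1.
            totals[word] += int(count) if count else 1
    for word in sorted(totals):
        yield f"{word}\t{totals[word]}"


# Assumed convention for this sketch: the stage is picked by the first argument.
STAGES = {"map": map_lines, "reduce": reduce_lines}


def run_stage(name, stdin, stdout):
    """Read records from stdin, apply the named stage, write results to stdout."""
    for out in STAGES[name](stdin):
        stdout.write(out + "\n")


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in STAGES:
    run_stage(sys.argv[1], sys.stdin, sys.stdout)
```

The same data flow can be checked on a single machine with an ordinary shell pipe, feeding the output of "wordcount.py map" into "wordcount.py reduce". Under staged execution, each stage would instead be given as its own app_context, and mpirun would take the place of the pipe.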

HTH - feel free to ask questions and we'll be happy to help. Also, if you want to collaborate on the dynamic extension, we'd welcome the assist. Both Jeff and I have been somewhat swamped with other priorities and so progress on that last step is lagging.

Ralph

>
> Thank you,
> Saliya
>
>
>
> On Mon, Feb 24, 2014 at 10:30 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>
> On Feb 23, 2014, at 10:42 AM, Saliya Ekanayake <esaliya_at_[hidden]> wrote:
>
>> Hi,
>>
>> This is to get some info on the subject and not directly a question on OpenMPI.
>>
>> I've read Jeff's blog post on integrating OpenMPI with Hadoop (http://blogs.cisco.com/performance/resurrecting-mpi-and-java/) and wanted to check whether it is related to the Jira at https://issues.apache.org/jira/browse/MAPREDUCE-2911
>
> Somewhat. A little history might help. I was asked a couple of years ago to work on integrating MPI support with Hadoop. At that time, the thought of those asking for my help was that we would enable YARN to support MPI, which was captured in 2911. However, after working on it for a few months, it became apparent to me that this was a mistake. YARN's architecture makes support of MPI very difficult (but achievable - I did it with OMPI, and someone else has now done it with MPICH), and the result exhibits horrible scaling and relatively poor performance by HPC standards. So if you want to run a very small MPI job under YARN, you can do it with a custom application manager and JNI wrappers around every MPI call - just don't expect great performance.
>
> What I did instead was to pivot direction and focus on porting Hadoop to the HPC environment. Thought here was that, if we could get the Hadoop classes working with a regular HPC environment, then all the HPC world's tools and programming models become available. This is what we have done, and it comes in four parts:
>
> 1. Java MPI bindings that are very close to C-level performance. These are being released in the 1.7 series of OMPI and are unique to OMPI at this time. Jose Roman and Oscar Vega continue to close the performance gap.
>
> 2. Integration with HPC resource managers such as Slurm and Moab. Intel has taken the lead there and announced that support at SC13; it is in beta test now.
>
> 3. Integration with HPC file systems such as Lustre. Intel again took the lead here and has a Lustre adaptor in beta test.
>
> 4. Equivalent of an application manager to stage map-reduce executions. I updated OMPI's "mpirun" to handle that - available in the current 1.7 release series. It fully understands "staged" execution and also notifies the associated processes when MPI is feasible (i.e., all the procs in comm_world are running).
>
> We continue to improve the Hadoop support - Cisco and I are collaborating on a new "dynamic MPI" capability that will allow the procs to interact without imposing the barrier at MPI_Init, for example. So I expect that this summer will demonstrate a pretty robust capability in that area.
>
> After all, there is no reason you shouldn't be able to run Hadoop on an HPC cluster :-)
>
> HTH
> Ralph
>
>>
>> Also, is there a place I can get more info on this effort?
>>
>> Thank you,
>> Saliya
>>
>> --
>> Saliya Ekanayake esaliya_at_[hidden]
>> Cell 812-391-4914 Home 812-961-6383
>> http://saliya.org
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users