Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Programming with Big Data in R
From: Daniels, Marcus G (mdaniels_at_[hidden])
Date: 2013-02-26 14:43:36


On Feb 26, 2013, at 12:17 PM, Ralph Castain wrote:

> I have someone who is interested in knowing if anyone is currently working with pbdR:
>

It looks to me like an evolution of the capabilities in the `snow' wrapper of `Rmpi', with the addition of BLACS/PBLAS/ScaLAPACK interfaces and data structure accessors. I've used the former quite a bit, but not pbdR itself.

Take a look at http://cran.r-project.org/web/views/HighPerformanceComputing.html to get a sense of the kind of packages that are available; there's a lot of overlap, unfortunately.

R itself is not a compiled language, but it incorporates routines, standard libraries, and third-party packages that wrap C, C++, and Fortran behind the scenes. To the extent one can find a `worker' that ends up being a mostly native-code implementation and runs for a long time, MPI or socket messaging can be useful. Scalars are just length-1 vectors in R, so there's at least the possibility of getting performance by being highly vectorized. pbdR and the others usually provide an `apply' routine that maps a function over a vector. Performance-wise, think Python or Perl speed.
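
To make the vectorization point concrete, here is a minimal sketch in base R (the variable names are made up for illustration). The first form hands the whole vector to native code in one call; the second maps an R-level function over each element, which is roughly what the `apply'-style routines do.

```r
# Illustrative input: a plain numeric vector.
x <- as.numeric(1:10)

# Vectorized: one call, and the loop over elements runs in native code.
squares_vec <- x^2

# Mapped: sapply calls the R-level function once per element.
squares_map <- sapply(x, function(v) v^2)
```

Both forms give the same values; the difference is how much time is spent in the interpreter, and it only matters once the vector is large or the per-element function is expensive.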

In contrast to MPI or sockets, there's a standard package in the R distribution called `parallel' that forks the R process on multicore machines. This works surprisingly well, and if you have a fat node (e.g. 48 processors), it would be my first choice. It's easier to use.
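
A minimal sketch of the fork-based approach, assuming the `mclapply' function from `parallel'. The worker `slow_square' and the core count are hypothetical stand-ins, not anything from pbdR or the original question.

```r
library(parallel)

# Hypothetical stand-in for a long-running, mostly native-code worker.
slow_square <- function(v) {
  Sys.sleep(0.1)
  v^2
}

# mclapply forks the current R process. Forking is not available on
# Windows, where mc.cores must be 1, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

results <- mclapply(1:8, slow_square, mc.cores = n_cores)
squares <- unlist(results)
```

On a fat node one would typically raise `mc.cores' toward the value reported by `detectCores()', subject to memory, since each forked child shares the parent's data copy-on-write.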

Marcus