
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Newbie question continues, a step toward real app
From: Tena Sakai (tsakai_at_[hidden])
Date: 2011-01-13 20:34:48


Hi Gus,

> Did you speak to the Rmpi author about this?

No, I haven't, but here's what the author wrote:
https://stat.ethz.ch/pipermail/r-sig-hpc/2009-February/000104.html
in which he states:
   ...The way of spawning R slaves under LAM is not working
   any more under OpenMPI. Under LAM, one just uses
     R -> library(Rmpi) -> mpi.spawn.Rslaves()
   as long as host file is set. Under OpenMPI this leads only one R slave on
   the master host no matter how many remote hosts are specified in OpenMPI
   hostfile. ...
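
For reference, the LAM-era workflow he describes amounts to just this (a sketch using Rmpi's documented calls, with a hypothetical slave count; as he notes, under openMPI the spawn ends up placing the slaves on the master host):

```r
# Sketch of the LAM-era Rmpi workflow quoted above.
library(Rmpi)                     # MPI bindings for R
mpi.spawn.Rslaves(nslaves = 4)    # spawn 4 R slaves (honored under LAM)
mpi.remote.exec(mpi.comm.rank())  # each slave reports its MPI rank
mpi.close.Rslaves()               # shut the slaves down
mpi.quit()                        # finalize MPI and exit R
```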
His README file doesn't tell what I need to know. In light of
LAM MPI having been "absorbed" into openMPI, I find this unfortunate.

There are other ways to achieve parallelism from R. The most recent
offering is from Revolution Analytics:
  http://www.revolutionanalytics.com/products/revolution-r.php
They have a package called foreach, which can use different parallel
backends (doSNOW, doRedis, etc.), but not openMPI (or any other MPI
variant). In fact, I saw someone posting:

  Just wanted to share a working example of doSNOW and foreach for an
  openMPI cluster. The function eddcmp() is just an example and returns
  some innocuous warnings. The example first has each node return its
  nodename, then runs an example comparing dopar, do and a for loop. In the
  directory containing rtest.R it is run from the command line with:
  "mpirun -n --hostfile /home/hostfile --no-save -f rtest.R"
  ...
  ...
  <I will forward the full posting from him in a separate mail.>
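
As a sketch of what that doSNOW/foreach combination looks like (worker count is hypothetical, and I use local socket workers rather than MPI so it runs on a single machine too):

```r
# Minimal foreach + doSNOW sketch. Assumes the foreach, doSNOW,
# and snow packages are installed; the worker count is arbitrary.
library(foreach)
library(doSNOW)
cl <- makeCluster(4, type = "SOCK")  # 4 local socket workers
registerDoSNOW(cl)                   # make them the %dopar% backend
nodes <- foreach(i = 1:4, .combine = c) %dopar% {
  Sys.info()[["nodename"]]           # each worker reports its hostname
}
stopCluster(cl)
print(nodes)
```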

What I discovered was that on my version of openMPI (v 1.4.3), this command
line doesn't work. I need to add 1 after -n and get rid of --no-save and
-f; then it runs, but generates something a bit traumatic:
  [compute-0-0.local:16448] [[42316,0],1]->[[42316,0],0]
mca_oob_tcp_msg_send_handler: writev failed: Bad file descriptor (9) [sd =
9]
  [compute-0-0.local:16448] [[42316,0],1] routed:binomial: Connection to
lifeline [[42316,0],0] lost

The long and short of it is that the mechanism you showed me works for
me, and (while I want to keep my eyes open for other mechanisms/methods)
I want to get on with solving my science. (And I haven't forgotten to look
into Torque.)
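
Incidentally, the per-slot-parameter mechanism discussed further down (one appfile line per host) is easy to script. Here is a hedged sketch, with hypothetical file and host names, that writes an appfile giving each rank a different fib.r argument:

```shell
# Build an Open MPI appfile from a node list, one rank per host,
# passing a different argument to fib.r on each line.
# nodes.txt stands in for a real hostfile (hypothetical names).
printf 'node0\nnode1\nnode2\n' > nodes.txt
n=5                # fib argument for the first host
: > appfile        # create/truncate the appfile
while read -r host; do
  echo "-H $host -np 1 fib.r $n" >> appfile
  n=$((n + 1))
done < nodes.txt
cat appfile        # then launch with: mpirun -app appfile
```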

Regards,

Tena Sakai
tsakai_at_[hidden]

On 1/13/11 4:18 PM, "Gus Correa" <gus_at_[hidden]> wrote:

> Tena Sakai wrote:
>> Fantastic, Gus! Now I think I got framework pretty much done.
>> The rest is to work on 'problem solving' end with R.
>>
>> Many thanks for your insight and kindness. I really appreciate it.
>>
>> Regards,
>>
>> Tena Sakai
>> tsakai_at_[hidden]
>>
> Hi Tena
>
> I'm glad that it helped somebody at the other side of the country,
> but solving a problem (MIMD) so close to ours here at home.
>
> Still thinking of what could one do to fix the Rmpi guts,
> to work nicely with OpenMPI, MPICH2, etc.
> The hint I took from your postings was that the whole
> issue revolves around the mechanism to launch MPI jobs
> (the whole mumbo jumbo of lamboot, and stuff like that,
> that is no longer there).
> I think typically this is where the MPIs differ,
> and the difficulties in portability appear.
> Did you speak to the Rmpi author about this?
> If I only had the time to learn some R and take a look at Rmpi
> I might give it a try.
> The MIMD trick will do for the embarrassingly parallel problem you
> mentioned, but it would be nice to have Rmpi working for when
> parallelism is essential.
> Nobody uses R here (though they do in the Statistics Department),
> probably because they're used to other tools (Matlab, etc).
> However, there is plenty of statistics of climate and other
> Earth Science data that goes on here,
> hence R might be used also.
>
> Good luck with your research and with "R on the cloud"!
>
> Regards,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>>
>> On 1/13/11 2:40 PM, "Gus Correa" <gus_at_[hidden]> wrote:
>>
>>> Tena Sakai wrote:
>>>> Hi,
>>>>
>>>> I have a script I call fib.r. It looks like:
>>>>
>>>> #!/usr/bin/env r
>>>>
>>>> fib <- function( n ) {
>>>>   a <- 0
>>>>   b <- 1
>>>>   for ( i in 1:n ) {
>>>>     t <- b
>>>>     b <- a
>>>>     a <- a + t
>>>>   }
>>>>   a
>>>> }
>>>>
>>>> print( fib( as.integer( argv[1] ) ) )
>>>>
>>>> When I run this script with a parameter, it generates a Fibonacci number:
>>>>
>>>> $ fib.r 5
>>>> 5
>>>> $ fib.r 6
>>>> 8
>>>>
>>>> and if I stick this into <program> part of MIMD example I have used
>>>> previously:
>>>>
>>>> $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 8 fib.r 7
>>>>
>>>> I get:
>>>>
>>>> vixen.egcrc.org
>>>> [1] 13
>>>> [1] 13
>>>> [1] 13
>>>> [1] 13
>>>> [1] 13
>>>> [1] 13
>>>> [1] 13
>>>> [1] 13
>>>>
>>>> This is good as proof of concept, but what I really want to do is to
>>>> have that 7 different for each (slave) process. I.e., I want to run
>>>> "rfib 5" on node 0, "rfib 6" on node 1, "rfib 7" on node 2, and so
>>>> on. Is there any way to give different parameter(s) to different
>>>> processes/slots?
>>>>
>>>> I thought maybe I can use the -rf option to do this, but I am leaning
>>>> toward the -app option. Unfortunately, I see no example for the
>>>> application context file. Would someone kindly explain how I can do
>>>> what I describe?
>>>>
>>>> Thank you.
>>>>
>>>> Tena Sakai
>>>> tsakai_at_[hidden]
>>>>
>>> Hi Tena
>>>
>>> We ran MPMD/MIMD programs here in the past.
>>> Coupled climate models: atmosphere, ocean, sea ice, etc., each one an
>>> executable, communicating via MPI.
>>> Actually this was with MPICH1, with somewhat different syntax than
>>> OpenMPI (the flag/file was called '-pgfile', not '-app'),
>>> but I see no reason why it shouldn't work in your case with OpenMPI.
>>>
>>> I think if you create an 'appfile' with this content:
>>>
>>> -H node0 -np 1 rfib 5
>>> -H node1 -np 1 rfib 6
>>> ...
>>>
>>> and launch mpirun with
>>>
>>> mpirun -app appfile
>>>
>>> it is likely to work.
>>>
>>> Under Torque I cannot test this very easily,
>>> because I need to parse the Torque file that gives me the nodes,
>>> then write down the 'appfile' on the fly (which is what I used to
>>> do for the coupled climate models).
>>>
>>> However, I tried on a standalone machine (where the -H nodename didn't
>>> make sense, and was not used) and it worked.
>>> My appfile test was like this:
>>> -np 1 ls appfile
>>> -np 1 hostname
>>> -np 2 date
>>> -np 4 who
>>>
>>> You can add your -H nodename to each line.
>>>
>>> I hope this helps,
>>> Gus Correa
>>> ---------------------------------------------------------------------
>>> Gustavo Correa
>>> Lamont-Doherty Earth Observatory - Columbia University
>>> Palisades, NY, 10964-8000 - USA
>>> ---------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>