Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] mpirun not working on more than one node
From: Laurin Müller (laurin.mueller_at_[hidden])
Date: 2009-11-17 10:45:32


>>> Ralph Castain 11/17/09 4:04 PM >>>
> Your cmd line is telling OMPI to run 17 processes. Since your hostfile
> indicates that only 16 of them are to run on 10.4.23.107 (which I
> assume is your PS3 node?), 1 process is going to be run on 10.4.1.23 (I
> assume this is node1?).

node1 has 16 cores (4 x AMD quad-core processors).
node2 is the PS3 with two processors (slots).

> I would guess that the executable is compiled to run on the PS3 given
> your specified path, so I would expect it to bomb on node1 - which is
> exactly what appears to be happening.

The executable is compiled on each node separately and sits in the same
directory on each node:
 /mnt/projects/PS3Cluster/Benchmark/pi
Different directories are mounted on each node, so each node has its own
separately compiled executable at that path.

In the end I want to run R on this cluster with Rmpi. Since I get a
similar problem there, I first wanted to try with a C program.

The same thing happens with R: it works when I start it on each node, but
if I want to start more than 16 processes on node one, it exits.

On Nov 17, 2009, at 1:59 AM, Laurin Müller wrote:

Hi,
 
I want to build a cluster with Open MPI.
 
2 nodes:
node 1: 4 x AMD Quad Core, Ubuntu 9.04, Open MPI 1.3.2
node 2: Sony PS3, Ubuntu 9.04, Open MPI 1.3
 
Both can connect via ssh to each other and to themselves without a password.
 
I can run the sample program pi.c on both nodes separately (see below).
But if I try to start it on node1 with the --hostfile option to use node 2
"remotely", I get this error:
 
cluster_at_bioclust:~$ mpirun --hostfile /etc/openmpi/openmpi-default-hostfile -np 17 /mnt/projects/PS3Cluster/Benchmark/pi
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

my hostfile:
cluster_at_bioclust:~$ cat /etc/openmpi/openmpi-default-hostfile
10.4.23.107 slots=16
10.4.1.23 slots=2

I can see with top that the processors of node2 start working briefly,
then the job aborts on node1.
 
I use this sample/test program:
#include <stdio.h>
/* a second #include stood here; its header name was lost in the archive */
#include "mpi.h"
int main(int argc, char *argv[])
{
      int i, n;
      double h, pi, x;
      int me, nprocs;
      double piece;
/* --------------------------------------------------- */
      MPI_Init (&argc, &argv);
      MPI_Comm_size (MPI_COMM_WORLD, &nprocs);
      MPI_Comm_rank (MPI_COMM_WORLD, &me);
/* --------------------------------------------------- */
      if (me == 0)
      {
         printf("%s", "Input number of intervals:\n");
         scanf ("%d", &n);
      }
/* --------------------------------------------------- */
      MPI_Bcast (&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* --------------------------------------------------- */
      h = 1. / (double) n;
      piece = 0.;
      for (i=me+1; i <= n; i+=nprocs)
      {
           x = (i-1)*h;
           piece = piece + ( 4/(1+(x)*(x)) + 4/(1+(x+h)*(x+h))) / 2 * h;
      }
      printf("%d: pi = %25.15f\n", me, piece);
/* --------------------------------------------------- */
      MPI_Reduce (&piece, &pi, 1, MPI_DOUBLE,
                  MPI_SUM, 0, MPI_COMM_WORLD);
/* --------------------------------------------------- */
      if (me == 0)
      {
         printf("pi = %25.15f\n", pi);
      }
/* --------------------------------------------------- */
      MPI_Finalize();
      return 0;
}

It works on each node.
node1:
cluster_at_bioclust:~$ mpirun -np 4 /mnt/projects/PS3Cluster/Benchmark/pi
Input number of intervals:
20
0: pi = 0.822248040052981
2: pi = 0.773339953424083
3: pi = 0.747089984650041
1: pi = 0.798498008827023
pi = 3.141175986954128
 
node2:
cluster_at_kasimir:~$ mpirun -np 2 /mnt/projects/PS3Cluster/Benchmark/pi
Input number of intervals:
5
1: pi = 1.267463056905495
0: pi = 1.867463056905495
pi = 3.134926113810990
cluster_at_kasimir:~$
 
Thx in advance,
Laurin

 
 

_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users

