
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] mpirun not working on more than one node
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-11-17 10:01:14


Your cmd line is telling OMPI to run 17 processes. Since your hostfile indicates that only 16 of them are to run on 10.4.23.107 (which I assume is your PS3 node?), 1 process is going to be run on 10.4.1.23 (I assume this is node1?).

I would guess that the executable is compiled to run on the PS3 given your specified path, so I would expect it to bomb on node1 - which is exactly what appears to be happening.
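If the binary really only runs on one architecture, a likely workaround (a sketch; the hostfile, addresses, and path are taken from the message below, and how a second binary would be built for the other node is left out) is simply not to ask for more processes than the matching node has slots, or to pin the job to that node explicitly:

```shell
# Option 1: request only as many processes as 10.4.23.107 provides
# slots for, so nothing spills over to the other host:
mpirun --hostfile /etc/openmpi/openmpi-default-hostfile -np 16 \
    /mnt/projects/PS3Cluster/Benchmark/pi

# Option 2: name the host explicitly on the command line:
mpirun --host 10.4.23.107 -np 16 /mnt/projects/PS3Cluster/Benchmark/pi
```

Running across both nodes would require a binary compiled for each architecture (or a shared path that resolves to the right one on each node).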

On Nov 17, 2009, at 1:59 AM, Laurin Müller wrote:

> Hi,
>
> I want to build a cluster with Open MPI.
>
> 2 nodes:
> node 1: 4 x Amd Quad Core, ubuntu 9.04, openmpi 1.3.2
> node 2: Sony PS3, ubuntu 9.04, openmpi 1.3
>
> Both can connect via ssh to each other and to themselves without a password.
>
> I can run the sample program pi.c on both nodes separately (see below). But if I try to start it on node1 with the --hostfile option to use node2 remotely, I get this error:
>
> cluster_at_bioclust:~$ mpirun --hostfile /etc/openmpi/openmpi-default-hostfile -np 17 /mnt/projects/PS3Cluster/Benchmark/pi
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> My hostfile:
> cluster_at_bioclust:~$ cat /etc/openmpi/openmpi-default-hostfile
> 10.4.23.107 slots=16
> 10.4.1.23 slots=2
> With top I can see that node2's processors briefly start working; then the job aborts on node1.
>
> I use this sample/test program:
> #include <stdio.h>
> #include <stdlib.h>
> #include "mpi.h"
>
> int main(int argc, char *argv[])
> {
>     int i, n;
>     double h, pi, x;
>     int me, nprocs;
>     double piece;
>     /* --------------------------------------------------- */
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>     MPI_Comm_rank(MPI_COMM_WORLD, &me);
>     /* --------------------------------------------------- */
>     if (me == 0)
>     {
>         printf("%s", "Input number of intervals:\n");
>         scanf("%d", &n);
>     }
>     /* --------------------------------------------------- */
>     MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
>     /* --------------------------------------------------- */
>     h = 1. / (double) n;
>     piece = 0.;
>     for (i = me + 1; i <= n; i += nprocs)
>     {
>         x = (i - 1) * h;
>         piece = piece + (4 / (1 + (x) * (x)) + 4 / (1 + (x + h) * (x + h))) / 2 * h;
>     }
>     printf("%d: pi = %25.15f\n", me, piece);
>     /* --------------------------------------------------- */
>     MPI_Reduce(&piece, &pi, 1, MPI_DOUBLE,
>                MPI_SUM, 0, MPI_COMM_WORLD);
>     /* --------------------------------------------------- */
>     if (me == 0)
>     {
>         printf("pi = %25.15f\n", pi);
>     }
>     /* --------------------------------------------------- */
>     MPI_Finalize();
>     return 0;
> }
> It works on each node:
> node1:
> cluster_at_bioclust:~$ mpirun -np 4 /mnt/projects/PS3Cluster/Benchmark/pi
> Input number of intervals:
> 20
> 0: pi = 0.822248040052981
> 2: pi = 0.773339953424083
> 3: pi = 0.747089984650041
> 1: pi = 0.798498008827023
> pi = 3.141175986954128
>
> node2:
> cluster_at_kasimir:~$ mpirun -np 2 /mnt/projects/PS3Cluster/Benchmark/pi
> Input number of intervals:
> 5
> 1: pi = 1.267463056905495
> 0: pi = 1.867463056905495
> pi = 3.134926113810990
> cluster_at_kasimir:~$
>
> Thx in advance,
> Laurin
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users