
Subject: [OMPI users] Segmentation fault when Send/Recv on heterogeneous cluster (32/64 bit machines)
From: TRINH Minh Hieu (mhtrinh_at_[hidden])
Date: 2010-02-28 13:22:27


I have some problems running MPI on my heterogeneous cluster. More
precisely, I get a segmentation fault when sending a large array (about
10,000 elements) of double from an i686 machine to an x86_64 machine.
It does not happen with small arrays. Here is the send/recv source code
(the complete source is in the attached file):
======== code ================
    if (me == 0) {
        for (int pe=1; pe<nprocs; pe++) {
            printf("Receiving from proc %d : ",pe); fflush(stdout);
            d=(double *)malloc(sizeof(double)*n);
            MPI_Recv(d, n, MPI_DOUBLE, pe, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("OK\n"); fflush(stdout);
            free(d);
        }
        printf("All done.\n");
    } else {
        d=(double *)malloc(sizeof(double)*n);
        MPI_Send(d, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        free(d);
    }
======== code ================
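
In case the attachment gets stripped by the list, here is a minimal
standalone sketch of the same test. It is reconstructed from the excerpt
above, not the exact attached source: how n is shared between ranks
(here via MPI_Bcast), the scanf fallback, and the data initialization
are my reconstruction.

======== code ================
/* hetero.c (sketch): rank 0 reads the array length and receives n
 * doubles from every other rank; the other ranks send n doubles. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int me, nprocs, n = 0;
    double *d;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (me == 0) {
        printf("Input array length : \n"); fflush(stdout);
        if (scanf("%d", &n) != 1)
            n = 10000;   /* n=10000 crashes, n=1000 works */
    }
    /* Make sure every rank agrees on the message length. */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (me == 0) {
        for (int pe = 1; pe < nprocs; pe++) {
            printf("Receiving from proc %d : ", pe); fflush(stdout);
            d = (double *)malloc(sizeof(double) * n);
            MPI_Recv(d, n, MPI_DOUBLE, pe, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("OK\n"); fflush(stdout);
            free(d);
        }
        printf("All done.\n");
    } else {
        d = (double *)malloc(sizeof(double) * n);
        for (int i = 0; i < n; i++) d[i] = (double)i;
        MPI_Send(d, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        free(d);
    }

    MPI_Finalize();
    return 0;
}
======== code ================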

I get a segmentation fault with n=10000 (80 kB of doubles) but no error
with n=1000 (8 kB).
I have two machines:
sbtn155 : Intel Xeon, x86_64
sbtn211 : Intel Pentium 4, i686
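
Note that double has the same size and IEEE 754 little-endian layout on
both architectures, so I did not expect any representation conversion to
be needed. A quick sanity check (trivial test program, not part of the
attached source):

======== code ================
#include <stdio.h>

int main(void)
{
    /* Prints 8 on both the i686 and the x86_64 machine. */
    printf("sizeof(double) = %zu\n", sizeof(double));
    return 0;
}
======== code ================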

The code is compiled on both the x86_64 and the i686 machine, using
Open MPI 1.4.1 installed in /tmp/openmpi:
[mhtrinh_at_sbtn211 heterogenous]$ make hetero
gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o hetero.i686.o
/tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include
hetero.i686.o -o hetero.i686 -lm

[mhtrinh_at_sbtn155 heterogenous]$ make hetero
gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o hetero.x86_64.o
/tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include
hetero.x86_64.o -o hetero.x86_64 -lm

I run the code using an appfile and get these errors:
$ cat appfile
--host sbtn155 -np 1 hetero.x86_64
--host sbtn155 -np 1 hetero.x86_64
--host sbtn211 -np 1 hetero.i686

$ mpirun -hetero --app appfile
Input array length :
Receiving from proc 1 : OK
Receiving from proc 2 : [sbtn155:26386] *** Process received signal ***
[sbtn155:26386] Signal: Segmentation fault (11)
[sbtn155:26386] Signal code: Address not mapped (1)
[sbtn155:26386] Failing at address: 0x200627bd8
[sbtn155:26386] [ 0] /lib64/ [0x3fa4e0e540]
[sbtn155:26386] [ 1] /tmp/openmpi/lib/openmpi/ [0x2aaaad8d7908]
[sbtn155:26386] [ 2] /tmp/openmpi/lib/openmpi/ [0x2aaaae2fc6e3]
[sbtn155:26386] [ 3] /tmp/openmpi/lib/ [0x2aaaaafe39db]
[sbtn155:26386] [ 4] /tmp/openmpi/lib/ [0x2aaaaafd8b9e]
[sbtn155:26386] [ 5] /tmp/openmpi/lib/openmpi/ [0x2aaaad8d4b25]
[sbtn155:26386] [ 6] /tmp/openmpi/lib/
[sbtn155:26386] [ 7] hetero.x86_64(main+0xde) [0x400cbe]
[sbtn155:26386] [ 8] /lib64/ [0x3fa421e074]
[sbtn155:26386] [ 9] hetero.x86_64 [0x400b29]
[sbtn155:26386] *** End of error message ***
mpirun noticed that process rank 0 with PID 26386 on node sbtn155
exited on signal 11 (Segmentation fault).

Am I missing an option needed to run on a heterogeneous cluster?
Do MPI_Send/Recv have a limit on array size when used on a heterogeneous
cluster?
Thanks for your help. Regards,

   M. TRINH Minh Hieu
   F-30207 Bagnols-sur-Cèze, FRANCE