
Open MPI User's Mailing List Archives


Subject: [OMPI users] Heterogeneous SLURM cluster segfaults on large transfers
From: James Gao (james_at_[hidden])
Date: 2009-08-28 16:48:52


Hi everyone, I've been having a pretty odd issue with Slurm and
Open MPI the last few days. I just set up a heterogeneous Slurm
cluster consisting of 32-bit Pentium 4 machines and a few new
64-bit Core i7 machines, all running the latest version of Ubuntu
Linux. I compiled the latest Open MPI 1.3.3 with the flags

./configure --enable-heterogeneous --with-threads --with-slurm
--with-memory-manager --with-openib --without-udapl
--disable-openib-ibcm
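
To confirm those flags actually took effect in the installed build, it
can be sanity-checked on each node type with ompi_info (the grep
patterns below are just illustrative):

```shell
# check the installed Open MPI build on each node type
ompi_info | grep -i 'hetero'   # heterogeneous support should be reported as yes
ompi_info | grep -i 'slurm'    # the slurm launch/allocation components should be listed
```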

I also made a trivial test program:
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

#define LEN 12000000

int main (int argc, char *argv[]) {
        int rank, i, len = LEN;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (argc > 1) len = atoi(argv[1]);
        printf("Size: %d, ", len);
        char *greeting = malloc(sizeof(char)*len);
        
        /* root fills the buffer with a NUL-terminated run of blanks */
        if (rank == 0) {
                for ( i = 0; i < len-1; i++)
                        greeting[i] = ' ';
                greeting[len-1] = '\0';
        }
        MPI_Bcast(greeting, len, MPI_BYTE, 0, MPI_COMM_WORLD);
        printf("rank: %d\n", rank);
        
        free(greeting);
        MPI_Finalize();
        return 0;
}
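
In case it helps localize the problem, here is a chunked variant of
the broadcast I can also test with. This is just a sketch: the 8 MB
chunk size is an arbitrary value below the apparent threshold, and
bcast_chunked is my own helper, not an Open MPI routine.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Broadcast buf in fixed-size pieces so that no single MPI_Bcast
   exceeds CHUNK bytes; meant to probe whether the crash is triggered
   by per-message size rather than total volume. */
static void bcast_chunked(char *buf, long len, int root, MPI_Comm comm)
{
        const long CHUNK = 8L * 1024 * 1024;  /* 8 MB, below the ~16 MB limit */
        long off;
        for (off = 0; off < len; off += CHUNK) {
                long n = len - off;
                if (n > CHUNK)
                        n = CHUNK;
                MPI_Bcast(buf + off, (int)n, MPI_BYTE, root, comm);
        }
}

int main(int argc, char *argv[])
{
        int rank, len = 16500000;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (argc > 1)
                len = atoi(argv[1]);
        char *greeting = malloc(len);

        if (rank == 0) {
                memset(greeting, ' ', len - 1);
                greeting[len - 1] = '\0';
        }
        bcast_chunked(greeting, len, 0, MPI_COMM_WORLD);
        printf("rank: %d ok\n", rank);

        free(greeting);
        MPI_Finalize();
        return 0;
}
```

If this survives sizes where the single 16,500,000-byte MPI_Bcast
dies, that would point at a per-message limit in the heterogeneous
path rather than the total amount of data moved.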

I run this with salloc -n 28 mpirun -n 28 mpitest on my Slurm cluster.
At 12,000,000 characters this command works exactly as expected, no
issues at all. Beyond a critical limit somewhere around 16,000,000
characters, however, the program consistently segfaults with this
error message:

salloc -n 28 -p all mpiexec -n 28 mpitest 16500000
salloc: Granted job allocation 234
[ibogaine:24883] *** Process received signal ***
[ibogaine:24883] Signal: Segmentation fault (11)
[ibogaine:24883] Signal code: Address not mapped (1)
[ibogaine:24883] Failing at address: 0x101a60f58
[ibogaine:24883] [ 0] /lib/libpthread.so.0 [0x7f6c00405080]
[ibogaine:24883] [ 1] /usr/local/lib/openmpi/mca_pml_ob1.so [0x7f6bfd9dff68]
[ibogaine:24883] [ 2] /usr/local/lib/openmpi/mca_btl_tcp.so [0x7f6bfcf3ec7c]
[ibogaine:24883] [ 3] /usr/local/lib/libopen-pal.so.0 [0x7f6c00ed5ee8]
[ibogaine:24883] [ 4]
/usr/local/lib/libopen-pal.so.0(opal_progress+0xa1) [0x7f6c00eca7b1]
[ibogaine:24883] [ 5] /usr/local/lib/libmpi.so.0 [0x7f6c013a185d]
[ibogaine:24883] [ 6] /usr/local/lib/openmpi/mca_coll_tuned.so [0x7f6bfc10c29c]
[ibogaine:24883] [ 7] /usr/local/lib/openmpi/mca_coll_tuned.so [0x7f6bfc10c9eb]
[ibogaine:24883] [ 8] /usr/local/lib/openmpi/mca_coll_tuned.so [0x7f6bfc10295c]
[ibogaine:24883] [ 9] /usr/local/lib/openmpi/mca_coll_sync.so [0x7f6bfc31b35a]
[ibogaine:24883] [10] /usr/local/lib/libmpi.so.0(MPI_Bcast+0xa3)
[0x7f6c013b78c3]
[ibogaine:24883] [11] mpitest(main+0xd4) [0x400bc0]
[ibogaine:24883] [12] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f6c000a25a6]
[ibogaine:24883] [13] mpitest [0x400a29]
[ibogaine:24883] *** End of error message ***

As far as I can tell, the segfault occurs on the root node doing the
broadcast. The error only occurs when I send across heterogeneous
sections: if I communicate only within homogeneous subsets of the
cluster, I can go as high as 120,000,000 characters without issue.
Across the heterogeneous cluster, though, there seems to be a hard
limit just under 16,000,000 characters. Any ideas?
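
One more data point I can try to gather: forcing a different broadcast
algorithm via MCA parameters, to see whether only the tuned collective
path is affected. A sketch of what I have in mind (parameter names
from ompi_info on my 1.3.3 install; the algorithm numbering may need
checking against ompi_info --param coll tuned):

```shell
# force coll_tuned to use a specific bcast algorithm (1 = basic linear)
mpirun --mca coll_tuned_use_dynamic_rules 1 \
       --mca coll_tuned_bcast_algorithm 1 \
       -n 28 mpitest 16500000

# or take the tuned component out of the picture entirely
mpirun --mca coll self,basic -n 28 mpitest 16500000
```

If the crash disappears with the basic linear algorithm, that would
implicate the pipelined/segmented bcast in coll_tuned, which matches
the mca_coll_tuned frames in the backtrace above.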