Dear OpenMPI developers,

Michele Martone (University of Rome Tor Vergata) found a bug in Open MPI (1.2.5 and 1.2.6) when it is compiled with PGI (7.1-4 and 7.2).

The bug does not involve the interconnect fabric (InfiniBand, Gigabit Ethernet or other): it concerns only a simple memory allocation.

You can reproduce the bug with this simple code:

#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[])
{
        /*
         *  memory allocations simulation for ~50M nonzeros:
         *  nd=180 md=350 mdy=420
         *
         *  if this program crashes, there is a compiler problem
         */
        printf("memory allocations simulation for ~50M nonzeros:  nd=180 md=350 mdy=420\n");
        printf("if this program crashes, check your compiler/environment configuration\n");

        /* %zu is the correct conversion for size_t (%d is undefined behavior here) */
        printf("sizeof(int)    %zu\n", sizeof(int));
        printf("sizeof(int*)   %zu\n", sizeof(int*));
        printf("sizeof(size_t) %zu\n", sizeof(size_t));

        if( sizeof(size_t)<8 || sizeof(int*)<8 )
        {
                printf("please compile this program for a 64 bit environment!\n");
                return -1;
        }
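        /* the buffers are intentionally not freed: together they simulate
         * the memory held at the same time by a ~50M nonzero problem */
        int * p;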
        printf("allocation 1/4..\n");
        p = calloc(47109185,16);
        if(!p)printf("..failed.\n");
        printf("allocation 2/4..\n");
        p = calloc(47109185,4);
        if(!p)printf("..failed.\n");
        printf("allocation 3/4..\n");
        p = calloc(47109185,4);
        if(!p)printf("..failed.\n");
        printf("allocation 4/4..\n");
        p = calloc(622947588,16);
        if(!p)printf("..failed.\n");
        if(!p) return -1;

        printf("allocations test passed (no crash)\n");
        return 0;
}
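
As a cross-check of what the test asks for, the sketch below recomputes the byte count of each calloc request in 64-bit arithmetic and flags the ones larger than an unsigned 32-bit size can hold. The helper names (request_bytes, exceeds_32bit, print_requests) are hypothetical, and the sketch does not claim that 32-bit truncation is the cause of the crash; it only shows the sizes involved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: bytes requested by calloc(nmemb, size),
 * computed in 64-bit arithmetic. */
static uint64_t request_bytes(uint64_t nmemb, uint64_t size)
{
    /* the product must not wrap even in 64 bits */
    assert(size == 0 || nmemb <= UINT64_MAX / size);
    return nmemb * size;
}

/* 1 if the request does not fit in an unsigned 32-bit size, else 0. */
static int exceeds_32bit(uint64_t nmemb, uint64_t size)
{
    return request_bytes(nmemb, size) > UINT32_MAX;
}

/* Print the four requests made by the test program and their total. */
static void print_requests(void)
{
    static const uint64_t req[4][2] = {
        {47109185u, 16u}, {47109185u, 4u}, {47109185u, 4u}, {622947588u, 16u}
    };
    uint64_t total = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t bytes = request_bytes(req[i][0], req[i][1]);
        total += bytes;
        printf("allocation %d/4: %llu bytes%s\n", i + 1,
               (unsigned long long)bytes,
               exceeds_32bit(req[i][0], req[i][1]) ? " (exceeds 32 bits)" : "");
    }
    printf("total: %llu bytes\n", (unsigned long long)total);
}
```

By this arithmetic the fourth request is 9,967,161,408 bytes (about 9.3 GiB), the only one above 4 GiB, and the four requests together total 11,097,781,848 bytes. Note that the test program prints "..failed." when calloc returns NULL, so a segmentation fault is a different symptom from a plain out-of-memory failure.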


We tested three configurations:

  1. the above code compiled with gcc 4 or with PGI (7.1-4 or 7.2) alone: passes
  2. the above code compiled with Open MPI (1.2.5 or 1.2.6) built with gcc 4: passes
  3. the above code compiled with Open MPI (1.2.5 or 1.2.6) built with PGI (7.1-4 or 7.2): fails with a segmentation fault

Part of the ldd output for the executable built with the PGI build of Open MPI:

        libmpi.so.0 => /opt/mpi/openmpi-1.2.5/pgi/lib/libmpi.so.0 (0x0000002a95558000)
        libopen-rte.so.0 => /opt/mpi/openmpi-1.2.5/pgi/lib/libopen-rte.so.0 (0x0000002a957b2000)
        libopen-pal.so.0 => /opt/mpi/openmpi-1.2.5/pgi/lib/libopen-pal.so.0 (0x0000002a9599c000)
        libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000003d7b600000)
        librt.so.1 => /lib64/tls/librt.so.1 (0x0000003d80d00000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000002a95b30000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003d7bd00000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003d81500000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000002a95c35000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003d7c100000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000003d7bb00000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003d7b800000)
        libpgc.so => /afs/efda-itm.eu/project/compilers/pgi/linux86-64/7.1-4/libso/libpgc.so (0x0000002a95d3a000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003d7b400000)
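
To check which of the libraries listed above actually supplies calloc at run time, a helper along these lines could be linked into the test program. It relies on the glibc extensions dlsym(RTLD_DEFAULT, ...) and dladdr(); calloc_provider is a hypothetical name. Our assumption, not verified here, is that the PGI build would report libopen-pal.so.0 (which we understand bundles its own malloc implementation) while a plain build would report libc.so.6.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for RTLD_DEFAULT and dladdr() in glibc */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Return the path of the shared object whose calloc the dynamic linker
 * finds first in the global symbol scope, or NULL on failure.
 * With older glibc releases, link with -ldl. */
static const char *calloc_provider(void)
{
    Dl_info info;
    void *sym = dlsym(RTLD_DEFAULT, "calloc");
    if (sym == NULL)
        return NULL;
    /* dladdr() returns 0 if the address is not inside a loaded object */
    if (dladdr(sym, &info) == 0 || info.dli_fname == NULL)
        return NULL;
    return info.dli_fname;
}
```

Calling printf("%s\n", calloc_provider()) from main, before the first allocation, would print the path of the providing library.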

I think the bug is in the way Open MPI wraps the calloc function.
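
For illustration only, this is the minimum any calloc replacement has to do: refuse requests whose nmemb * size overflows size_t, and return zero-filled memory. checked_calloc is a hypothetical name and this is not Open MPI's code; comparing such a reference against the wrapped calloc, using the sizes from the test above, might help narrow the problem down.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference calloc: overflow check, then zero-filled memory. */
static void *checked_calloc(size_t nmemb, size_t size)
{
    /* nmemb * size would overflow size_t: fail the way calloc does */
    if (size != 0 && nmemb > SIZE_MAX / size) {
        errno = ENOMEM;
        return NULL;
    }
    size_t bytes = nmemb * size;
    void *p = malloc(bytes);
    if (p != NULL)
        memset(p, 0, bytes); /* calloc must return zeroed memory */
    return p;
}
```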

Greetings,

Dr. Francesco Iannone
Associazione EURATOM-ENEA sulla Fusione
C.R. ENEA Frascati
Via E. Fermi 45
00044 Frascati (Roma) Italy
phone 00-39-06-9400-5124
fax 00-39-06-9400-5524
mailto:francesco.iannone@frascati.enea.it
http://www.afs.enea.it/iannone