Dear OpenMPI developers,
Colleagues (Michele Martone, University of Rome Tor Vergata) have found a bug in openmpi (1.2.5 and 1.2.6) when it is compiled with PGI (7.1-4 and 7.2).
The bug does not involve the interconnect fabric (InfiniBand, Gigabit Ethernet or anything else), since it concerns only a simple memory allocation.
You can reproduce the bug with this simple code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /*
     * Memory allocation simulation for ~50M nonzeros:
     * nd=180 md=350 mdy=420
     *
     * If this program crashes, there is a compiler problem.
     */
    printf("memory allocations simulation for ~50M nonzeros: nd=180 md=350 mdy=420\n");
    printf("if this program crashes, check your compiler/environment configuration\n");
    /* sizeof yields a size_t, so %zu (not %d) is the matching format */
    printf("sizeof(int) %zu\n", sizeof(int));
    printf("sizeof(int*) %zu\n", sizeof(int*));
    printf("sizeof(size_t) %zu\n", sizeof(size_t));
    if (sizeof(size_t) < 8 || sizeof(int*) < 8)
    {
        printf("please compile this program for a 64 bit environment!\n");
        return -1;
    }
    int *p;
    /* None of the blocks is freed, so all four stay allocated together. */
    printf("allocation 1/4..\n");
    p = calloc(47109185, 16);
    if (!p) printf("..failed.\n");
    printf("allocation 2/4..\n");
    p = calloc(47109185, 4);
    if (!p) printf("..failed.\n");
    printf("allocation 3/4..\n");
    p = calloc(47109185, 4);
    if (!p) printf("..failed.\n");
    printf("allocation 4/4..\n");
    p = calloc(622947588, 16);   /* about 10 GB in a single request */
    if (!p)
    {
        printf("..failed.\n");
        return -1;
    }
    printf("allocations test passed (no crash)\n");
    return 0;
}
We tested three builds:
- the code above compiled directly with gcc 4 or with PGI (7.1-4 or 7.2): OK
- the code above compiled with openmpi (1.2.5 or 1.2.6) built with gcc 4: OK
- the code above compiled with openmpi (1.2.5 or 1.2.6) built with PGI (7.1-4 or 7.2): the test fails with a segmentation fault
Part of the ldd output for the failing binary (openmpi 1.2.5 built with PGI 7.1-4):
	libmpi.so.0 => /opt/mpi/openmpi-1.2.5/pgi/lib/libmpi.so.0 (0x0000002a95558000)
	libopen-rte.so.0 => /opt/mpi/openmpi-1.2.5/pgi/lib/libopen-rte.so.0 (0x0000002a957b2000)
	libopen-pal.so.0 => /opt/mpi/openmpi-1.2.5/pgi/lib/libopen-pal.so.0 (0x0000002a9599c000)
	libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000003d7b600000)
	librt.so.1 => /lib64/tls/librt.so.1 (0x0000003d80d00000)
	libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000002a95b30000)
	libdl.so.2 => /lib64/libdl.so.2 (0x0000003d7bd00000)
	libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003d81500000)
	libutil.so.1 => /lib64/libutil.so.1 (0x0000002a95c35000)
	libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003d7c100000)
	libm.so.6 => /lib64/tls/libm.so.6 (0x0000003d7bb00000)
	libc.so.6 => /lib64/tls/libc.so.6 (0x0000003d7b800000)
	libpgc.so => /afs/efda-itm.eu/project/compilers/pgi/linux86-64/7.1-4/libso/libpgc.so (0x0000002a95d3a000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003d7b400000)
I think the bug lies in the way openmpi wraps the calloc function.
Greetings,
Dr. Francesco Iannone
Associazione EURATOM-ENEA sulla Fusione
C.R. ENEA Frascati
Via E. Fermi 45
00044 Frascati (Roma) Italy
phone 00-39-06-9400-5124
fax 00-39-06-9400-5524
mailto:francesco.iannone@frascati.enea.it
http://www.afs.enea.it/iannone