Subject: [OMPI devel] Possible buffer overrun bug in opal_free_list_grow, called by MPI::Init
From: Patrick Farrell (patrick.farrell06_at_[hidden])
Date: 2008-08-23 13:57:06


I think I have found a buffer overrun in a function
called by MPI::Init, though explanations of why I am
wrong are welcome.

I am using the Open MPI package included in Ubuntu Hardy,
version 1.2.5. I have also inspected the latest trunk by eye,
and I don't believe the relevant code has changed.

I was trying to use Electric Fence, a memory debugging library,
to debug a suspected buffer overrun in my own program.
Electric Fence works by replacing malloc/free in such
a way that any out-of-bounds access triggers a segfault.
While running my program under Electric Fence, I found
that I got a segfault issued at:

0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1) at
113 OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
(gdb) bt
#0 0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1)
at class/opal_free_list.c:113
#1 0xb5cdd479 in opal_free_list_init (flist=0xb2b46a50, elem_size=56,
elem_class=0xb2b46e20, num_elements_to_alloc=73,
max_elements_to_alloc=-1, num_elements_per_alloc=1) at
#2 0xb2b381aa in ompi_osc_pt2pt_component_init
(enable_progress_threads=false, enable_mpi_threads=false) at
#3 0xb792b67c in ompi_osc_base_find_available
(enable_progress_threads=false, enable_mpi_threads=false) at
#4 0xb78e6abe in ompi_mpi_init (argc=5, argv=0xbfd61f84, requested=0,
provided=0xbfd61e78) at runtime/ompi_mpi_init.c:411
#5 0xb7911a87 in PMPI_Init (argc=0xbfd61f00, argv=0xbfd61f04) at pinit.c:71
#6 0x0811ca6c in MPI::Init ()
#7 0x08118b8a in main ()
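
For readers unfamiliar with how a tool like Electric Fence turns an
overrun into a segfault, here is a minimal sketch of the guard-page
technique. It is not Electric Fence's actual source: GuardedBlock,
guarded_alloc and guarded_free are hypothetical names, and the sketch
assumes a POSIX system with mmap and mprotect. The buffer is placed so
that it ends exactly where an inaccessible page begins, so the first
byte past the end faults.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical illustration only; not taken from Electric Fence.
struct GuardedBlock {
    void*       user;     // pointer handed to the caller
    void*       mapping;  // start of the whole mapped region
    std::size_t map_len;  // mapped length, including the guard page
};

// Allocate `size` bytes whose last byte sits directly before a
// PROT_NONE page, so touching user[size] raises SIGSEGV.
GuardedBlock guarded_alloc(std::size_t size) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t data_pages = (size + page - 1) / page;
    const std::size_t map_len = (data_pages + 1) * page;

    void* base = mmap(nullptr, map_len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return {nullptr, nullptr, 0};
    }

    // The final page of the mapping becomes the inaccessible guard.
    char* guard = static_cast<char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, map_len);
        return {nullptr, nullptr, 0};
    }

    // Right-align the user buffer against the guard page.
    return {guard - size, base, map_len};
}

void guarded_free(const GuardedBlock& block) {
    if (block.mapping != nullptr) {
        munmap(block.mapping, block.map_len);
    }
}
```

With this layout, writing to bytes 0..size-1 succeeds, while a write at
offset size lands in the guard page and faults immediately, which is
the behaviour seen in the trace above.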

To investigate further, I replaced the OBJ_CONSTRUCT_INTERNAL
macro with its definition in opal/class/opal_object.h, and ran it again.
It appears that the invalid memory access is happening
on the instruction

((opal_object_t *) (item))->obj_class = (flist->fl_elem_class);

Investigating further, I modified the opal_free_list source
with the attached patch. It adds a few debugging printfs to
diagnose exactly what the code is doing. The output of the debugging
statements is:

mpidebug: allocating 216
mpidebug: allocated at memory address 0xb62bdf28
mpidebug: accessing address 0xb62be000

Now, 0xb62be000 - 0xb62bdf28 = 216, which is
the size of the buffer allocated, so the access is to
the first byte past the end of the buffer. I think
this is a buffer overrun.
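
The arithmetic above can be checked mechanically. The sketch below
uses a hypothetical helper, access_in_bounds (not part of Open MPI),
and assumes the write is at least one byte wide; the addresses and the
size are the ones printed by the debugging statements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if an access of `access_size` bytes at
// `addr` lies entirely inside the buffer [base, base + size).
bool access_in_bounds(std::uintptr_t base, std::size_t size,
                      std::uintptr_t addr, std::size_t access_size) {
    if (addr < base) {
        return false;
    }
    const std::uintptr_t offset = addr - base;
    return offset <= size && access_size <= size - offset;
}

// Values reported by the mpidebug output.
const std::uintptr_t kBase   = 0xb62bdf28u;  // "allocated at memory address"
const std::uintptr_t kAccess = 0xb62be000u;  // "accessing address"
const std::size_t    kSize   = 216;          // "allocating 216"
```

Calling access_in_bounds(kBase, kSize, kAccess, 1) returns false,
because the offset kAccess - kBase equals kSize. The accessed address
is also a multiple of 4096, which would be consistent with the buffer
ending right at a guard page, assuming 4 KiB pages.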

Steps to reproduce:

a) Install Electric Fence
b) Compile the following program

#include <stdlib.h>
#include <unistd.h>

#include <mpi.h>

int main(int argc, char **argv)
{
   MPI::Init(argc, argv);

   return 0;
}

mpiCC -o test ./test.cpp

c) gdb ./test
d) set environment LD_PRELOAD /usr/lib/
e) run

Hope this helps,

Patrick Farrell

