Open MPI Development Mailing List Archives


Subject: [OMPI devel] Possible buffer overrun bug in opal_free_list_grow, called by MPI::Init
From: Patrick Farrell (patrick.farrell06_at_[hidden])
Date: 2008-08-23 13:57:06


Hi,

I think I have found a buffer overrun in a function called by MPI::Init,
though I would welcome an explanation of why I am wrong.

I am using the Open MPI shipped with Ubuntu Hardy (version 1.2.5),
though I have inspected the latest trunk by eye and I don't believe
the relevant code has changed.

I was trying to use Electric Fence, a memory-debugging library,
to track down a suspected buffer overrun in my own program.
Electric Fence replaces malloc/free so that any access outside the
bounds of an allocation triggers an immediate segfault (a tiny
standalone demo of this is included just after the backtrace below).
While running my program under Electric Fence, I got a segfault at:

0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1) at
class/opal_free_list.c:113
113 OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
(gdb) bt
#0 0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1)
at class/opal_free_list.c:113
#1 0xb5cdd479 in opal_free_list_init (flist=0xb2b46a50, elem_size=56,
elem_class=0xb2b46e20, num_elements_to_alloc=73,
max_elements_to_alloc=-1, num_elements_per_alloc=1) at
class/opal_free_list.c:78
#2 0xb2b381aa in ompi_osc_pt2pt_component_init
(enable_progress_threads=false, enable_mpi_threads=false) at
osc_pt2pt_component.c:173
#3 0xb792b67c in ompi_osc_base_find_available
(enable_progress_threads=false, enable_mpi_threads=false) at
base/osc_base_open.c:84
#4 0xb78e6abe in ompi_mpi_init (argc=5, argv=0xbfd61f84, requested=0,
provided=0xbfd61e78) at runtime/ompi_mpi_init.c:411
#5 0xb7911a87 in PMPI_Init (argc=0xbfd61f00, argv=0xbfd61f04) at pinit.c:71
#6 0x0811ca6c in MPI::Init ()
#7 0x08118b8a in main ()
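
As an aside, for anyone not familiar with Electric Fence: as I understand
it, it places each allocation right up against a protected page, so an
overrun faults on the exact overrunning access, which is how the backtrace
above was obtained. The trivial standalone program below (nothing to do
with Open MPI; the size 216 just happens to match the allocation that
shows up in the debug output further down) demonstrates this:

/* efence_demo.c: a deliberate one-byte heap overrun.
 *
 * Build: gcc -g -o efence_demo efence_demo.c
 * Run:   LD_PRELOAD=/usr/lib/libefence.so.0.0 ./efence_demo
 */
#include <stdlib.h>
#include <string.h>

int main(void)
{
   char *buf = malloc(216);

   if (buf == NULL)
      return 1;

   memset(buf, 0, 216);   /* in bounds: runs fine */
   buf[216] = 0;          /* one byte past the end: Electric Fence traps here */

   free(buf);             /* never reached when the overrun traps */
   return 0;
}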

To investigate further, I replaced the OBJ_CONSTRUCT_INTERNAL
macro with its definition from opal/class/opal_object.h and ran it again.
It appears that the invalid memory access happens on the statement

((opal_object_t *) (item))->obj_class = (flist->fl_elem_class);
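
For readers who don't have opal/class/opal_object.h to hand, my
understanding is that constructing an element boils down to roughly the
sketch below. The types and the helper are stand-ins I made up for
illustration; only the obj_class store is taken verbatim from the
expansion above. The point is that every step writes through item, so if
item points outside the allocation, the very first store is the one that
faults:

/* Illustrative stand-in for OBJ_CONSTRUCT_INTERNAL (made-up types). */
#include <stddef.h>

typedef struct my_class  my_class_t;
typedef struct my_object my_object_t;

struct my_object {
   my_class_t *obj_class;             /* which class the object belongs to */
   int         obj_reference_count;
};

struct my_class {
   const char *cls_name;
   void      (*cls_construct)(my_object_t *);
};

static void construct_internal(void *item, my_class_t *cls)
{
   my_object_t *obj = (my_object_t *) item;

   obj->obj_class           = cls;    /* the store seen in the trace above */
   obj->obj_reference_count = 1;
   if (cls->cls_construct != NULL)
      cls->cls_construct(obj);        /* run the class constructor */
}

int main(void)
{
   static my_class_t cls = { "example_t", NULL };
   my_object_t obj;                   /* a valid, in-bounds object */

   construct_internal(&obj, &cls);
   return (obj.obj_reference_count == 1) ? 0 : 1;
}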

Investigating further, I modified the source of opal_free_list with the
attached patch. It adds a few debugging printfs to diagnose exactly what
the code is doing. The output of the debugging statements is:

mpidebug: allocating 216
mpidebug: allocated at memory address 0xb62bdf28
mpidebug: accessing address 0xb62be000
[segfault]

Now, 0xb62be000 - 0xb62bdf28 = 216, which is exactly the size of the
allocated buffer, so the faulting access is to the first byte past the
end of the allocation. In other words, this looks like a buffer overrun.
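
For anyone who wants to double-check those addresses:

/* Quick check of the addresses printed above. */
#include <stdio.h>

int main(void)
{
   unsigned long base  = 0xb62bdf28UL;   /* start of the 216-byte allocation */
   unsigned long fault = 0xb62be000UL;   /* address of the faulting access   */

   printf("offset = %lu\n", fault - base);   /* prints: offset = 216 */
   return 0;
}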

Steps to reproduce:

a) Install Electric Fence
b) Compile the following program

#include <stdlib.h>
#include <unistd.h>

#include <mpi.h>

int main(int argc, char **argv)
{
   MPI::Init(argc, argv);
   MPI::Finalize();

   return 0;
}

with

mpiCC -o test ./test.cpp

c) gdb ./test
d) (inside gdb) set environment LD_PRELOAD /usr/lib/libefence.so.0.0
e) (inside gdb) run
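
Since the backtrace goes through PMPI_Init, I would expect the plain C
bindings to take the same path, so a C version of the reproducer should
work too for anyone who prefers it (untested on my side; the C++ program
above is the one I actually ran):

/* test_c.c: C-bindings version of the reproducer above.
 *
 * Build: mpicc -g -o test_c test_c.c
 * Run:   gdb ./test_c
 *        (gdb) set environment LD_PRELOAD /usr/lib/libefence.so.0.0
 *        (gdb) run
 */
#include <mpi.h>

int main(int argc, char **argv)
{
   MPI_Init(&argc, &argv);
   MPI_Finalize();
   return 0;
}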

Hope this helps,

Patrick Farrell

--
Patrick Farrell
PhD student
Imperial College London