Open MPI Development Mailing List Archives

Subject: [OMPI devel] Some questions about checkpoint/restart (15)
From: Takayuki Seki (seki_at_[hidden])
Date: 2010-10-18 02:25:36


I have another question about checkpoint/restart of Open MPI.

The source file : ompi/runtime/ompi_cr.c
The function name : notify_collectives

In the notify_collectives function, a for statement appears to walk over every communicator,
find its collective modules, and call each module's ft_event function.
The variable "modules" used in that for statement is an array with 16 elements (NUM_COLLECTIVES).

Source code is as follows:

#define NUM_COLLECTIVES 16

#define SIGNAL(comm, modules, highest_module, msg, ret, func) \
    do { \
        bool found = false; \
        int k; \
        mca_coll_base_module_t *my_module = \
            comm->c_coll.coll_ ## func ## _module; \
        if (NULL != my_module) { \
            for (k = 0 ; k < highest_module ; ++k) { \
                if (my_module == modules[k]) found = true; \
            } \
            if (!found) { \
                modules[highest_module++] = my_module; \
                if (NULL != my_module->ft_event) { \
                    ret = my_module->ft_event(msg); \
                } \
            } \
        } \
    } while (0)

static int
notify_collectives(int msg)
{
    mca_coll_base_module_t *modules[NUM_COLLECTIVES];
    int i, max, ret, highest_module = 0;

    memset(&modules, 0, sizeof(mca_coll_base_module_t*) * NUM_COLLECTIVES);

    max = opal_pointer_array_get_size(&ompi_mpi_communicators);
    for (i = 0 ; i < max ; ++i) {
        ompi_communicator_t *comm =
            (ompi_communicator_t *)opal_pointer_array_get_item(&ompi_mpi_communicators, i);
        if (NULL == comm) continue;

        SIGNAL(comm, modules, highest_module, msg, ret, allgather);
        SIGNAL(comm, modules, highest_module, msg, ret, allgatherv);

Inside the for statement, the macro "SIGNAL" stores each newly found module in the array "modules" and increments the subscript "highest_module".
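
To make the pattern concrete, here is a minimal stand-alone sketch of the same "record the module only if it is new" logic, with an explicit bounds check added before the store. It is not the Open MPI code: demo_module_t, record_module and count_rejected are hypothetical names used only for illustration, and the guard is an assumption about how an overflow could be detected, not something present in ompi_cr.c.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_COLLECTIVES 16

/* Hypothetical stand-in for mca_coll_base_module_t; only pointer identity matters. */
typedef struct demo_module { int id; } demo_module_t;

/*
 * Record `m` in `seen` unless it is already there.
 * Returns 1 if newly recorded, 0 if NULL or a duplicate,
 * and -1 if the table is already full (the case SIGNAL does not check).
 */
static int record_module(demo_module_t *seen[], int *count, demo_module_t *m)
{
    int k;

    if (NULL == m) return 0;
    for (k = 0; k < *count; ++k) {
        if (seen[k] == m) return 0;            /* already recorded */
    }
    if (*count >= NUM_COLLECTIVES) return -1;  /* would write past the end */
    seen[(*count)++] = m;
    return 1;
}

/* Feed n distinct modules (capped at 64) and return how many did not fit. */
static int count_rejected(int n)
{
    demo_module_t pool[64];
    demo_module_t *seen[NUM_COLLECTIVES] = { NULL };
    int count = 0, rejected = 0, i;

    if (n > 64) n = 64;
    for (i = 0; i < n; ++i) {
        pool[i].id = i;
        if (record_module(seen, &count, &pool[i]) < 0) ++rejected;
    }
    return rejected;
}
```

In this sketch, 16 distinct modules fit exactly, and any further distinct module is rejected instead of being written past the end of the array.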

I have two questions about this source.

1. I think the variable "highest_module", which is the subscript into the array "modules",
   should be reset for every communicator.
   If many communicators are created, can the code access array elements that are
   outside the declared bounds of the array "modules"?

2. I think it would work correctly, even if many communicators are created,
   if the subscript variable "highest_module" were reset inside the for statement.
   Is that correct?
   For example:

    for (i = 0 ; i < max ; ++i) {
        ompi_communicator_t *comm =
            (ompi_communicator_t *)opal_pointer_array_get_item(&ompi_mpi_communicators, i);

        highest_module = 0; /* <- add initialization of subscript variable "highest_module" */

        if (NULL == comm) continue;
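
The two questions can be illustrated with a small stand-alone simulation. This is only a sketch under stated assumptions: demo_module_t, demo_comm_t, DEMO_SLOTS and DEMO_MAX_COMMS are hypothetical stand-ins, and the model assumes that every communicator has its own private module, which may not match a real configuration. It reports the largest value the subscript reaches with and without the per-communicator reset.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_SLOTS     2   /* hypothetical: collective slots checked per communicator */
#define DEMO_MAX_COMMS 32  /* hypothetical cap so the simulation itself stays in bounds */

typedef struct demo_module { int id; } demo_module_t;
typedef struct demo_comm { demo_module_t *slot[DEMO_SLOTS]; } demo_comm_t;

/* Walk all communicators and return the largest value the subscript reaches. */
static int peak_index(const demo_comm_t *comms, int ncomms, bool reset_per_comm)
{
    /* Sized for the worst case, unlike the 16-element array in question. */
    demo_module_t *seen[DEMO_MAX_COMMS * DEMO_SLOTS];
    int highest = 0, peak = 0, i, s, k;

    for (i = 0; i < ncomms; ++i) {
        if (reset_per_comm) highest = 0;       /* the proposed change */
        for (s = 0; s < DEMO_SLOTS; ++s) {
            demo_module_t *m = comms[i].slot[s];
            bool found = false;

            if (NULL == m) continue;
            for (k = 0; k < highest; ++k) {
                if (seen[k] == m) found = true;
            }
            if (!found) seen[highest++] = m;   /* same store as in SIGNAL */
        }
        if (highest > peak) peak = highest;
    }
    return peak;
}

/* Build ncomms communicators, each with one private module in every slot. */
static int peak_with_private_modules(int ncomms, bool reset_per_comm)
{
    demo_module_t mods[DEMO_MAX_COMMS];
    demo_comm_t comms[DEMO_MAX_COMMS];
    int i, s;

    if (ncomms > DEMO_MAX_COMMS) ncomms = DEMO_MAX_COMMS;
    for (i = 0; i < ncomms; ++i) {
        mods[i].id = i;
        for (s = 0; s < DEMO_SLOTS; ++s) comms[i].slot[s] = &mods[i];
    }
    return peak_index(comms, ncomms, reset_per_comm);
}
```

Under these assumptions, without the reset the subscript grows with the number of communicators (20 communicators give a peak of 20, which would exceed a 16-element array), while with the reset it never exceeds the number of slots checked per communicator. One consequence of the reset is that duplicate detection only spans a single communicator, so a module shared by several communicators would be notified once per communicator rather than once overall.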