Open MPI User's Mailing List Archives

Subject: [OMPI users] Bug when mixing sent types in version 1.6
From: BOUVIER Benjamin (benjamin.bouvier_at_[hidden])
Date: 2012-06-08 09:43:58

Hi everybody,

I am currently hitting a bug when launching a very simple MPI program with mpirun across networked nodes. It happens when I send an INT and then a CHAR string from a master node to a worker node.
Here is the minimal code to reproduce the bug:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size;
    const char someString[] = "Can haz cheezburgerz?";

    MPI_Init(&argc, &argv);

    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    if ( rank == 0 )
    {
        /* Master: send the string length, then the string with its trailing NUL. */
        int len = (int) strlen( someString );
        int i;
        for( i = 1; i < size; ++i )
        {
            MPI_Send( &len, 1, MPI_INT, i, 0, MPI_COMM_WORLD );
            MPI_Send( &someString, len+1, MPI_CHAR, i, 0, MPI_COMM_WORLD );
        }
    } else {
        /* Worker: receive the length first, then the string itself. */
        char buffer[ 128 ];
        int receivedLen;
        MPI_Status stat;
        MPI_Recv( &receivedLen, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat );
        printf( "[Worker] Length : %d\n", receivedLen );
        MPI_Recv( buffer, receivedLen+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat );
        printf( "[Worker] String : %s\n", buffer );
    }

    MPI_Finalize();
    return 0;
}

I know that there is a better way to send a string, by giving a maximum buffer size to the second MPI_Recv, but that is not the main topic here.
The program works locally (i.e., when the 2 processes are launched on one machine), but not when the 2 processes are dispatched to 2 machines over the network (i.e., one per host). In that case, the worker correctly reads the INT, and then both master and worker block on the next call.
I have no issue when sending only char strings or only numbers. The hang only happens when the two types are mixed: char strings followed by numbers, or numbers followed by char strings.

I'm using Open MPI version 1.6, compiled locally.
$ uname -a
Linux trtp7097 2.6.32-220.13.1.el6.x86_64 #1 SMP Thu Mar 29 11:46:40 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 6.2 (Santiago)

Am I misusing the framework, or could this be a bug?

Thank you in advance.