Open MPI User's Mailing List Archives

From: Rolf.Vandevaart_at_[hidden]
Date: 2007-10-02 15:46:14


My guess is that you have a mismatched MPI_Bcast somewhere
in the code. Presumably, there is a call to MPI_Bcast on the head
node that broadcasts something larger than 1 MPI_INT and has no
matching call on the worker nodes. Then, when the worker nodes
finally call MPI_Bcast, their calls match up with that earlier
broadcast from the head node.

Since the head node is the root of the MPI_Bcast, it can
simply send its data to the other nodes and continue on. I have
attached a simple program that shows the same behavior.

And yes, the Send and Recv can work independently of the
Broadcast: point-to-point messages match on their tags, while
collectives match purely on the order of calls on the communicator.

Rolf

PS: Simple program at end of message.

Marcin Skoczylas wrote:

>Sorry I forgot to mention: Open MPI version 1.2.4
>
>
>Marcin Skoczylas wrote:
>
>
>>Hello,
>>
>>After a whole day of coding I'm fighting a little with one small
>>fragment that seems strange to me.
>>For testing I have one head node and two worker nodes on localhost.
>>Here is the code (with debug additions such as sleeps, barriers, etc.):
>>
>>void CImageData::SpreadToNodes()
>>{
>>    sleep(5);
>>    logger->debug("CImageData::SpreadToNodes, w=%d h=%d type=%d",
>>                  this->width, this->height, this->type);
>>
>>    logger->debug("head barrier");
>>    MPI_Barrier(MPI_COMM_WORLD);
>>    sleep(2);
>>    MPI_Barrier(MPI_COMM_WORLD);
>>
>>    // debug 'sync' test
>>    logger->debug("head send SYNC str");
>>    char buf[5];
>>    buf[0] = 'S'; buf[1] = 'Y'; buf[2] = 'N'; buf[3] = 'C';
>>    for (int nodeId = 1; nodeId < g_NumProcesses; nodeId++)
>>    {
>>        MPI_Send(buf, 4, MPI_CHAR, nodeId, DEF_MSG_TAG, MPI_COMM_WORLD);
>>    }
>>
>>    logger->debug("head bcast width: %d", this->width);
>>    MPI_Bcast(&(this->width), 1, MPI_INT, 0, MPI_COMM_WORLD);
>>    logger->debug("head bcast height: %d", this->height);
>>    MPI_Bcast(&(this->height), 1, MPI_INT, 0, MPI_COMM_WORLD);
>>    logger->debug("head bcast type: %d", this->type);
>>    MPI_Bcast(&(this->type), 1, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
>>
>>    logger->debug("head sleep 10s");
>>    sleep(10);
>>
>>    logger->debug("finished CImageData::SpreadToNodes");
>>}
>>
>>// this function is declared static:
>>CImageData *CImageData::ReceiveFromHead()
>>{
>>    sleep(5);
>>
>>    logger->debug("CImageData::ReceiveFromHead");
>>    MPI_Status status;
>>    int _width;
>>    int _height;
>>    byte _type;
>>
>>    logger->debug("worker barrier");
>>    MPI_Barrier(MPI_COMM_WORLD);
>>    sleep(2);
>>    MPI_Barrier(MPI_COMM_WORLD);
>>
>>    char buf[5];
>>    MPI_Recv(buf, 4, MPI_CHAR, HEAD_NODE, DEF_MSG_TAG, MPI_COMM_WORLD,
>>             &status);
>>    logger->debug("worker received sync str: '%c' '%c' '%c' '%c'",
>>                  buf[0], buf[1], buf[2], buf[3]);
>>
>>    logger->debug("worker bcast width");
>>    MPI_Bcast(&(_width), 1, MPI_INT, 0, MPI_COMM_WORLD);
>>    logger->debug("worker bcast height");
>>    MPI_Bcast(&(_height), 1, MPI_INT, 0, MPI_COMM_WORLD);
>>    logger->debug("worker bcast type");
>>    MPI_Bcast(&(_type), 1, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
>>
>>    logger->debug("width=%d height=%d type=%d", _width, _height, _type);
>>
>>    // TODO: create CImageData object, return...
>>    return NULL;
>>}
>>
>>
>>That part of the code gives me an error:
>>RANK 0 -> PID 17115
>>RANK 1 -> PID 17116
>>RANK 2 -> PID 17117
>>
>>2007-10-02 19:50:37,829 [17115] DEBUG: CImageData::SpreadToNodes, w=768
>>h=576 type=1
>>2007-10-02 19:50:37,829 [17117] DEBUG: CImageData::ReceiveFromHead
>>2007-10-02 19:50:37,829 [17115] DEBUG: head barrier
>>2007-10-02 19:50:37,829 [17116] DEBUG: CImageData::ReceiveFromHead
>>2007-10-02 19:50:37,829 [17116] DEBUG: worker barrier
>>2007-10-02 19:50:37,829 [17117] DEBUG: worker barrier
>>2007-10-02 19:50:39,836 [17115] DEBUG: head send SYNC str
>>2007-10-02 19:50:39,836 [17115] DEBUG: head bcast width: 768
>>2007-10-02 19:50:39,836 [17115] DEBUG: head bcast height: 576
>>2007-10-02 19:50:39,836 [17115] DEBUG: head bcast type: 1
>>2007-10-02 19:50:39,836 [17115] DEBUG: head sleep 10s
>>2007-10-02 19:50:39,836 [17116] DEBUG: worker received sync str: 'S' 'Y'
>>'N' 'C'
>>2007-10-02 19:50:39,836 [17116] DEBUG: worker bcast width
>>[pc801:17116] *** An error occurred in MPI_Bcast
>>[pc801:17116] *** on communicator MPI_COMM_WORLD
>>[pc801:17116] *** MPI_ERR_TRUNCATE: message truncated
>>[pc801:17116] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>2007-10-02 19:50:39,836 [17117] DEBUG: worker received sync str: 'S' 'Y'
>>'N' 'C'
>>2007-10-02 19:50:39,836 [17117] DEBUG: worker bcast width
>>[pc801:17117] *** An error occurred in MPI_Bcast
>>[pc801:17117] *** on communicator MPI_COMM_WORLD
>>[pc801:17117] *** MPI_ERR_TRUNCATE: message truncated
>>[pc801:17117] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>mpirun noticed that job rank 0 with PID 17115 on node pc801 exited on
>>signal 15 (Terminated).
>>
>>
>>Could it be that somewhere before this part the data stream went out of
>>sync? The project is quite large and there is a lot of communication
>>between processes before CImageData::SpreadToNodes(), so debugging it
>>all could take hours or days; however, the data flow before this
>>particular fragment seems to be OK. How can it be that MPI_Send/Recv
>>gave me a good buffer (4 chars - SYNC) while the MPI_Bcast of an
>>MPI_INT is truncated? I ran the code under Valgrind - it didn't
>>complain and gave exactly the same result. Should I assume that I have
>>a memory-access error somewhere before this part and have corrupted
>>the MPI structures? How exactly does MPI_Bcast work?
>>
>>Sorry to disturb you, but I'm a little confused.
>>Thank you & greetings, Marcin
>>
>>_______________________________________________
>>users mailing list
>>users_at_[hidden]
>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>>
>
>
>
#include <stdio.h>
#include <mpi.h>

int
main(int argc, char **argv)
{
    int rank;
    int np; /* number of processes in job */
    int foo[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /*
     * Here is an errant MPI_Bcast. Since this is the root, it
     * can just send the data and continue on. An MPI_Bcast is
     * not necessarily a synchronizing collective.
     */
    if (rank == 0) {
        printf("Rank 0 broadcasting 2 MPI_INTs...\n");
        MPI_Bcast(foo, 2, MPI_INT, 0, MPI_COMM_WORLD);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) {
        printf("\nEveryone has now synchronized.\n");
    }

    /*
     * Now, everyone calls a broadcast again. However, the non-zero
     * ranks match up with the previous broadcast, whose message is
     * larger than the count they pass here.
     */
    printf("rank %d is calling MPI_Bcast...\n", rank);

    MPI_Bcast(foo, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}