
Open MPI User's Mailing List Archives


From: Marcin Skoczylas (Marcin.Skoczylas_at_[hidden])
Date: 2007-10-03 05:42:37


Thank you very much for the detailed explanation! I was afraid that a
previous mismatch in MPI_Bcast could be the problem; I had hoped that the
whole data stream went through the main tag, so that I could catch problems
easily just by sending synced data. I assume Bcast is performed that way to
speed up the communication. So for now I have put aside one module where I
always suspected some mismatched calls (even though it works), and now the
broadcast of the image data is correct. I have wrappers everywhere anyway,
so to catch the bug there I'll just have to look at all the MPI_Bcast calls,
or temporarily replace them with MPI_Send/MPI_Recv to keep the flow synced.
Thanks again!
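
For illustration, here is a minimal sketch of such a temporary wrapper (the
name debug_bcast, the tag value, and the small main() are hypothetical, not
from the project): it replaces MPI_Bcast with tagged MPI_Send/MPI_Recv, so a
count mismatch is reported at the offending call instead of being silently
matched by a later, unrelated broadcast.

#include <stdio.h>
#include <mpi.h>

#define DEBUG_BCAST_TAG 4242   /* hypothetical tag reserved for debugging */

/* Temporary substitute for MPI_Bcast: the root sends the buffer to every
 * other rank with ordinary tagged point-to-point messages, so a count
 * mismatch shows up right here as a truncation error. */
int debug_bcast(void *buf, int count, MPI_Datatype type, int root, MPI_Comm comm)
{
    int rank, size, dst;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank == root) {
        for (dst = 0; dst < size; dst++) {
            if (dst != root) {
                MPI_Send(buf, count, type, dst, DEBUG_BCAST_TAG, comm);
            }
        }
    } else {
        MPI_Recv(buf, count, type, root, DEBUG_BCAST_TAG, comm, MPI_STATUS_IGNORE);
    }
    return MPI_SUCCESS;
}

int main(int argc, char **argv)
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 768;   /* e.g. the image width from the log below */
    }
    debug_bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d got value %d\n", rank, value);
    MPI_Finalize();
    return 0;
}

Once the mismatched call has been located, the wrapper can be switched back
to a plain MPI_Bcast.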

Rolf.Vandevaart_at_[hidden] wrote:
>
> My guess is you must have a mismatched MPI_Bcast somewhere
> in the code. Presumably, there is a call to MPI_Bcast on the head
> node that broadcasts something larger than 1 MPI_INT and does not
> have the matching call on the worker nodes. Then, when the MPI_Bcast
> on the worker nodes is called, they are matching up with the previous
> call on the head node.
>
> Since the head node is the root of the MPI_Bcast, it can blast
> its data to the other nodes and then continue on. I have
> attached a simple program that shows the same behavior.
>
> And yes, the Send and Recv can work independently from the
> Broadcast as they are using different tags to match up their data.
>
> Rolf
>
> PS: Simple program at end of message.
>
> Marcin Skoczylas wrote:
>
>> Sorry I forgot to mention: Open MPI version 1.2.4
>>
>>
>> Marcin Skoczylas wrote:
>>
>>
>>> Hello,
>>>
>>> After a whole day of coding I'm fighting a little bit with one small
>>> fragment which seems strange to me.
>>> For testing I have one head node and two worker nodes on localhost.
>>> Having this code (with debug stuff added like sleeps, barriers, etc):
>>>
>>> void CImageData::SpreadToNodes()
>>> {
>>>     sleep(5);
>>>     logger->debug("CImageData::SpreadToNodes, w=%d h=%d type=%d",
>>>                   this->width, this->height, this->type);
>>>     logger->debug("head barrier");
>>>     MPI_Barrier(MPI_COMM_WORLD);
>>>     sleep(2);
>>>     MPI_Barrier(MPI_COMM_WORLD);
>>>
>>>     // debug 'sync' test
>>>     logger->debug("head send SYNC str");
>>>     char buf[5];
>>>     buf[0] = 'S'; buf[1] = 'Y'; buf[2] = 'N'; buf[3] = 'C';
>>>     for (int nodeId = 1; nodeId < g_NumProcesses; nodeId++)
>>>     {
>>>         MPI_Send(buf, 4, MPI_CHAR, nodeId, DEF_MSG_TAG, MPI_COMM_WORLD);
>>>     }
>>>
>>>     logger->debug("head bcast width: %d", this->width);
>>>     MPI_Bcast(&(this->width), 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>     logger->debug("head bcast height: %d", this->height);
>>>     MPI_Bcast(&(this->height), 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>     logger->debug("head bcast type: %d", this->type);
>>>     MPI_Bcast(&(this->type), 1, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
>>>     logger->debug("head sleep 10s");
>>>     sleep(10);
>>>     logger->debug("finished CImageData::SpreadToNodes");
>>> }
>>>
>>> // this function is declared static:
>>> CImageData *CImageData::ReceiveFromHead()
>>> {
>>>     sleep(5);
>>>
>>>     logger->debug("CImageData::ReceiveFromHead");
>>>     MPI_Status status;
>>>     int _width;
>>>     int _height;
>>>     byte _type;
>>>
>>>     logger->debug("worker barrier");
>>>     MPI_Barrier(MPI_COMM_WORLD);
>>>     sleep(2);
>>>     MPI_Barrier(MPI_COMM_WORLD);
>>>
>>>     char buf[5];
>>>     MPI_Recv(buf, 4, MPI_CHAR, HEAD_NODE, DEF_MSG_TAG, MPI_COMM_WORLD, &status);
>>>     logger->debug("worker received sync str: '%c' '%c' '%c' '%c'",
>>>                   buf[0], buf[1], buf[2], buf[3]);
>>>
>>>     logger->debug("worker bcast width");
>>>     MPI_Bcast(&(_width), 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>     logger->debug("worker bcast height");
>>>     MPI_Bcast(&(_height), 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>     logger->debug("worker bcast type");
>>>     MPI_Bcast(&(_type), 1, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
>>>     logger->debug("width=%d height=%d type=%d", _width, _height, _type);
>>>
>>>     // TODO: create CImageData object, return...
>>>     return NULL;
>>> }
>>>
>>>
>>> That part of the code gives me an error:
>>> RANK 0 -> PID 17115
>>> RANK 1 -> PID 17116
>>> RANK 2 -> PID 17117
>>>
>>> 2007-10-02 19:50:37,829 [17115] DEBUG: CImageData::SpreadToNodes,
>>> w=768 h=576 type=1
>>> 2007-10-02 19:50:37,829 [17117] DEBUG: CImageData::ReceiveFromHead
>>> 2007-10-02 19:50:37,829 [17115] DEBUG: head barrier
>>> 2007-10-02 19:50:37,829 [17116] DEBUG: CImageData::ReceiveFromHead
>>> 2007-10-02 19:50:37,829 [17116] DEBUG: worker barrier
>>> 2007-10-02 19:50:37,829 [17117] DEBUG: worker barrier
>>> 2007-10-02 19:50:39,836 [17115] DEBUG: head send SYNC str
>>> 2007-10-02 19:50:39,836 [17115] DEBUG: head bcast width: 768
>>> 2007-10-02 19:50:39,836 [17115] DEBUG: head bcast height: 576
>>> 2007-10-02 19:50:39,836 [17115] DEBUG: head bcast type: 1
>>> 2007-10-02 19:50:39,836 [17115] DEBUG: head sleep 10s
>>> 2007-10-02 19:50:39,836 [17116] DEBUG: worker received sync str: 'S'
>>> 'Y' 'N' 'C'
>>> 2007-10-02 19:50:39,836 [17116] DEBUG: worker bcast width
>>> [pc801:17116] *** An error occurred in MPI_Bcast
>>> [pc801:17116] *** on communicator MPI_COMM_WORLD
>>> [pc801:17116] *** MPI_ERR_TRUNCATE: message truncated
>>> [pc801:17116] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> 2007-10-02 19:50:39,836 [17117] DEBUG: worker received sync str: 'S'
>>> 'Y' 'N' 'C'
>>> 2007-10-02 19:50:39,836 [17117] DEBUG: worker bcast width
>>> [pc801:17117] *** An error occurred in MPI_Bcast
>>> [pc801:17117] *** on communicator MPI_COMM_WORLD
>>> [pc801:17117] *** MPI_ERR_TRUNCATE: message truncated
>>> [pc801:17117] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> mpirun noticed that job rank 0 with PID 17115 on node pc801 exited
>>> on signal 15 (Terminated).
>>>
>>>
>>> Could it be that somewhere before this part the data stream got out
>>> of sync? The project is quite large and there is a lot of
>>> communication between processes before CImageData::SpreadToNodes(),
>>> so a full debugging pass could take hours or days; however, the data
>>> flow before this particular fragment seems to be OK. How can it be
>>> that MPI_Send/Recv gave me the correct buffer (4 chars - SYNC) while
>>> the MPI_Bcast of an MPI_INT is truncated? I ran the code under
>>> Valgrind - it didn't complain and gave exactly the same result.
>>> Should I assume that I have a memory-access error somewhere before
>>> this part and have corrupted the MPI structures? How exactly does
>>> MPI_Bcast work?
>>>
>>> Sorry for the disturbance, but I'm a little bit confused.
>>> Thank you & greetings, Marcin
>>>
>>
>>
> #include <stdio.h>
> #include <mpi.h>
>
> int
> main(int argc, char **argv)
> {
>     int rank;
>     int np;       /* number of processes in job */
>     int foo[2];
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &np);
>
>     /*
>      * Here is an errant MPI_Bcast. Since this is the root, it
>      * can just send the data and continue on. An MPI_Bcast is
>      * not necessarily a synchronizing collective.
>      */
>     if (rank == 0) {
>         printf("Rank 0 broadcasting 2 MPI_INTs...\n");
>         MPI_Bcast(foo, 2, MPI_INT, 0, MPI_COMM_WORLD);
>     }
>
>     MPI_Barrier(MPI_COMM_WORLD);
>     if (rank == 0) {
>         printf("\nEveryone has now synchronized.\n");
>     }
>
>     /*
>      * Now everyone calls a broadcast again. However, the non-zero
>      * ranks are matching up with the previous broadcast.
>      */
>     printf("rank %d is calling MPI_Bcast...\n", rank);
>
>     MPI_Bcast(foo, 1, MPI_INT, 0, MPI_COMM_WORLD);
>
>     MPI_Finalize();
>     return 0;
> }
>
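
For completeness, the attached reproducer can be built and launched with the
usual Open MPI tools; the file name below is only an example:

    mpicc bcast_mismatch.c -o bcast_mismatch
    mpirun -np 3 ./bcast_mismatch

With three processes, the non-root ranks would be expected to match the
root's earlier two-int broadcast with a one-int receive buffer and abort
with the same MPI_ERR_TRUNCATE error shown in the log above.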