Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] UDP like messaging with MPI
From: Jeremiah Willcock (jewillco_at_[hidden])
Date: 2011-11-21 14:50:22


On Mon, 21 Nov 2011, Mudassar Majeed wrote:

> Thank you for your answer. Actually, I used the term UDP to describe connectionless messaging. TCP creates a connection between the two communicating
> parties, but in UDP a message can be sent to any IP/port where a process/thread is listening, and if that process is busy doing something else, the
> received messages are queued for it; whenever it calls the recv function, one message is taken from the queue.

That is how MPI message matching works; messages sit in a queue until you
call MPI_Irecv (or MPI_Recv, MPI_Probe, etc.) to retrieve them. Unlike UDP,
though, an MPI send is not guaranteed to complete on the sender's side until
the message is received (or buffered by the implementation), so you will
probably need to use MPI_Isend to avoid deadlocks.
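
For example, if two ranks each send to the other and only then post their
receives, blocking MPI_Send calls can deadlock once the messages are too
large for the MPI implementation to buffer eagerly. A minimal sketch of the
non-blocking pattern (the exchange() helper, the message size, and the tag
are just for illustration, not something from this thread):

#include <mpi.h>

void exchange(int other_rank) {
   int send_val = 42, recv_val;
   MPI_Request req;
   // Start the send without waiting for the receiver; the message will sit
   // in the receiver's queue until a matching receive is posted.
   MPI_Isend(&send_val, 1, MPI_INT, other_rank, 0, MPI_COMM_WORLD, &req);
   // Now receive whatever the other rank sent.  With two plain MPI_Send
   // calls, this is the step where both ranks could block forever.
   MPI_Recv(&recv_val, 1, MPI_INT, other_rank, 0, MPI_COMM_WORLD,
            MPI_STATUS_IGNORE);
   // The send buffer must stay intact until the request completes.
   MPI_Wait(&req, MPI_STATUS_IGNORE);
}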

> I am implementing a distributed algorithm that will provide communication-sensitive load balancing for computational loads. For example, suppose we have 10
> nodes, each containing 10 cores (100 cores in total). When the MPI application starts with, say, 1000 processes (more than 1 process per core), I will run my
> distributed algorithm MPI_Balance (sorry for the MPI_ prefix; it is not part of MPI, but I am trying to make it part of MPI ;) ). That algorithm will place
> the processes that communicate with each other the most on the same node (while keeping the computational load balanced across that node's 10 cores).
>
> So that was a bit of explanation. My distributed algorithm requires that some processes communicate with each other to collaborate on something, so I need
> the kind of messaging I explained above. It is like UDP messaging: no connection is set up before sending a message, the message is always queued on the
> receiver's side, and the sender is not blocked; it just sends the message, and the receiver takes it when it gets free from its other work.

The one difficulty in doing this is managing the MPI requests from the
sends and polling them periodically with MPI_Test. You can keep the
requests in an array (a std::vector in C++) that grows as needed: to send a
message, call MPI_Isend and append the request to the array, then
periodically call MPI_Testany or MPI_Testsome on the array to find
completed requests. Note that you will need to keep the data being sent
intact in its buffer until the request completes. Here's a naive version
that does extra copies and doesn't clean out its arrays of requests or
buffers:

#include <mpi.h>
#include <deque>
#include <vector>

using std::deque;
using std::vector;

class message_send_engine {
   // One pending MPI_Isend request per outstanding message.
   vector<MPI_Request> requests;
   // A deque is used instead of a vector so that growing it never moves the
   // existing elements; the buffers handed to MPI_Isend must stay at fixed
   // addresses until their sends complete.
   deque<vector<char> > buffers;

   public:
   void send(void* buf, int byte_len, int dest, int tag) {
     size_t buf_num = buffers.size();
     buffers.resize(buf_num + 1);
     // Copy the caller's data so the caller can reuse its buffer immediately.
     buffers[buf_num].assign((char*)buf, (char*)buf + byte_len);
     requests.resize(buf_num + 1);
     MPI_Isend(&buffers[buf_num][0], byte_len, MPI_BYTE, dest, tag,
               MPI_COMM_WORLD, &requests[buf_num]);
   }

   void poll() { // Call this periodically
     while (!requests.empty()) {
       int index, flag;
       MPI_Testany((int)requests.size(), &requests[0], &index, &flag,
                   MPI_STATUS_IGNORE);
       if (flag && index != MPI_UNDEFINED) {
         // The completed request is now MPI_REQUEST_NULL; swap with an empty
         // vector to actually release the buffer (clear() keeps the capacity).
         vector<char>().swap(buffers[index]);
       } else {
         break;
       }
     }
   }
};

// Non-blocking check for a waiting message; the source, tag, and size of the
// message can be read from st.  buf and max_len are not used here -- they
// are for the MPI_Recv the caller issues next.
bool test_for_message(void* buf, int max_len, MPI_Status& st) {
   int flag;
   MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
   return (flag != 0);
}

If test_for_message returns true, you can then use MPI_Recv to get the
message.
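
For example, something along these lines (the recv_one_message() helper and
the caller-provided fixed-size buffer are just an illustrative sketch on top
of the code above, not part of the original post):

void recv_one_message(void* buf, int max_len) {
   MPI_Status st;
   if (test_for_message(buf, max_len, st)) {
     int byte_len;
     // How many bytes does the waiting message actually contain?
     MPI_Get_count(&st, MPI_BYTE, &byte_len);
     // A real version should check that byte_len <= max_len.
     MPI_Recv(buf, byte_len, MPI_BYTE, st.MPI_SOURCE, st.MPI_TAG,
              MPI_COMM_WORLD, MPI_STATUS_IGNORE);
     // ... handle the byte_len bytes now in buf ...
   }
}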

> I have tried to use the combination of MPI_Send, MPI_Recv, MPI_Iprobe,
> MPI_Isend, MPI_Irecv, MPI_Test, etc., but I am not getting the behavior
> I am looking for. I think MPI should also provide this; maybe it is
> simply not in my knowledge. That's why I am asking the experts. I am
> still looking for it :(

-- Jeremiah Willcock