Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] (no subject)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-04-23 12:34:04


Because on-node communication typically uses shared memory, we
currently have to poll. Additionally, when a job mixes on-node and
off-node communication, we have to alternate between polling the
shared-memory queues and polling the network.
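
To make that concrete, here is a toy sketch of why a shared-memory
arrival has to be polled: there is no file descriptor to select() on,
only a flag in a shared region that the sender eventually sets. (The
"mailbox" struct and wait_for_message() below are made up for
illustration; Open MPI's real shared-memory queues are considerably
more involved.)

    struct mailbox {
        volatile int ready;     /* sender sets this once the payload is in place */
        char payload[256];
    };

    void wait_for_message(struct mailbox *mb)
    {
        /* Nothing to block on: the only way to notice the flag changing
           is to keep re-reading it, and that loop is exactly the CPU
           load you see on the idle receivers. */
        while (!mb->ready)
            ;   /* spin */
    }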

We also actively poll because it's the best way to keep latency low.
MPI implementations are almost always judged first on their latency,
not [usually] on their CPU utilization, and going to sleep in a
blocking system call definitely hurts latency.

We have plans to implement the "spin for a while and then block"
technique (as has been used in other MPIs and middleware layers), but
it hasn't been a high priority.
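
In the meantime, if idle CPU usage matters more to you than latency,
you can approximate the same idea in application code with a
nonblocking receive: spin on MPI_Test() for a bounded number of
iterations, then back off with a short sleep. A rough sketch --
SPIN_LIMIT and the 1 ms pause are arbitrary values picked for
illustration, and the tag/communicator just mirror your example:

    #include <time.h>
    #include <mpi.h>

    #define SPIN_LIMIT 10000

    void recv_spin_then_backoff(void *buf, int count, MPI_Datatype type,
                                MPI_Status *status)
    {
        MPI_Request req;
        int flag = 0;
        long spins = 0;
        struct timespec pause = { 0, 1000000 };   /* 1 ms */

        MPI_Irecv(buf, count, type, MPI_ANY_SOURCE, 55, MPI_COMM_WORLD, &req);

        while (1) {
            MPI_Test(&req, &flag, status);   /* poll: cheap and low latency */
            if (flag)
                break;                       /* message has arrived */
            if (++spins > SPIN_LIMIT)
                nanosleep(&pause, NULL);     /* back off: give up the CPU */
        }
    }

The sleep bounds the CPU burn at the cost of up to ~1 ms of extra
latency once a process has given up spinning, which is exactly the
trade-off described above.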

On Apr 23, 2008, at 12:19 PM, Alberto Giannetti wrote:

> Thanks Torje. I wonder what the benefit is of looping on the incoming
> message-queue socket rather than using blocking system calls like
> read() or select().
>
> On Apr 23, 2008, at 12:10 PM, Torje Henriksen wrote:
>> Hi Alberto,
>>
>> The blocked processes are in fact spin-waiting. While they don't have
>> anything better to do (they're waiting for that message), they check
>> their incoming message queues in a loop.
>>
>> So the MPI_Recv() operation is blocking, but that doesn't mean the
>> processes are blocked by the OS scheduler.
>>
>>
>> I hope that made some sense :)
>>
>>
>> Best regards,
>>
>> Torje
>>
>>
>> On Apr 23, 2008, at 5:34 PM, Alberto Giannetti wrote:
>>
>>> I have a simple MPI program that sends data to processor rank 0. The
>>> communication works well, but when I run the program on more than 2
>>> processors (-np 4) the extra receivers waiting for data run at > 90%
>>> CPU load. I understand MPI_Recv() is a blocking operation, but why
>>> does it consume so much CPU compared to a regular system read()?
>>>
>>>
>>>
>>> #include <sys/types.h>
>>> #include <unistd.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <mpi.h>
>>>
>>> void process_sender(int);
>>> void process_receiver(int);
>>>
>>>
>>> int main(int argc, char* argv[])
>>> {
>>>   int rank;
>>>
>>>   MPI_Init(&argc, &argv);
>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>>   printf("Processor %d (%d) initialized\n", rank, getpid());
>>>
>>>   if( rank == 1 )
>>>     process_sender(rank);
>>>   else
>>>     process_receiver(rank);
>>>
>>>   MPI_Finalize();
>>> }
>>>
>>>
>>> void process_sender(int rank)
>>> {
>>>   int i, j, size;
>>>   float data[100];
>>>   MPI_Status status;
>>>
>>>   printf("Processor %d initializing data...\n", rank);
>>>   for( i = 0; i < 100; ++i )
>>>     data[i] = i;
>>>
>>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>
>>>   printf("Processor %d sending data...\n", rank);
>>>   MPI_Send(data, 100, MPI_FLOAT, 0, 55, MPI_COMM_WORLD);
>>>   printf("Processor %d sent data\n", rank);
>>> }
>>>
>>>
>>> void process_receiver(int rank)
>>> {
>>>   int count;
>>>   float value[200];
>>>   MPI_Status status;
>>>
>>>   printf("Processor %d waiting for data...\n", rank);
>>>   MPI_Recv(value, 200, MPI_FLOAT, MPI_ANY_SOURCE, 55,
>>>            MPI_COMM_WORLD, &status);
>>>   printf("Processor %d Got data from processor %d\n", rank,
>>>          status.MPI_SOURCE);
>>>   MPI_Get_count(&status, MPI_FLOAT, &count);
>>>   printf("Processor %d, Got %d elements\n", rank, count);
>>> }
>>>
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems