Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] (no subject)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-04-23 15:00:18


OMPI doesn't use SYSV shared memory; it uses mmapped files. That is why
"ipcs -m", which only lists SYSV segments, shows nothing.

ompi_info will tell you all about the components installed. If you
see a BTL component named "sm", then shared memory support is
installed. I do not believe that we conditionally install sm on Linux
or OS X systems -- it should always be installed.

ompi_info | grep btl

On Apr 23, 2008, at 2:55 PM, Alberto Giannetti wrote:

> I am running the test program on Darwin 8.11.1, 1.83 GHz Intel dual
> core. My Open MPI install is 1.2.4.
> I can't see any allocated shared memory segment on my system (ipcs -m),
> although the receiver opens a couple of TCP sockets in listening
> mode. It looks like my implementation does not use shared memory. Is
> this a configuration issue?
>
>> a.out 5628 albertogiannetti 3u unix R,W,NB
>> 0x380b198 0t0 ->0x41ced48
>> a.out 5628 albertogiannetti 4u unix R,W
>> 0x41ced48 0t0 ->0x380b198
>> a.out 5628 albertogiannetti 5u IPv4 R,W,NB
>> 0x3d4d920 0t0 TCP *:50969 (LISTEN)
>> a.out 5628 albertogiannetti 6u IPv4 R,W,NB
>> 0x3e62394 0t0 TCP 192.168.0.10:50970->192.168.0.10:50962
>> (ESTABLISHED)
>> a.out 5628 albertogiannetti 7u IPv4 R,W,NB
>> 0x422d228 0t0 TCP *:50973 (LISTEN)
>> a.out 5628 albertogiannetti 8u IPv4 R,W,NB
>> 0x2dfd394 0t0 TCP 192.168.0.10:50969->192.168.0.10:50975
>> (ESTABLISHED)
>
>
> On Apr 23, 2008, at 12:34 PM, Jeff Squyres wrote:
>> Because on-node communication typically uses shared memory, so we
>> currently have to poll. Additionally, when using mixed on/off-node
>> communication, we have to alternate between polling shared memory and
>> polling the network.
>>
>> Additionally, we actively poll because it's the best way to lower
>> latency. MPI implementations are almost always first judged on their
>> latency, not [usually] their CPU utilization. Going to sleep in a
>> blocking system call will definitely negatively impact latency.
>>
>> We have plans for implementing the "spin for a while and then block"
>> technique (as has been used in other MPIs and middleware layers), but
>> it hasn't been a high priority.
>>
>>
>> On Apr 23, 2008, at 12:19 PM, Alberto Giannetti wrote:
>>
>>> Thanks Torje. I wonder what the benefit is of looping on the incoming
>>> message-queue socket rather than using blocking system I/O calls, like
>>> read() or select().
>>>
>>> On Apr 23, 2008, at 12:10 PM, Torje Henriksen wrote:
>>>> Hi Alberto,
>>>>
>>>> The blocked processes are in fact spin-waiting. While they don't have
>>>> anything better to do (waiting for that message), they will check
>>>> their incoming message-queues in a loop.
>>>>
>>>> So the MPI_Recv()-operation is blocking, but it doesn't mean that the
>>>> processes are blocked by the OS scheduler.
>>>>
>>>>
>>>> I hope that made some sense :)
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> Torje
>>>>
>>>>
>>>> On Apr 23, 2008, at 5:34 PM, Alberto Giannetti wrote:
>>>>
>>>>> I have a simple MPI program that sends data to processor rank 0. The
>>>>> communication works well, but when I run the program on more than 2
>>>>> processors (-np 4) the extra receivers waiting for data run at > 90%
>>>>> CPU load. I understand MPI_Recv() is a blocking operation, but why
>>>>> does it consume so much CPU compared to a regular system read()?
>>>>>
>>>>>
>>>>>
>>>>> #include <sys/types.h>
>>>>> #include <unistd.h>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include <mpi.h>
>>>>>
>>>>> void process_sender(int);
>>>>> void process_receiver(int);
>>>>>
>>>>>
>>>>> int main(int argc, char* argv[])
>>>>> {
>>>>>     int rank;
>>>>>
>>>>>     MPI_Init(&argc, &argv);
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>
>>>>>     printf("Processor %d (%d) initialized\n", rank, (int)getpid());
>>>>>
>>>>>     /* Rank 1 sends once to rank 0; every other rank posts a receive. */
>>>>>     if( rank == 1 )
>>>>>         process_sender(rank);
>>>>>     else
>>>>>         process_receiver(rank);
>>>>>
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }
>>>>>
>>>>>
>>>>> void process_sender(int rank)
>>>>> {
>>>>>     int i;
>>>>>     float data[100];
>>>>>
>>>>>     printf("Processor %d initializing data...\n", rank);
>>>>>     for( i = 0; i < 100; ++i )
>>>>>         data[i] = i;
>>>>>
>>>>>     /* Send 100 floats to rank 0 with tag 55. */
>>>>>     printf("Processor %d sending data...\n", rank);
>>>>>     MPI_Send(data, 100, MPI_FLOAT, 0, 55, MPI_COMM_WORLD);
>>>>>     printf("Processor %d sent data\n", rank);
>>>>> }
>>>>>
>>>>>
>>>>> void process_receiver(int rank)
>>>>> {
>>>>>     int count;
>>>>>     float value[200];
>>>>>     MPI_Status status;
>>>>>
>>>>>     /* Only rank 0 is ever sent a message; with -np 4, ranks 2 and 3
>>>>>        stay in MPI_Recv() and never return. */
>>>>>     printf("Processor %d waiting for data...\n", rank);
>>>>>     MPI_Recv(value, 200, MPI_FLOAT, MPI_ANY_SOURCE, 55,
>>>>>              MPI_COMM_WORLD, &status);
>>>>>     printf("Processor %d Got data from processor %d\n", rank,
>>>>>            status.MPI_SOURCE);
>>>>>     MPI_Get_count(&status, MPI_FLOAT, &count);
>>>>>     printf("Processor %d, Got %d elements\n", rank, count);
>>>>> }
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> --
>> Jeff Squyres
>> Cisco Systems

-- 
Jeff Squyres
Cisco Systems