Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] execuation time is not stable with 2 processes
From: Eugene Loh (eugene.loh_at_[hidden])
Date: 2010-08-04 11:32:26


Mark Potts wrote:

> Hi,
> I'd opt for the fact that tv0 is given a value only on rank 0 and tv1
> is given a value only on rank 1. Kind of hard to get a diff between
> the two on either rank with that setup. You need to determine tv0 and
> tv1 on both ranks.

I don't understand this. It appears to me that tv1-tv0 is computed and
reported on each process. This seems okay.

> In addition, there are a number of other errors in the code (such as
> MPI_Finalize() as an errant function outside of main), etc.

Yes, there seem to be many small errors, such as tag undeclared, extra
bracket so that MPI_Finalize is outside main, MPI_Send on rank 1 sending
to itself, etc.

Regarding performance methodology, you should consider adding another
loop so that your program reports multiple timings instead of just one.
Also, use MPI_Wtime for your timer since it's more likely to pick up a
portable, high-resolution timer.
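
A minimal sketch of the "report multiple timings" idea: collect one elapsed time per repetition of the timed loop (for example, each bracketed by a pair of MPI_Wtime calls) and then look at the spread instead of a single number. The struct and function names below are hypothetical helpers, not part of MPI, and the MPI calls themselves are left out so the fragment stands alone.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of repeated timing samples (not part of MPI). */
struct timing_summary {
    double min;
    double median;
    double max;
};

/* qsort comparator for doubles, ascending. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Fills *out from n samples. Returns 0 on success, -1 on bad input or
 * allocation failure. The input array is left unmodified. */
int summarize_timings(const double *samples, size_t n,
                      struct timing_summary *out)
{
    if (samples == NULL || out == NULL || n == 0)
        return -1;

    double *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1;
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_doubles);

    out->min = sorted[0];
    out->max = sorted[n - 1];
    out->median = (n % 2 == 1)
        ? sorted[n / 2]
        : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);

    free(sorted);
    return 0;
}
```

If the minimum and the median are close and only the maximum is far off, the outliers are more likely noise (scheduling, process migration) than a property of the messaging path.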

> Ralph Castain wrote:
>
>> Did you bind the processes? If not you may be seeing the impact of
>> having processes bouncing between cpus, and/or processes not being
>> local to their memory. Try adding -bind-to-core or -bind-to-socket to
>> your cmd line and see if things smooth out. I'm assuming, of course,
>> that you are running on a system that supports binding...
>
And know what result you are expecting. You are reporting the total
number of microseconds for 1000 round trips, or 2000 one-way messages.
If we divide 3000 and 100 by 2000, that's 1.5 usec and 0.05 usec for
one-way latency. The first is reasonable for shared memory. The second
sounds much too short. Perhaps your timer's granularity is too coarse?
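
The division above can be written as a small helper. This is only a sketch: the function name is hypothetical, and it assumes each loop iteration is one round trip, i.e. two one-way messages, so the 1000 iterations in the posted program give a divisor of 2000.

```c
/* One-way latency in microseconds, assuming each loop iteration is one
 * round trip (two one-way messages). Returns -1.0 if iterations <= 0. */
double one_way_latency_usec(long total_usec, long iterations)
{
    if (iterations <= 0)
        return -1.0;
    return (double)total_usec / (2.0 * (double)iterations);
}
```

With the numbers from this thread, one_way_latency_usec(3000, 1000) gives 1.5 and one_way_latency_usec(100, 1000) gives 0.05.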

>> The time can also be impacted by other things running on your cpu -
>> could be context switching.
>
It seems to me that your results are not too slow but too fast. Again,
a coarse-grained timer may be at fault. You might need to time a larger
number of iterations within the timer and report multiple measurements.
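
One rough way to check whether the timer is too coarse is to spin on gettimeofday until its value changes and record the smallest step seen. The sketch below does that; the function name is hypothetical, and the result is only an upper bound on the real resolution because it includes the cost of the calls themselves. (In an MPI program, MPI_Wtick reports the resolution of MPI_Wtime directly.)

```c
#include <stddef.h>
#include <sys/time.h>

/* Smallest positive step, in microseconds, observed between consecutive
 * gettimeofday() readings over `trials` attempts.
 * Returns -1 if trials <= 0 or no positive step was observed. */
long observed_tick_usec(int trials)
{
    long smallest = -1;

    for (int t = 0; t < trials; t++) {
        struct timeval a, b;

        gettimeofday(&a, NULL);
        do {
            gettimeofday(&b, NULL);   /* spin until the clock advances */
        } while (b.tv_sec == a.tv_sec && b.tv_usec == a.tv_usec);

        long step = (long)(b.tv_sec - a.tv_sec) * 1000000L
                  + (long)(b.tv_usec - a.tv_usec);

        /* Ignore backward steps (e.g. a clock adjustment). */
        if (step > 0 && (smallest < 0 || step < smallest))
            smallest = step;
    }
    return smallest;
}
```

If the value returned is comparable to the total time being measured, the loop needs more iterations inside the timed region before the numbers mean much.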

>> Final point: since both processes are running on the same node, IB
>> will have no involvement - the messages are going to flow over shared
>> memory.
>
+1

>>
>>
>> On Aug 4, 2010, at 6:51 AM, Tad Lake wrote:
>>
>>> Hi,
>>> I have a little program that measures execution time.
>>> =================================================
>>> #include "mpi.h"
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <string.h>
>>> #include <math.h>
>>> int main (int argc, char *argv[]) {
>>> MPI_Status Stat;
>>> struct timeval tv0, tv1;
>>>
>>> long int totaltime = 0;
>>> int i, j;
>>> int buf[10240];
>>> int numtasks, rank;
>>>
>>> MPI_Init (&argc, &argv);
>>> MPI_Comm_size (MPI_COMM_WORLD, &numtasks);
>>> MPI_Comm_rank (MPI_COMM_WORLD, &rank);
>>>
>>>
>>> if (rank == 0) {
>>> gettimeofday(&tv0, NULL);
>>> for(i=0;i<1000;i++){
>>> MPI_Send (buf, 10240, MPI_INT, 1, tag, MPI_COMM_WORLD);
>>> MPI_Recv (buf, 10240, MPI_INT, 1, tag,MPI_COMM_WORLD, &Stat);
>>> }
>>> gettimeofday (&tv1, NULL);
>>> }else{
>>> gettimeofday(&tv0, NULL);
>>> for(i=0;i<1000;i++){
>>> MPI_Recv(buf, 10240,MPI_INT, 0, tag, MPI_COMM_WORLD, &Stat);
>>> MPI_Send(buf, 10240, MPI_INT, 1, tag, MPI_COMM_WORLD);
>>> }
>>> gettimeofday(&tv1, NULL);
>>> }
>>>
>>> totaltime = (tv1.tv_sec - tv0.tv_sec) * 1000000 + (tv1.tv_usec -
>>> tv0.tv_usec);
>>> fprintf (stdout, "rank %d with total time is %d",rank, totaltime);
>>> }
>>>
>>> MPI_Finalize ();
>>>
>>> return 0;
>>> }
>>> =======================================================
>>>
>>> I run it :
>>> mpirun -np 2 --host node2 ./a.out
>>>
>>> But the measured time is not stable; it varies greatly from run to
>>> run. For example, the maximum can be 3000, while the minimum is
>>> 100.
>>>
>>> Is there anything wrong ?
>>> I am using 1.4.2 and openib.
>>