Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] benchmark - mpi_reduce() called only once but takes long time - proportional to calculation time
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-11-25 12:45:01


Your processes are probably running asynchronously. You could perhaps
try tracing program execution and look at the timeline. E.g.,
http://www.open-mpi.org/faq/?category=perftools#free-tools . Or, where
you have MPI_Wtime calls, just capture those timestamps on each process
and dump the results at the end of your run. Or, report timings for all
ranks instead of just for rank 0.

Put another way, rank 0 must broadcast n. So, no one starts computation
until they get the Bcast result. Rank 0 probably starts its
computations before anyone else does. So, it gets to the Reduce before
anyone else does, but it can't exit until other ranks have finished
their computations. So, the Reduce time on rank 0 includes some amount
of other ranks' compute times.
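
To make that arithmetic concrete, here is a rough sketch of a toy model in plain C (no MPI needed). It assumes the Reduce on rank 0 returns only once the last rank has arrived, and it ignores message transfer time. The function name root_reduce_wait and the start/compute arrays are made up for illustration; they are not taken from the program in question.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model: start[r] is when rank r begins computing and compute[r] is how
 * long it computes, both in seconds. Rank r reaches the reduce at
 * start[r] + compute[r]. Assumption: the reduce on rank 0 returns only when
 * the slowest rank has arrived, and transfer time is ignored. */
double root_reduce_wait(const double *start, const double *compute, int nranks)
{
    assert(nranks > 0);
    double root_arrival = start[0] + compute[0];
    double last_arrival = root_arrival;

    for (int r = 1; r < nranks; r++) {
        double arrival = start[r] + compute[r];
        if (arrival > last_arrival)
            last_arrival = arrival;
    }
    /* Time rank 0 would attribute to the reduce call. */
    return last_arrival - root_arrival;
}
```

With two ranks that each compute for 2.0 seconds, where rank 1 starts 0.5 seconds after rank 0, the model charges 0.5 seconds to the Reduce on rank 0. If rank 1 instead needs 4.0 seconds against 1.0 on rank 0 and starts 0.25 seconds late, the Reduce appears to take 3.25 seconds even though almost none of that is communication.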

Yet another approach is to insert MPI_Barrier calls at each phase of the
program so that the various phases are synchronized. This adds some
overhead to the program, but helps simplify interpretation of the timing
results.

Qing Pang wrote:

> I'm running the popular Calculate PI program on a 2-node setup
> running Ubuntu 8.10 and Open MPI 1.3.3 (with default settings).
> Password-less ssh is set up, but there is no cluster infrastructure
> such as a network file system, network time protocol, resource
> management, scheduler, etc. The two nodes are connected through
> TCP/IP only.
>
> When I tried to benchmark the program, it showed that the time spent
> in MPI_Reduce() is proportional to the number of intervals (n) used in
> the calculation. For example, when n = 1,000,000, MPI_Reduce costs
> 15.65 milliseconds, while when n = 1,000,000,000, MPI_Reduce costs
> 15526 milliseconds.
>
> This confused me. In this Calc-PI program, MPI_Reduce is called only
> once, no matter what number of intervals is used: it is called after
> both nodes have their results, to merge them. So the time spent in
> MPI_Reduce (although it might be slow over a TCP/IP connection)
> should be roughly constant. But that is obviously not what I saw.
>
> Has anyone had a similar problem before? I'm not sure how
> MPI_Reduce() works internally. Does the fact that I don't have a
> network file system, network time protocol, resource management,
> scheduler, etc. installed matter?
>
> Below is the program - I did feed "n" to it more than once to warm it up.
>
> #include "mpi.h"
> #include <stdio.h>
> #include <math.h>
>
> int main(int argc, char *argv[])
> {
>     int numprocs, myid, rc;
>     double ACCUPI = 3.1415926535897932384626433832795;
>     double mypi, pi, h, sum, x;
>     int n, i;
>     double starttime, endtime;
>     double time, told, bcasttime, reducetime, comptime, totaltime;
>
>     rc = MPI_Init(&argc, &argv);
>     if (rc != MPI_SUCCESS) {
>         printf("Error starting MPI program. Terminating.\n");
>         MPI_Abort(MPI_COMM_WORLD, rc);
>     }
>     MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>     MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>
>     while (1) {
>         if (myid == 0) {
>             printf("Enter the number of intervals: (0 quits) \n");
>             scanf("%d", &n);
>             starttime = MPI_Wtime();
>         }
>
>         /* Time the broadcast of n from rank 0. */
>         time = MPI_Wtime();
>         MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
>
>         told = time;
>         time = MPI_Wtime();
>         bcasttime = time - told;
>
>         if (n == 0)
>             break;
>         else {
>             /* Time the local midpoint-rule sum. */
>             h = 1.0/(double)n;
>             sum = 0.0;
>             for (i = myid + 1; i <= n; i += numprocs) {
>                 x = h*((double)i - 0.5);
>                 sum += (4.0/(1.0 + x*x));
>             }
>             mypi = sum*h;
>
>             told = time;
>             time = MPI_Wtime();
>             comptime = time - told;
>
>             /* Time the reduction of the partial sums onto rank 0. */
>             MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0,
>                        MPI_COMM_WORLD);
>
>             told = time;
>             time = MPI_Wtime();
>             reducetime = time - told;
>
>             if (myid == 0) {
>                 totaltime = MPI_Wtime() - starttime;
>                 printf("\nElapsed time (total): %f milliseconds\n",
>                        totaltime*1000);
>                 printf("Elapsed time (Bcast): %f milliseconds (%5.2f%%)\n",
>                        bcasttime*1000, bcasttime*100/totaltime);
>                 printf("Elapsed time (Reduce): %f milliseconds (%5.2f%%)\n",
>                        reducetime*1000, reducetime*100/totaltime);
>                 printf("Elapsed time (Comput): %f milliseconds (%5.2f%%)\n",
>                        comptime*1000, comptime*100/totaltime);
>                 printf("\nApproximated pi is %.16f, Error is %.4e\n", pi,
>                        fabs(pi - ACCUPI));
>             }
>         }
>     }
>
>     MPI_Finalize();
>     return 0;
> }