Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Lower performance on a Gigabit node compared to infiniband node
From: Sangamesh B (forum.san_at_[hidden])
Date: 2009-03-12 02:55:04


Hello INK,

   I've run a couple of jobs with different mpirun options.

CRITERIA 1:

On one of the nodes connected to the infiniband network:

Job No 1:

mpirun command:
/opt/mpi/openmpi/1.3/intel/bin/mpirun --mca btl ^openib -np $NSLOTS \
    -hostfile $TMPDIR/machines \
    /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x \
    job.in $PP_LIBRARY > job_nn_out_omp_$JOB_ID

       CPU TIME : 0 HOURS 10 MINUTES 11.58 SECONDS
   ELAPSED TIME : 0 HOURS 10 MINUTES 30.51 SECONDS
 *** CPMD| SIZE OF THE PROGRAM IS 123384/ 384344 kBYTES ***

 PROGRAM CPMD ENDED AT: Wed Mar 11 12:38:48 2009

 ======================================================================
 = COMMUNICATION TASK       AVERAGE MESSAGE LENGTH    NUMBER OF CALLS =
 = SEND/RECEIVE                      116817. BYTES               891. =
 = BROADCAST                         123195. BYTES               284. =
 = GLOBAL SUMMATION                   32926. BYTES               404. =
 = GLOBAL MULTIPLICATION                  0. BYTES                 1. =
 = ALL TO ALL COMM                  2799401. BYTES              1226. =
 =                                     PERFORMANCE         TOTAL TIME =
 = SEND/RECEIVE                      1040.965 MB/S          0.100 SEC =
 = BROADCAST                          388.748 MB/S          0.090 SEC =
 = GLOBAL SUMMATION                     0.924 MB/S         28.780 SEC =
 = GLOBAL MULTIPLICATION                0.000 MB/S          0.001 SEC =
 = ALL TO ALL COMM                    121.233 MB/S         28.310 SEC =
 = SYNCHRONISATION                                          0.010 SEC =
 ======================================================================

Job No 2:

/opt/mpi/openmpi/1.3/intel/bin/mpirun --mca btl ^tcp -np $NSLOTS \
    -hostfile $TMPDIR/machines \
    /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x \
    job.in $PP_LIBRARY > job_nn_omp_tcp$JOB_ID

       CPU TIME : 0 HOURS 10 MINUTES 42.46 SECONDS
   ELAPSED TIME : 0 HOURS 10 MINUTES 43.76 SECONDS
 *** CPMD| SIZE OF THE PROGRAM IS 300480/ 567860 kBYTES ***

 PROGRAM CPMD ENDED AT: Wed Mar 11 12:43:06 2009

 ======================================================================
 = COMMUNICATION TASK       AVERAGE MESSAGE LENGTH    NUMBER OF CALLS =
 = SEND/RECEIVE                      116817. BYTES               891. =
 = BROADCAST                         123195. BYTES               284. =
 = GLOBAL SUMMATION                   32926. BYTES               404. =
 = GLOBAL MULTIPLICATION                  0. BYTES                 1. =
 = ALL TO ALL COMM                  2799401. BYTES              1226. =
 =                                     PERFORMANCE         TOTAL TIME =
 = SEND/RECEIVE                      1487.163 MB/S          0.070 SEC =
 = BROADCAST                          388.751 MB/S          0.090 SEC =
 = GLOBAL SUMMATION                     1.899 MB/S         14.010 SEC =
 = GLOBAL MULTIPLICATION                0.000 MB/S          0.001 SEC =
 = ALL TO ALL COMM                    264.404 MB/S         12.980 SEC =
 = SYNCHRONISATION                                          0.001 SEC =
 ======================================================================

Job No 3:

/opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS \
    -hostfile $TMPDIR/machines \
    /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x \
    job.in $PP_LIBRARY > job_nn_out_omp_$JOB_ID

       CPU TIME : 0 HOURS 9 MINUTES 31.99 SECONDS
   ELAPSED TIME : 0 HOURS 9 MINUTES 33.37 SECONDS
 *** CPMD| SIZE OF THE PROGRAM IS 301192/ 571044 kBYTES ***

 PROGRAM CPMD ENDED AT: Wed Mar 11 20:25:12 2009

 ======================================================================
 = COMMUNICATION TASK       AVERAGE MESSAGE LENGTH    NUMBER OF CALLS =
 = SEND/RECEIVE                      116817. BYTES               891. =
 = BROADCAST                         123195. BYTES               284. =
 = GLOBAL SUMMATION                   32926. BYTES               404. =
 = GLOBAL MULTIPLICATION                  0. BYTES                 1. =
 = ALL TO ALL COMM                  2799401. BYTES              1226. =
 =                                     PERFORMANCE         TOTAL TIME =
 = SEND/RECEIVE                      2600.799 MB/S          0.040 SEC =
 = BROADCAST                          349.872 MB/S          0.100 SEC =
 = GLOBAL SUMMATION                     3.811 MB/S          6.980 SEC =
 = GLOBAL MULTIPLICATION                0.000 MB/S          0.001 SEC =
 = ALL TO ALL COMM                    286.729 MB/S         11.970 SEC =
 = SYNCHRONISATION                                          0.010 SEC =
 ======================================================================
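The three runs above differ only in the `--mca btl` selection (a leading `^` excludes the listed components, so `^openib` means "every BTL except openib"), so they can be driven from one loop inside a single Grid Engine job. A minimal sketch, assuming the same `$NSLOTS`, `$TMPDIR/machines`, `$JOB_ID` and `$PP_LIBRARY` variables as the commands above; the function name `run_btl_variants`, the `DRY_RUN` switch and the log-file names are hypothetical. The `self,sm` variant (loopback plus shared memory only) is an added control that was not part of the original runs: with all 4 ranks on one node it should work, and any remaining timing difference cannot come from a network transport.

```shell
#!/bin/sh
# Hypothetical helper: run the same CPMD input once per BTL selection so the
# timings can be compared. Paths default to the ones used above.
MPIRUN=${MPIRUN:-/opt/mpi/openmpi/1.3/intel/bin/mpirun}
CPMD=${CPMD:-/opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x}

run_btl_variants() {
    # '^x' excludes BTL x; 'self,sm' allows only loopback and shared memory.
    for btl in '^openib' '^tcp' 'self,sm'; do
        # Build a file-name-safe tag: '^tcp' -> 'no_tcp', 'self,sm' -> 'self_sm'.
        case $btl in
            ^*) tag="no_${btl#^}" ;;
            *)  tag=$(printf '%s' "$btl" | tr ',' '_') ;;
        esac
        logfile="job_btl_${tag}_$JOB_ID"
        if [ -n "$DRY_RUN" ]; then
            # Only print the command line that would be executed.
            echo "$MPIRUN --mca btl $btl -np $NSLOTS -hostfile $TMPDIR/machines $CPMD job.in $PP_LIBRARY > $logfile"
        else
            "$MPIRUN" --mca btl "$btl" -np "$NSLOTS" -hostfile "$TMPDIR/machines" \
                "$CPMD" job.in "$PP_LIBRARY" > "$logfile"
        fi
    done
}
```

With `DRY_RUN=1` set, `run_btl_variants` only prints the three command lines; without it, the runs execute one after another in the same job, so they see similar node conditions.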

CRITERIA 2:

On one of the nodes connected to Gigabit network:

Job No 1:

/opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS \
    -hostfile $TMPDIR/machines \
    /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x \
    job.in $PP_LIBRARY > job_nn_GB_out_omp_$JOB_ID

       CPU TIME : 0 HOURS 5 MINUTES 57.45 SECONDS
   ELAPSED TIME : 0 HOURS 6 MINUTES 10.21 SECONDS
 *** CPMD| SIZE OF THE PROGRAM IS 123392/ 384344 kBYTES ***

 PROGRAM CPMD ENDED AT: Wed Mar 11 20:07:52 2009

 ======================================================================
 = COMMUNICATION TASK       AVERAGE MESSAGE LENGTH    NUMBER OF CALLS =
 = SEND/RECEIVE                      116817. BYTES               891. =
 = BROADCAST                         123195. BYTES               284. =
 = GLOBAL SUMMATION                   32926. BYTES               404. =
 = GLOBAL MULTIPLICATION                  0. BYTES                 1. =
 = ALL TO ALL COMM                  2799401. BYTES              1226. =
 =                                     PERFORMANCE         TOTAL TIME =
 = SEND/RECEIVE                      2081.711 MB/S          0.050 SEC =
 = BROADCAST                          583.121 MB/S          0.060 SEC =
 = GLOBAL SUMMATION                     3.514 MB/S          7.570 SEC =
 = GLOBAL MULTIPLICATION                0.000 MB/S          0.001 SEC =
 = ALL TO ALL COMM                    438.891 MB/S          7.820 SEC =
 = SYNCHRONISATION                                          0.010 SEC =
 ======================================================================

Job No 2:

/opt/mpi/openmpi/1.3/intel/bin/mpirun --mca btl sm,self,tcp -np $NSLOTS \
    -hostfile $TMPDIR/machines \
    /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x \
    job.in $PP_LIBRARY > job_nn_GB_out_omp_$JOB_ID

       CPU TIME : 0 HOURS 6 MINUTES 37.24 SECONDS
   ELAPSED TIME : 0 HOURS 6 MINUTES 49.97 SECONDS
 *** CPMD| SIZE OF THE PROGRAM IS 123416/ 384344 kBYTES ***

 PROGRAM CPMD ENDED AT: Wed Mar 11 20:09:32 2009

 ======================================================================
 = COMMUNICATION TASK       AVERAGE MESSAGE LENGTH    NUMBER OF CALLS =
 = SEND/RECEIVE                      116817. BYTES               891. =
 = BROADCAST                         123195. BYTES               284. =
 = GLOBAL SUMMATION                   32926. BYTES               404. =
 = GLOBAL MULTIPLICATION                  0. BYTES                 1. =
 = ALL TO ALL COMM                  2799401. BYTES              1226. =
 =                                     PERFORMANCE         TOTAL TIME =
 = SEND/RECEIVE                      2080.441 MB/S          0.050 SEC =
 = BROADCAST                          583.130 MB/S          0.060 SEC =
 = GLOBAL SUMMATION                     2.043 MB/S         13.020 SEC =
 = GLOBAL MULTIPLICATION                0.000 MB/S          0.001 SEC =
 = ALL TO ALL COMM                    338.792 MB/S         10.130 SEC =
 = SYNCHRONISATION                                          0.001 SEC =
 ======================================================================
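Comparing CPU TIME with ELAPSED TIME is a quick way to see whether a run spent part of its wall-clock time off the CPU. A minimal sketch that pulls both lines out of a CPMD output file and converts them to seconds; it assumes the `CPU TIME : <h> HOURS <m> MINUTES <s> SECONDS` layout shown in the outputs above, and the function name `cpmd_times` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print CPU time, elapsed (wall-clock) time and their
# ratio, in seconds, from a CPMD output file. A ratio well below 1 suggests
# the process was often not running (waiting on I/O, descheduled, ...).
cpmd_times() {
    awk '
        # Fields: CPU TIME : <h> HOURS <m> MINUTES <s> SECONDS
        /^ *CPU TIME *:/     { cpu  = $4 * 3600 + $6 * 60 + $8 }
        # Fields: ELAPSED TIME : <h> HOURS <m> MINUTES <s> SECONDS
        /^ *ELAPSED TIME *:/ { wall = $4 * 3600 + $6 * 60 + $8 }
        END {
            if (wall > 0)
                printf "cpu=%.2f elapsed=%.2f ratio=%.3f\n", cpu, wall, cpu / wall
        }' "$1"
}
```

For the gigabit job in the first message of this thread (12 min 7.93 s CPU, 21 min 0.07 s elapsed) it prints `cpu=727.93 elapsed=1260.07 ratio=0.578`, whereas the runs above all stay close to 1.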

Observations:

All jobs used 4 processes, and each job ran on a single node.

This time the gigabit jobs performed far better than the infiniband jobs:
the gigabit jobs took about 6 minutes and the infiniband jobs about 10
minutes.
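A rough check on how much of that gap the CPMD communication summary can explain, comparing the two runs that used default mpirun options (Criteria 1 Job No 3 on the infiniband node, Criteria 2 Job No 1 on the gigabit node). It assumes the TOTAL TIME column is the time spent in each communication task; the terms are SEND/RECEIVE, BROADCAST, GLOBAL SUMMATION, GLOBAL MULTIPLICATION, ALL TO ALL COMM and SYNCHRONISATION, in that order.

```latex
\begin{aligned}
t_{\mathrm{comm}}^{\mathrm{IB}} &= 0.040 + 0.100 + 6.980 + 0.001 + 11.970 + 0.010 = 19.101\ \mathrm{s}\\
t_{\mathrm{comm}}^{\mathrm{GB}} &= 0.050 + 0.060 + 7.570 + 0.001 + 7.820 + 0.010 = 15.511\ \mathrm{s}\\
\Delta t_{\mathrm{comm}} &= 19.101 - 15.511 = 3.590\ \mathrm{s}\\
\Delta t_{\mathrm{elapsed}} &= 573.37 - 370.21 = 203.16\ \mathrm{s}\\
\Delta t_{\mathrm{CPU}} &= 571.99 - 357.45 = 214.54\ \mathrm{s}\\
\Delta t_{\mathrm{comm}} / \Delta t_{\mathrm{elapsed}} &\approx 0.018
\end{aligned}
```

On these numbers communication accounts for under 2% of the elapsed-time gap, and the CPU time differs by even more than the elapsed time, which points toward the computation itself (memory bandwidth, other jobs sharing the node) rather than the MPI transport.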

What factors may be causing this change?

While these jobs were running, no other jobs were running on the gigabit
nodes; they were completely free. The infiniband nodes, however, were
almost fully occupied by other jobs. Could this be causing the lower
performance of the IB jobs?
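One way to check whether other jobs are competing for a node is to compare its load average with its core count (8 per node here) just before or during a run. A minimal sketch; the function name `node_load_report` is hypothetical, it reads the Linux-specific `/proc/loadavg`, and it accepts an optional load value and core count so it can be exercised without a live node.

```shell
#!/bin/sh
# Hypothetical helper: compare the 1-minute load average with the number of
# online cores. A load above the core count means runnable processes are
# queuing for CPUs, which would likely slow an MPI job on this node.
node_load_report() {
    # Optional arguments override the live values (useful for testing).
    load=${1:-$(cut -d ' ' -f 1 /proc/loadavg)}
    cores=${2:-$(getconf _NPROCESSORS_ONLN)}
    awk -v l="$load" -v c="$cores" 'BEGIN {
        printf "load=%.2f cores=%d ", l, c
        if (l + 0 > c + 0) print "OVERSUBSCRIBED"; else print "ok"
    }'
}
```

A load below the core count does not rule out contention: as Jeff notes, jobs on the remaining cores still share the memory bus. On Linux, `ps -eo pid,user,pcpu,comm --sort=-pcpu | head` shows which processes are using the CPUs.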

Note that all jobs were submitted through Grid Engine from the master
node. In this case, even though all 4 processes run on a single node,
will there be any communication between the master node and the
execution node?
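Whether all ranks really landed on one node can be confirmed from the hostfile passed to mpirun (`$TMPDIR/machines` above). A minimal sketch, assuming the host name is the first field on each line of that file; the function name `count_hosts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the distinct hosts in an MPI hostfile.
# Assumes the host name is the first field on each line; blank lines and
# '#' comments are skipped.
count_hosts() {
    awk 'NF > 0 && $1 !~ /^#/ { print $1 }' "$1" | sort -u | wc -l | tr -d ' '
}
```

If `count_hosts $TMPDIR/machines` prints 1, MPI traffic between the ranks should stay on that node; it says nothing about the Grid Engine or Open MPI launch traffic itself.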

Thanks,
Sangamesh

On Tue, Mar 10, 2009 at 4:46 PM, Igor Kozin <i.n.kozin_at_[hidden]> wrote:
> Hi Sangamesh,
> As far as I can tell there should be no difference if you run CPMD on a
> single node whether with or without ib. One easy thing that you could do is
> to repeat your runs on the infiniband node(s) with and without infiniband
> using --mca btl ^tcp and --mca btl ^openib respectively. But since you are
> using a single node I doubt it will make any difference.
>
> I agree with Jeff that there are many factors you need to be sure of. Please
> note that not only your elapsed times but also your CPU times are different.
> Furthermore, the difference in communication times indicated in your CPMD
> outputs cannot be the only reason for the difference in the elapsed times.
> CPMD, MKL, and compiler versions, memory bandwidth, i/o and rogue processes
> running on a node could be the differentiating factors.
>
> The standard wat32 benchmark is a good test for a single node. You can find
> our benchmarking results here if you want to compare yours
> http://www.cse.scitech.ac.uk/disco/dbd/index.html
>
> Regards,
>
> INK
>
> 2009/3/10 Sangamesh B <forum.san_at_[hidden]>
>>
>> Hello Ralph & Jeff,
>>
>>    This is the same issue - but this time the job is running on a single
>> node.
>>
>> The two systems on which the jobs were run have the same hardware/OS
>> configuration. The only differences are:
>>
>> One node has 4 GB RAM and it is part of infiniband connected nodes.
>>
>> The other node has 8 GB RAM and it is part of gigabit connected nodes.
>>
>> For both jobs only 4 processes are used.
>>
>> All the processes are run on a single node.
>>
>> So why is the GB node taking more time than the IB node?
>>
>> {ELAPSED TIME = WALL CLOCK TIME}
>>
>> I hope the problem is clear now.
>>
>> Thanks,
>> Sangamesh
>> On Mon, Mar 9, 2009 at 10:56 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> > It depends on the characteristics of the nodes in question.  You mention
>> > the CPU speeds and the RAM, but there are other factors as well: cache
>> > size, memory architecture, how many MPI processes you're running, etc.
>> > Memory access patterns, particularly across UMA machines like Clovertown
>> > and follow-on Intel architectures, can really get bogged down by the RAM
>> > bottleneck (all 8 cores hammering on memory simultaneously via a single
>> > memory bus).
>> >
>> >
>> >
>> > On Mar 9, 2009, at 10:30 AM, Sangamesh B wrote:
>> >
>> >> Dear Open MPI team,
>> >>
>> >>      With Open MPI 1.3, the Fortran application CPMD is installed on a
>> >> Rocks-4.3 cluster: dual-processor quad-core Xeon @ 3 GHz (8 cores
>> >> per node).
>> >>
>> >> Two 4-process jobs are run separately on two nodes: one node has an
>> >> IB connection (4 GB RAM) and the other node has a gigabit connection
>> >> (8 GB RAM).
>> >>
>> >> Note that the network may not be used at all, and is not required,
>> >> since both jobs run in standalone mode.
>> >>
>> >> Since each job runs on a single node, with no communication between
>> >> nodes, both jobs should perform the same regardless of network
>> >> connectivity. That is not the case here: the gigabit job takes about
>> >> twice as long as the infiniband job.
>> >>
>> >> Here are the details of the two jobs:
>> >>
>> >> Infiniband Job:
>> >>
>> >>      CPU TIME :    0 HOURS 10 MINUTES 21.71 SECONDS
>> >>   ELAPSED TIME :    0 HOURS 10 MINUTES 23.08 SECONDS
>> >>  ***      CPMD| SIZE OF THE PROGRAM IS  301192/ 571044 kBYTES ***
>> >>
>> >> Gigabit Job:
>> >>
>> >>       CPU TIME :    0 HOURS 12 MINUTES  7.93 SECONDS
>> >>   ELAPSED TIME :    0 HOURS 21 MINUTES  0.07 SECONDS
>> >>  ***      CPMD| SIZE OF THE PROGRAM IS  123420/ 384344 kBYTES ***
>> >>
>> >> More details are in the attached file.
>> >>
>> >> Why is there such a large gap between CPU TIME and ELAPSED TIME for
>> >> the gigabit job?
>> >>
>> >> Could this be an issue with Open MPI itself? What could the reason be?
>> >>
>> >> Do any flags need to be set?
>> >>
>> >> Thanks in advance,
>> >> Sangamesh
>> >>
>> >> <cpmd_gb_ib_1node><ATT3915213.txt>
>> >
>> >
>> > --
>> > Jeff Squyres
>> > Cisco Systems
>> >
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>
>