Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Lower performance on a Gigabit node compared to infiniband node
From: Igor Kozin (i.n.kozin_at_[hidden])
Date: 2009-03-10 07:16:38


Hi Sangamesh,
As far as I can tell there should be no difference if you run CPMD on a
single node whether with or without ib. One easy thing that you could do is
to repeat your runs on the infiniband node(s) with and without infiniband
using --mca btl ^tcp and --mca btl ^openib respectively. But since you are
using a single node I doubt it will make any difference.
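For reference, the two runs suggested above can be sketched as follows. This only builds and prints the mpirun command lines and does not launch anything. The executable name `./cpmd.x` and the input file `wat32.inp` are placeholders assumed for illustration, not taken from this thread; `--mca btl ^tcp` and `--mca btl ^openib` are the options named above.

```python
import shlex

def mpirun_command(btl_spec, nprocs=4, app=("./cpmd.x", "wat32.inp")):
    """Build an mpirun argument list that sets the Open MPI 'btl' MCA parameter.

    The default 'app' tuple is a hypothetical CPMD invocation; replace it
    with the real executable and input file.
    """
    return ["mpirun", "-np", str(nprocs), "--mca", "btl", btl_spec, *app]

runs = {
    "with_ib": mpirun_command("^tcp"),        # exclude the TCP transport
    "without_ib": mpirun_command("^openib"),  # exclude the InfiniBand transport
}

for label, argv in runs.items():
    # shlex.join quotes arguments so the line can be pasted into a shell
    print(f"{label}: {shlex.join(argv)}")
```

If the elapsed times on the infiniband node come out the same for both command lines, the interconnect can be ruled out as the cause.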

I agree with Jeff that there are many factors you need to be sure of. Please
note that not only your elapsed times but also your CPU times are different.
Furthermore, the difference in communication times as indicated in your CPMD
outputs cannot be the only reason for the difference in the elapsed times.
CPMD, MKL, and compiler versions, memory bandwidth, i/o and rogue processes
running on a node could be the differentiating factors.
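To put numbers on the CPU versus elapsed time gap, here is a minimal sketch that parses the timing lines quoted further down in this thread. The regular expression assumes the CPMD output format is exactly as quoted there; the helper name `parse_times` is made up for this example.

```python
import re

# Matches lines such as "CPU TIME : 0 HOURS 10 MINUTES 21.71 SECONDS"
TIME_RE = re.compile(
    r"(CPU|ELAPSED) TIME\s*:\s*(\d+) HOURS\s+(\d+) MINUTES\s+([\d.]+) SECONDS"
)

def parse_times(text):
    """Return {'CPU': seconds, 'ELAPSED': seconds} from CPMD timing lines."""
    times = {}
    for kind, hours, minutes, seconds in TIME_RE.findall(text):
        times[kind] = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return times

# Timing lines copied from the original report in this thread
infiniband = """
CPU TIME : 0 HOURS 10 MINUTES 21.71 SECONDS
ELAPSED TIME : 0 HOURS 10 MINUTES 23.08 SECONDS
"""
gigabit = """
CPU TIME : 0 HOURS 12 MINUTES 7.93 SECONDS
ELAPSED TIME : 0 HOURS 21 MINUTES 0.07 SECONDS
"""

for name, report in (("infiniband", infiniband), ("gigabit", gigabit)):
    t = parse_times(report)
    ratio = t["CPU"] / t["ELAPSED"]
    print(f"{name}: cpu={t['CPU']:.2f}s elapsed={t['ELAPSED']:.2f}s "
          f"cpu/elapsed={ratio:.2f}")
```

On the infiniband node nearly all of the wall-clock time is CPU time, while on the gigabit node only about 58% is, leaving roughly 532 seconds unaccounted for. That kind of gap is what you would expect from waiting on i/o, swapping or competing processes rather than from the interconnect.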

The standard wat32 benchmark is a good test for a single node. You can find
our benchmarking results here if you want to compare yours
http://www.cse.scitech.ac.uk/disco/dbd/index.html

Regards,

INK

2009/3/10 Sangamesh B <forum.san_at_[hidden]>

> Hello Ralph & Jeff,
>
> This is the same issue - but this time the job is running on a single
> node.
>
> The two systems on which the jobs are run have the same hardware/OS
> configuration. The only differences are:
>
> One node has 4 GB RAM and it is part of infiniband connected nodes.
>
> The other node has 8 GB RAM and it is part of gigabit connected nodes.
>
> For both jobs only 4 processes are used.
>
> All the processes are run on a single node.
>
> But why is the GB node taking more time than the IB node?
>
> {ELAPSED TIME = WALL CLOCK TIME}
>
> I hope the problem is now clear.
>
> Thanks,
> Sangamesh
> On Mon, Mar 9, 2009 at 10:56 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> > It depends on the characteristics of the nodes in question. You mention the
> > CPU speeds and the RAM, but there are other factors as well: cache size,
> > memory architecture, how many MPI processes you're running, etc. Memory
> > access patterns, particularly across UMA machines like Clovertown and
> > follow-on Intel architectures, can really get bogged down by the RAM
> > bottleneck (all 8 cores hammering on memory simultaneously via a single
> > memory bus).
> >
> >
> >
> > On Mar 9, 2009, at 10:30 AM, Sangamesh B wrote:
> >
> >> Dear Open MPI team,
> >>
> >> With Open MPI-1.3, the fortran application CPMD is installed on
> >> Rocks-4.3 cluster - Dual Processor Quad core Xeon @ 3 GHz. (8 cores
> >> per node)
> >>
> >> Two jobs (4 processes each) are run on two nodes, separately - one node
> >> has an ib connection (4 GB RAM) and the other node has a gigabit
> >> connection (8 GB RAM).
> >>
> >> Note that the network connectivity should not be needed, as the two
> >> jobs are running in standalone mode.
> >>
> >> Since the jobs are running on a single node, with no communication
> >> between nodes, the performance of both jobs should be the same
> >> irrespective of network connectivity. But here this is not the case.
> >> The gigabit job is taking double the time of the infiniband job.
> >>
> >> Following are the details of two jobs:
> >>
> >> Infiniband Job:
> >>
> >> CPU TIME : 0 HOURS 10 MINUTES 21.71 SECONDS
> >> ELAPSED TIME : 0 HOURS 10 MINUTES 23.08 SECONDS
> >> *** CPMD| SIZE OF THE PROGRAM IS 301192/ 571044 kBYTES ***
> >>
> >> Gigabit Job:
> >>
> >> CPU TIME : 0 HOURS 12 MINUTES 7.93 SECONDS
> >> ELAPSED TIME : 0 HOURS 21 MINUTES 0.07 SECONDS
> >> *** CPMD| SIZE OF THE PROGRAM IS 123420/ 384344 kBYTES ***
> >>
> >> More details are attached here in a file.
> >>
> >> Why is there such a large difference between CPU TIME and ELAPSED TIME
> >> for the Gigabit job?
> >>
> >> This could be an issue with Open MPI itself. What could be the reason?
> >>
> >> Are there any flags that need to be set?
> >>
> >> Thanks in advance,
> >> Sangamesh
> >>
> >> <cpmd_gb_ib_1node><ATT3915213.txt>
> >
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >