It depends on the characteristics of the nodes in question. You
mention the CPU speeds and the RAM, but there are other factors as
well: cache size, memory architecture, how many MPI processes you're
running, etc. Memory access patterns, particularly across UMA
machines like clovertown and follow-in intel architectures can really
get bogged down by the RAM bottlneck (all 8 cores hammering on memory
simultaneously via a single memory bus).
On Mar 9, 2009, at 10:30 AM, Sangamesh B wrote:
> Dear Open MPI team,
> With Open MPI-1.3, the Fortran application CPMD is installed on a
> Rocks-4.3 cluster - dual-processor quad-core Xeon @ 3 GHz (8 cores
> per node).
> Two jobs (4 processes each) were run separately, one per node - one node
> has an InfiniBand connection (4 GB RAM) and the other has a gigabit
> connection (8 GB RAM).
> Note that the network connectivity should not be required here, since
> each job runs standalone on a single node.
> Since each job runs on a single node - no intercommunication between
> nodes - the performance of both jobs should be the same irrespective of
> network connectivity. But that is not the case here: the gigabit job
> takes double the time of the InfiniBand job.
> Following are the details of two jobs:
> Infiniband Job:
> CPU TIME : 0 HOURS 10 MINUTES 21.71 SECONDS
> ELAPSED TIME : 0 HOURS 10 MINUTES 23.08 SECONDS
> *** CPMD| SIZE OF THE PROGRAM IS 301192/ 571044 kBYTES ***
> Gigabit Job:
> CPU TIME : 0 HOURS 12 MINUTES 7.93 SECONDS
> ELAPSED TIME : 0 HOURS 21 MINUTES 0.07 SECONDS
> *** CPMD| SIZE OF THE PROGRAM IS 123420/ 384344 kBYTES ***
> More details are attached here in a file.
> Why is there such a large difference between CPU TIME and ELAPSED TIME
> for the gigabit job?
> Could this be an issue with Open MPI itself? What could be the reason?
> Are there any flags that need to be set?
> Thanks in advance,