Yes, we find it's best to let users benchmark their own code (if they already have it) or a code that uses similar algorithms, and then have the user run it on some machines we set aside.

While we are on the benchmark topic, users might be interested to know that we just installed a new set of Opteron 2220 SEs. Using HPL with GOTO BLAS on 58 machines (232 CPUs), we achieved 1.099 Tflop/s (85% of theoretical peak).
On one node using 4 CPUs (dual-core, dual-socket) I could only get 88%, so for a machine that had __no tuning__ of the IB network or the sysctl settings, we were very happy.
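As a rough check on the 85% figure, here is a minimal sketch of the peak calculation. The clock rate (2.8 GHz) and the 2 double-precision flops per cycle per core are assumptions about the Opteron 2220 SE, not numbers given in this message; the core count and measured result are taken from above.

```python
# Rough check of the HPL efficiency quoted above.
# ASSUMPTIONS (not stated in the message): each core runs at 2.8 GHz and
# retires 2 double-precision flops per cycle.
CORES = 232               # 58 nodes x 4 cores, from the message
CLOCK_GHZ = 2.8           # assumed clock rate
FLOPS_PER_CYCLE = 2       # assumed flops per cycle per core
MEASURED_GFLOPS = 1099.0  # 1.099 Tflop/s, from the message


def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflop/s: cores x GHz x flops per cycle."""
    return cores * clock_ghz * flops_per_cycle


def hpl_efficiency(measured_gflops, peak):
    """Fraction of theoretical peak actually achieved."""
    return measured_gflops / peak


peak = peak_gflops(CORES, CLOCK_GHZ, FLOPS_PER_CYCLE)
eff = hpl_efficiency(MEASURED_GFLOPS, peak)
print(f"peak = {peak:.1f} Gflop/s, efficiency = {eff:.1%}")
```

Under those assumptions the peak comes out near 1.3 Tflop/s, which is consistent with the roughly 85% quoted.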

Boy, I love Open MPI's compile once, run on any network.

Info:

OS:  RHEL4
Compiler:  pgi/6.2
mpi:    openmpi/1.2.0
BLAS:  GOTO-1.15
Cisco Topspin InfiniBand using OpenIB provided by Red Hat.

Thanks for all the help list :-)

Brock Palen
Center for Advanced Computing
brockp@umich.edu
(734)936-1985


On Jun 11, 2007, at 9:06 AM, Jeff Pummill wrote:

Glad to contribute Victor!

I am running on a home workstation that uses an AMD 3800 cpu attached to 2 gigs of ram.
My timings for FT were 175 secs with one core and 110 on two cores with -O3 and -mtune=amd64 as tuning options.
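For comparing these timings with Victor's (181 s and 90 s, quoted below), a minimal sketch of the usual speedup and parallel-efficiency arithmetic follows. It assumes both sets of numbers are FT class B runs on one and two cores; the function names are only illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E = S / p."""
    return speedup(t_serial, t_parallel) / n_procs


# Wall-clock times in seconds as quoted in this thread:
# (one core, two cores). Assumed to be FT class B in both cases.
runs = {
    "Jeff (AMD 3800)": (175.0, 110.0),
    "Victor (Pentium D 2.8GHz)": (181.0, 90.0),
}

for name, (t1, t2) in runs.items():
    s = speedup(t1, t2)
    e = efficiency(t1, t2, 2)
    print(f"{name}: speedup {s:.2f}, efficiency {e:.0%}")
```

An efficiency slightly above 100%, as Victor's numbers give, is usually within run-to-run noise rather than a sign of real superlinear scaling.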

Brock, Terry and Jeff are all exactly correct in their comments regarding benchmarks. There are simply too many variables to contend with. In addition, one- and two-core runs on a single workstation probably aren't the best evaluation of Open MPI. As you expand to more devices and generate bigger problems (HPL or HPCC for example), a better overall picture will emerge.


Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas



victor marian wrote:
  Thank you, everybody, for the advice.
  I ran the NAS benchmark class B and it runs in 181 seconds on one core and in 90 seconds on two cores, so it scales almost perfectly.
  What were your timings, Jeff, and exactly which processor do you have?
  Mine is a Pentium D at 2.8GHz.

                                         Victor


--- Jeff Pummill <jpummil@uark.edu> wrote:

  
Victor,

Build the FT benchmark and build it as a class B problem. This will run in the 1-2 minute range instead of the 2-4 seconds the CG class A benchmark takes.


Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas



Terry Frankcombe wrote:
    
Hi Victor

I'd suggest 3 seconds of CPU time is far, far too small a problem to do scaling tests with.  Even with only 2 CPUs, I wouldn't go below 100 times that.


On Mon, 2007-06-11 at 01:10 -0700, victor marian wrote:
    
        
Hi Jeff

I ran the NAS Parallel Benchmark and it gives me:

-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$ mpirun -np 1 cg.A.1

        
--------------------------------------------------------------------------
[0,1,0]: uDAPL on host SERVSOLARIS was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
  
 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:      14000
 Iterations:    15
 Number of active processes:     1
 Number of nonzeroes per row:       11
 Eigenvalue shift: .200E+02
 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is      0.171302350540E+02
 Error is     0.512264003323E-13


 CG Benchmark Completed.
 Class           =                        A
 Size            =                    14000
 Iterations      =                       15
 Time in seconds =                     3.02
 Total processes =                        1
 Compiled procs  =                        1
 Mop/s total     =                   495.93
 Mop/s/process   =                   495.93
 Operation type  =           floating point
 Verification    =               SUCCESSFUL
 Version         =                      3.2
 Compile date    =              11 Jun 2007



        
-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$ mpirun -np 2 cg.A.2

        
--------------------------------------------------------------------------
[0,1,0]: uDAPL on host SERVSOLARIS was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,1]: uDAPL on host SERVSOLARIS was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
  
 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:      14000
 Iterations:    15
 Number of active processes:     2
 Number of nonzeroes per row:       11
 Eigenvalue shift: .200E+02

 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is      0.171302350540E+02
 Error is     0.522633719989E-13


 CG Benchmark Completed.
 Class           =                        A
 Size            =                    14000
 Iterations      =                       15
 Time in seconds =                     2.47
 Total processes =                        2
 Compiled procs  =                        2
 Mop/s total     =                   606.32
 Mop/s/process   =                   303.16
 Operation type  =           floating point
 Verification    =               SUCCESSFUL
 Version         =                      3.2
 Compile date    =              11 Jun 2007


    You can see that the scaling is not as good as yours. Maybe I am having communication problems between processors.
    You can also see that I am faster on one process compared to your processor.
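One way to put a number on "not as good" is the Karp-Flatt metric, which estimates the serial (or overhead) fraction implied by a measured speedup. The sketch below applies it to the CG class A times quoted in this thread; it is only an illustration, and with runs of about 3 seconds the result mixes true serial work with communication overhead and timing noise.

```python
def karp_flatt(t_serial, t_parallel, n_procs):
    """Experimentally determined serial fraction.

    e = (1/S - 1/p) / (1 - 1/p), where S = t_serial / t_parallel.
    Values near 0 mean near-ideal scaling; values near 1 mean almost none.
    """
    s = t_serial / t_parallel
    return (1.0 / s - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)


# CG class A times in seconds from this thread: (1 process, 2 processes).
cg_runs = {
    "Victor": (3.02, 2.47),
    "Jeff": (4.75, 2.48),
}

for name, (t1, t2) in cg_runs.items():
    print(f"{name}: serial fraction e = {karp_flatt(t1, t2, 2):.2f}")
```

The large gap between the two values matches the observation above, though it does not by itself say whether communication or something else is the cause.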

                                       Victor





--- Jeff Pummill <jpummil@uark.edu> wrote:

    
        
Perfect! Thanks Jeff!

The NAS Parallel Benchmark on a dual core AMD
machine now returns this...
[jpummil@localhost bin]$ mpirun -np 1 cg.A.1
NAS Parallel Benchmarks 3.2 -- CG Benchmark
CG Benchmark Completed.
 Class           =                        A
 Size            =                    14000
 Iterations      =                       15
 Time in seconds =                     4.75
 Total processes =                        1
 Compiled procs  =                        1
 Mop/s total     =                   315.32

...and...

[jpummil@localhost bin]$ mpirun -np 2 cg.A.2
NAS Parallel Benchmarks 3.2 -- CG Benchmark
 CG Benchmark Completed.
 Class           =                        A
 Size            =                    14000
 Iterations      =                       15
 Time in seconds =                     2.48
 Total processes =                        2
 Compiled procs  =                        2
 Mop/s total     =                   604.46

Not quite linear, but one must account for all of the OS traffic that one core or the other must deal with.


Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
Fayetteville, Arkansas 72701
(479) 575 - 4590
http://hpc.uark.edu

"A supercomputer is a device for turning
compute-bound
problems into I/O-bound problems." -Seymour Cray


Jeff Squyres wrote:
      
          
Just remove the -L and -l arguments -- OMPI's "mpif90" (and other wrapper compilers) will do all that magic for you.

    
=== message truncated ===>
_______________________________________________
  
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
    
       