Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.7rc8 is posted
From: Pavel Mezentsev (pavel.mezentsev_at_[hidden])
Date: 2013-02-27 19:36:40


I've tried the new rc. Here is what I got:

1) I've successfully built it with intel-13.1 and gcc-4.7.2, but the build
failed with open64-4.5.2 and ekopath-5.0.0 (pathscale). The problems are in
the Fortran part. In each case I used the following configuration line:
CC=$CC CXX=$CXX F77=$F77 FC=$FC ./configure --prefix=$prefix \
    --with-knem=$knem_path
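For Open64, for example, that expands to roughly the following (the prefix and
knem paths here are only placeholders; openf95 is the Fortran driver visible in
the configure output below, and opencc/openCC are, as far as I remember, the
corresponding C/C++ drivers):

CC=opencc CXX=openCC F77=openf95 FC=openf95 ./configure \
    --prefix=/opt/openmpi-1.7rc8-open64 --with-knem=/opt/knem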
Open64 failed during configuration with the following:
*** Fortran compiler
checking whether we are using the GNU Fortran compiler... yes
checking whether openf95 accepts -g... yes
configure: WARNING: Open MPI now ignores the F77 and FFLAGS environment
variables; only the FC and FCFLAGS environment variables are used.
checking whether ln -s works... yes
checking if Fortran compiler works... yes
checking for extra arguments to build a shared library... none needed
checking for Fortran flag to compile .f files... none
checking for Fortran flag to compile .f90 files... none
checking to see if Fortran compilers need additional linker flags... none
checking external symbol convention... double underscore
checking if C and Fortran are link compatible... yes
checking to see if Fortran compiler likes the C++ exception flags...
skipped (no C++ exceptions flags)
checking to see if mpifort compiler needs additional linker flags... none
checking if Fortran compiler supports CHARACTER... yes
checking size of Fortran CHARACTER... 1
checking for C type corresponding to CHARACTER... char
checking alignment of Fortran CHARACTER... 1
checking for corresponding KIND value of CHARACTER... C_SIGNED_CHAR
checking KIND value of Fortran C_SIGNED_CHAR... no ISO_C_BINDING -- fallback
checking Fortran value of selected_int_kind(4)... no
configure: WARNING: Could not determine KIND value of C_SIGNED_CHAR
configure: WARNING: See config.log for more details
configure: error: Cannot continue
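Since configure bails out on the ISO_C_BINDING KIND of C_SIGNED_CHAR, a quick
standalone check of what that test probes might help narrow it down (my own
minimal sketch, not part of Open MPI's configure machinery):

cat > kind_check.f90 <<'EOF'
program kind_check
  use, intrinsic :: iso_c_binding
  print *, 'kind of C_SIGNED_CHAR:  ', c_signed_char
  print *, 'selected_int_kind(4):   ', selected_int_kind(4)
end program kind_check
EOF
openf95 kind_check.f90 -o kind_check && ./kind_check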

Ekopath failed during make with the following error:
 PPFC mpi-f08-sizeof.lo
  PPFC mpi-f08.lo
In file included from mpi-f08.F90:37:
mpi-f-interfaces-bind.h:1908: warning: extra tokens at end of #endif
directive
mpi-f-interfaces-bind.h:2957: warning: extra tokens at end of #endif
directive
In file included from mpi-f08.F90:38:
pmpi-f-interfaces-bind.h:1911: warning: extra tokens at end of #endif
directive
pmpi-f-interfaces-bind.h:2963: warning: extra tokens at end of #endif
directive
pathf95-1044 pathf95: INTERNAL OMPI_OP_CREATE_F, File =
mpi-f-interfaces-bind.h, Line = 955, Column = 29
  Internal : Unexpected ATP_PGM_UNIT in check_interoperable_pgm_unit()
make[2]: *** [mpi-f08.lo] Error 1
make[2]: Leaving directory
`/tmp/mpi_install_tmp1400/openmpi-1.7rc8/ompi/mpi/fortran/use-mpi-f08'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/mpi_install_tmp1400/openmpi-1.7rc8/ompi'
make: *** [all-recursive] Error 1

It seems to be different from the error I got last time with rc7, and again
I'm not enough of a Fortran person to understand it. I used the following
version of the compiler:
http://c591116.r16.cf2.rackcdn.com/ekopath/nightly/Linux/ekopath-2013-02-26-installer.run

2) I ran a couple of tests (IMB) with the new version on a system consisting
of 10 nodes with Intel Sandy Bridge processors and FDR ConnectX-3 InfiniBand
adapters.
First I tried the following parameters:
mpirun -np $NP -hostfile hosts --mca btl openib,sm,self --bind-to-core \
    -npernode 16 --mca mpi_leave_pinned 1 ./IMB-MPI1 -npmin $NP -mem 4G $COLL
This combination complained about mpi_leave_pinned. The same line works with
1.6.3. Is something different in the new release that I've missed?
--------------------------------------------------------------------------
A process attempted to use the "leave pinned" MPI feature, but no
memory registration hooks were found on the system at run time. This
may be the result of running on a system that does not support memory
hooks or having some other software subvert Open MPI's use of the
memory hooks. You can disable Open MPI's use of memory hooks by
setting both the mpi_leave_pinned and mpi_leave_pinned_pipeline MCA
parameters to 0.

Open MPI will disable any transports that are attempting to use the
leave pinned functionality; your job may still run, but may fall back
to a slower network transport (such as TCP).

  Mpool name: grdma
  Process: [[13305,1],1]
  Local host: b23
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There is at least one OpenFabrics device found but there are
no active ports detected (or Open MPI was unable to use them). This
is most certainly not what you wanted. Check your cables, subnet
manager configuration, etc. The openib BTL will be ignored for this
job.

  Local host: b23
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[13305,1],0]) is on host: b22
  Process 2 ([[13305,1],1]) is on host: b23
  BTLs attempted: self sm

Your MPI job is now going to abort; sorry.
...
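Following the suggestion in the first warning, the same run could presumably be
repeated with the memory hooks disabled; just a sketch of the workaround the
message itself proposes, I haven't confirmed whether it avoids the abort:

mpirun -np $NP -hostfile hosts --mca btl openib,sm,self --bind-to-core \
    -npernode 16 --mca mpi_leave_pinned 0 --mca mpi_leave_pinned_pipeline 0 \
    ./IMB-MPI1 -npmin $NP -mem 4G $COLL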

Then I ran a couple of P2P and collective tests. In general the performance
improved compared to 1.6.3, but there are several cases where it got worse.
Perhaps I need to use some tuning; could you please tell me what parameters
would suit me better than the defaults?
Here is what I got for PingPong and PingPing with 1.7rc8 (the above
parameters changed to use "-npernode 1"):
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions t[usec] Mbytes/sec
            0 1000 1.39 0.00
            1 1000 1.50 0.64
            2 1000 1.10 1.73
            4 1000 1.10 3.46
            8 1000 1.12 6.80
           16 1000 1.12 13.62
           32 1000 1.14 26.75
           64 1000 1.18 51.92
          128 1000 1.73 70.42
          256 1000 1.85 132.04
          512 1000 1.98 247.16
         1024 1000 2.26 431.52
         2048 1000 2.85 684.58
         4096 1000 3.49 1118.63
         8192 1000 4.48 1741.96
        16384 1000 9.58 1630.92
        32768 1000 14.27 2189.46
        65536 640 23.03 2713.71
       131072 320 35.55 3515.73
       262144 160 57.65 4336.77
       524288 80 101.42 4930.05
      1048576 40 188.00 5319.18
      2097152 20 521.70 3833.61
      4194304 10 1118.20 3577.19

#---------------------------------------------------
# Benchmarking PingPing
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions t[usec] Mbytes/sec
            0 1000 1.26 0.00
            1 1000 1.32 0.72
            2 1000 1.32 1.44
            4 1000 1.35 2.84
            8 1000 1.38 5.53
           16 1000 1.13 13.51
           32 1000 1.13 26.96
           64 1000 1.17 51.95
          128 1000 1.72 70.96
          256 1000 1.80 135.63
          512 1000 1.94 251.17
         1024 1000 2.23 437.51
         2048 1000 2.88 677.47
         4096 1000 3.49 1119.28
         8192 1000 4.75 1643.41
        16384 1000 9.90 1578.12
        32768 1000 14.54 2149.25
        65536 640 24.04 2599.79
       131072 320 37.00 3378.35
       262144 160 60.25 4149.39
       524288 80 105.74 4728.77
      1048576 40 196.73 5083.23
      2097152 20 785.79 2545.20
      4194304 10 1790.19 2234.40

And 1.6.3 gave the following:
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions t[usec] Mbytes/sec
            0 1000 1.06 0.00
            1 1000 0.94 1.01
            2 1000 0.95 2.02
            4 1000 0.95 4.01
            8 1000 0.97 7.90
           16 1000 0.98 15.63
           32 1000 0.99 30.86
           64 1000 1.02 59.60
          128 1000 1.58 77.23
          256 1000 1.71 142.73
          512 1000 1.86 263.15
         1024 1000 2.13 459.35
         2048 1000 2.72 718.31
         4096 1000 3.27 1194.74
         8192 1000 4.33 1802.57
        16384 1000 6.20 2521.78
        32768 1000 8.84 3535.46
        65536 640 14.28 4376.82
       131072 320 24.97 5005.06
       262144 160 44.94 5562.46
       524288 80 86.76 5763.29
      1048576 40 168.73 5926.77
      2097152 20 333.65 5994.32
      4194304 10 666.09 6005.16

#---------------------------------------------------
# Benchmarking PingPing
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions t[usec] Mbytes/sec
            0 1000 0.93 0.00
            1 1000 0.97 0.98
            2 1000 0.97 1.97
            4 1000 0.97 3.94
            8 1000 0.99 7.70
           16 1000 0.99 15.34
           32 1000 1.01 30.21
           64 1000 1.05 58.13
          128 1000 1.61 75.82
          256 1000 1.73 141.20
          512 1000 1.88 259.87
         1024 1000 2.17 450.21
         2048 1000 2.83 691.13
         4096 1000 3.45 1131.26
         8192 1000 4.76 1639.88
        16384 1000 7.76 2014.01
        32768 1000 10.34 3021.35
        65536 640 16.29 3836.55
       131072 320 26.72 4678.40
       262144 160 48.83 5120.31
       524288 80 91.85 5443.61
      1048576 40 178.65 5597.63
      2097152 20 351.31 5692.98
      4194304 10 701.69 5700.53

The Sendrecv and Exchange tests also got worse. I can send additional data if
needed.

The performance of the collectives generally improved slightly compared to
1.6.3 or remained the same. But in certain cases I got much better results
with the tuned collectives. In particular, the following suited my system better:
--mca coll_tuned_barrier_algorithm 6 (default and tuned):
#---------------------------------------------------
# Benchmarking Barrier
# #processes = 160
#---------------------------------------------------
 #repetitions t_min[usec] t_max[usec] t_avg[usec]
         1000 49.75 49.77 49.76
#---------------------------------------------------
# Benchmarking Barrier
# #processes = 160
#---------------------------------------------------
 #repetitions t_min[usec] t_max[usec] t_avg[usec]
         1000 12.74 12.74 12.74

Bcast for small messages
--mca coll_tuned_bcast_algorithm 3 (default and tuned):
#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 160
#----------------------------------------------------------------
       #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
            0 1000 0.01 0.02 0.02
            1 1000 9.87 9.96 9.92
            2 1000 10.44 10.51 10.47
            4 1000 10.30 10.37 10.34
            8 1000 10.34 10.43 10.38
           16 1000 10.39 10.48 10.43
           32 1000 10.36 10.43 10.40
           64 1000 10.38 10.44 10.41
          128 1000 10.11 10.22 10.17
          256 1000 11.37 11.54 11.48
          512 1000 14.09 14.25 14.19
         1024 1000 18.77 19.03 18.94
         2048 1000 13.47 13.63 13.58
         4096 1000 25.39 25.60 25.55
         8192 1000 50.80 51.11 51.04
        16384 1000 102.64 103.53 103.38
        32768 1000 280.86 281.80 281.62
        65536 640 387.10 391.90 391.26
       131072 320 779.58 796.04 794.30
       262144 160 1526.52 1597.39 1590.31
       524288 80 355.67 379.06 375.27
      1048576 40 702.95 753.65 736.29
      2097152 20 1518.11 1580.85 1551.57
      4194304 10 3183.22 3931.81 3676.94

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 160
#----------------------------------------------------------------
       #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
            0 1000 0.01 0.02 0.02
            1 1000 4.54 5.13 4.85
            2 1000 4.50 5.11 4.81
            4 1000 4.50 5.09 4.80
            8 1000 4.48 5.09 4.79
           16 1000 4.49 5.09 4.79
           32 1000 4.55 5.15 4.86
           64 1000 4.52 5.14 4.83
          128 1000 4.66 5.28 4.98
          256 1000 4.78 5.40 5.09
          512 1000 4.89 5.52 5.21
         1024 1000 5.15 5.81 5.48
         2048 1000 5.60 6.30 5.94
         4096 1000 8.25 8.67 8.46
         8192 1000 10.49 11.01 10.76
        16384 1000 20.05 20.87 20.50
        32768 1000 30.11 31.41 30.80
        65536 640 46.08 48.94 47.54
       131072 320 75.53 84.98 80.26
       262144 160 134.26 169.44 151.92
       524288 80 240.34 372.76 307.80
      1048576 40 427.00 951.02 699.41
      2097152 20 933.41 3170.45 2076.21
      4194304 10 2682.40 16020.39 9718.86

and AllGatherv:
--mca coll_tuned_allgatherv_algorithm 5 (default and tuned):
#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 160
#----------------------------------------------------------------
       #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
            0 1000 0.06 0.07 0.06
            1 1000 54.11 54.15 54.13
            2 1000 52.74 52.78 52.76
            4 1000 55.09 55.13 55.11
            8 1000 58.48 58.52 58.50
           16 1000 61.99 62.03 62.01
           32 1000 69.31 69.35 69.32
           64 1000 88.13 88.18 88.16
          128 1000 126.62 126.71 126.68
          256 1000 215.26 215.34 215.31
          512 1000 832.54 833.01 832.57
         1024 1000 928.81 929.31 928.86
         2048 1000 1072.77 1073.35 1072.85
         4096 1000 1222.82 1223.42 1222.90
         8192 1000 1713.46 1714.13 1713.87
        16384 1000 2596.87 2598.31 2597.40
        32768 1000 4153.70 4154.09 4153.92
        65536 640 6795.04 6796.32 6795.83
       131072 320 12076.74 12083.04 12080.28
       262144 160 23120.98 23153.76 23138.10
       524288 80 49077.99 49204.79 49142.48
      1048576 40 132120.25 132675.60 132400.38
      2097152 20 240537.20 241821.05 241138.53
      4194304 10 457125.71 459065.10 458035.03

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 160
#----------------------------------------------------------------
       #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
            0 1000 0.06 0.07 0.06
            1 1000 0.47 0.56 0.52
            2 1000 0.47 0.57 0.51
            4 1000 0.48 0.56 0.52
            8 1000 0.46 0.56 0.51
           16 1000 0.47 0.57 0.52
           32 1000 0.47 0.56 0.52
           64 1000 0.47 0.57 0.52
          128 1000 0.50 0.62 0.57
          256 1000 0.58 0.68 0.63
          512 1000 0.62 0.81 0.70
         1024 1000 0.71 0.97 0.80
         2048 1000 0.89 1.24 1.05
         4096 1000 2.21 2.58 2.40
         8192 1000 3.08 3.55 3.30
        16384 1000 4.77 5.56 5.11
        32768 1000 7.99 9.75 8.90
        65536 640 15.81 19.35 17.69
       131072 320 34.18 39.74 36.95
       262144 160 71.72 80.37 76.06
       524288 80 143.64 161.81 152.36
      1048576 40 781.10 868.80 825.57
      2097152 20 2594.30 2795.45 2672.58
      4194304 10 5185.79 5451.20 5298.98

This time I only ran the tests on 160 processes, but earlier I did more
testing with 1.6 on different numbers of processes (from 16 to 320), and
those tuned parameters helped almost every time. I don't know what the
default parameters are tuned for, but perhaps it would be a good idea to
change the defaults for the kind of system I use.
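For reference, the "tuned" numbers above were obtained by forcing the
algorithms via MCA parameters, roughly as below; if I remember correctly the
per-collective overrides only take effect together with
coll_tuned_use_dynamic_rules=1, and ompi_info shows which number maps to which
algorithm:

ompi_info --all | grep "coll_tuned_.*_algorithm"

mpirun -np $NP -hostfile hosts --mca btl openib,sm,self --bind-to-core \
    -npernode 16 \
    --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_barrier_algorithm 6 \
    --mca coll_tuned_bcast_algorithm 3 \
    --mca coll_tuned_allgatherv_algorithm 5 \
    ./IMB-MPI1 -npmin $NP -mem 4G Barrier Bcast Allgatherv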

I can perform some additional tests if necessary, or give more information
on the problems that I've come across.

Regards, Pavel Mezentsev.

2013/2/27 Jeff Squyres (jsquyres) <jsquyres_at_[hidden]>

> The goal is to release 1.7 (final) by the end of this week. New rc posted
> with fairly small changes:
>
> http://www.open-mpi.org/software/ompi/v1.7/
>
> - Fix wrong header file / compilation error in bcol
> - Support MXM STREAM for isend and irecv
> - Make sure "mpirun <dirname>" fails with $status!=0
> - Bunches of cygwin minor fixes
> - Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08
> bindings
> - Fix --disable-mpi-io with the F08 bindings
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>