Try manually specifying the collective component with "-mca coll tuned".
You seem to be using the "sync" collective component. Are there any stale MCA param files lying around?
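
One way to look for stale settings is sketched below. It assumes the usual Open MPI parameter file locations (a per-user file under $HOME/.openmpi and a system-wide file under the install prefix); OMPI_PREFIX and list_coll_overrides are hypothetical names used only for this sketch, so point the prefix at your own installation.

```shell
#!/bin/sh
# Print every non-comment line that sets a coll-related MCA parameter
# in the files given as arguments. Missing files are skipped.
list_coll_overrides() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        # -H prints the file name, -n the line number
        grep -Hn '^[[:space:]]*coll' "$f" || true
    done
    return 0
}

# OMPI_PREFIX is a hypothetical variable for this sketch; set it to the
# directory Open MPI was installed into.
list_coll_overrides "$HOME/.openmpi/mca-params.conf" \
    "${OMPI_PREFIX:-/usr/local}/etc/openmpi-mca-params.conf"

# MCA parameters can also be set through OMPI_MCA_<name> environment variables.
env | grep '^OMPI_MCA_coll' || echo "no OMPI_MCA_coll* variables set"
```

Any line this prints is a candidate for whatever is pulling in the "sync" component. No output from the file check means neither file sets a coll parameter.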

--Nysal

On Tue, Jan 11, 2011 at 6:28 PM, Doron Shoham <doron.ompi@gmail.com> wrote:
Hi
  
All machines in the setup are iDataPlex nodes with Nehalem CPUs, 12 cores per node and 24 GB of memory.

 

* Problem 1: OMPI 1.4.3 hangs in gather

 

I'm trying to run the IMB gather benchmark with OMPI 1.4.3 (vanilla).

The hang occurs when np >= 64 and the message size exceeds 4k:

mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib imb/src-1.4.2/IMB-MPI1 gather -npmin 64

 

voltairenodes consists of 64 machines.

 

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 64
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.02         0.02
            1          331        14.02        14.16        14.09
            2          331        12.87        13.08        12.93
            4          331        14.29        14.43        14.34
            8          331        16.03        16.20        16.11
           16          331        17.54        17.74        17.64
           32          331        20.49        20.62        20.53
           64          331        23.57        23.84        23.70
          128          331        28.02        28.35        28.18
          256          331        34.78        34.88        34.80
          512          331        46.34        46.91        46.60
         1024          331        63.96        64.71        64.33
         2048          331       460.67       465.74       463.18
         4096          331       637.33       643.99       640.75

 

This is the padb output:

padb -A -x -Ormgr=mpirun --tree:

 

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2011.01.06 14:33:17 =~=~=~=~=~=~=~=~=~=~=~=

 

Warning, remote process state differs across ranks

state : ranks

R (running) : [1,3-6,8,10-13,16-20,23-28,30-32,34-42,44-45,47-49,51-53,56-59,61-63]

S (sleeping) : [0,2,7,9,14-15,21-22,29,33,43,46,50,54-55,60]

Stack trace(s) for thread: 1

-----------------

[0-63] (64 processes)

-----------------

main() at ?:?

  IMB_init_buffers_iter() at ?:?

    IMB_gather() at ?:?

      PMPI_Gather() at pgather.c:175

        mca_coll_sync_gather() at coll_sync_gather.c:46

          ompi_coll_tuned_gather_intra_dec_fixed() at coll_tuned_decision_fixed.c:714

            -----------------

            [0,3-63] (62 processes)

            -----------------

            ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:248

              mca_pml_ob1_recv() at pml_ob1_irecv.c:104

                ompi_request_wait_completion() at ../../../../ompi/request/request.h:375

                  opal_condition_wait() at ../../../../opal/threads/condition.h:99

            -----------------

            [1] (1 processes)

            -----------------

            ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:302

              mca_pml_ob1_send() at pml_ob1_isend.c:125

                ompi_request_wait_completion() at ../../../../ompi/request/request.h:375

                  opal_condition_wait() at ../../../../opal/threads/condition.h:99

            -----------------

            [2] (1 processes)

            -----------------

            ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:315

              ompi_request_default_wait() at request/req_wait.c:37

                ompi_request_wait_completion() at ../ompi/request/request.h:375

                  opal_condition_wait() at ../opal/threads/condition.h:99

Stack trace(s) for thread: 2

-----------------

[0-63] (64 processes)

-----------------

start_thread() at ?:?

  btl_openib_async_thread() at btl_openib_async.c:344

    poll() at ?:?

Stack trace(s) for thread: 3

-----------------

[0-63] (64 processes)

-----------------

start_thread() at ?:?

  service_thread_start() at btl_openib_fd.c:427

    select() at ?:?

-bash-3.2$

 

 

When I run padb again after a couple of minutes, the number of processes at each position stays the same, but different ranks occupy those positions.

For example, this is the diff between two padb outputs:

 

Warning, remote process state differs across ranks

state : ranks

-R (running) : [0,2-4,6-13,16-18,20-21,28-31,33-36,38-56,58,60,62-63]

-S (sleeping) : [1,5,14-15,19,22-27,32,37,57,59,61]

+R (running) : [2,5-14,16-23,25,28-40,42-48,50-51,53-58,61,63]

+S (sleeping) : [0-1,3-4,15,24,26-27,41,49,52,59-60,62]

Stack trace(s) for thread: 1

-----------------

[0-63] (64 processes)

@@ -13,21 +13,21 @@

mca_coll_sync_gather() at coll_sync_gather.c:46

ompi_coll_tuned_gather_intra_dec_fixed() at coll_tuned_decision_fixed.c:714

-----------------

- [0,3-63] (62 processes)

+ [0-5,8-63] (62 processes)

-----------------

ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:248

mca_pml_ob1_recv() at pml_ob1_irecv.c:104

ompi_request_wait_completion() at ../../../../ompi/request/request.h:375

opal_condition_wait() at ../../../../opal/threads/condition.h:99

-----------------

- [1] (1 processes)

+ [6] (1 processes)

-----------------

ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:302

mca_pml_ob1_send() at pml_ob1_isend.c:125

ompi_request_wait_completion() at ../../../../ompi/request/request.h:375

opal_condition_wait() at ../../../../opal/threads/condition.h:99

-----------------

- [2] (1 processes)

+ [7] (1 processes)

-----------------

ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:315

ompi_request_default_wait() at request/req_wait.c:37

 

 

Choosing a different gather algorithm seems to bypass the hang.

I’ve used the following mca parameters:

--mca coll_tuned_use_dynamic_rules 1

--mca coll_tuned_gather_algorithm 1

 

Actually, both dec_fixed and basic_linear work, while binomial and linear_sync don't.
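
The comparison across algorithms can be scripted roughly as follows. This is only a sketch: the numbering (1 = basic linear, 2 = binomial, 3 = linear with synchronization) is my assumption and should be confirmed with "ompi_info --param coll tuned". gather_cmd and RUN are hypothetical names, and the 600-second limit is arbitrary.

```shell
#!/bin/sh
# Build the mpirun command line for one forced gather algorithm.
# The algorithm numbering is an assumption; check it with
# `ompi_info --param coll tuned` on your installation.
gather_cmd() {
    echo "mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib" \
         "--mca coll_tuned_use_dynamic_rules 1" \
         "--mca coll_tuned_gather_algorithm $1" \
         "imb/src-1.4.2/IMB-MPI1 gather -npmin 64"
}

for alg in 1 2 3; do
    cmd=$(gather_cmd "$alg")
    if [ "${RUN:-0}" = 1 ]; then
        # GNU timeout exits with status 124 when the limit is reached,
        # which here would indicate a hang rather than a crash.
        timeout 600 $cmd
        echo "algorithm $alg: exit status $?"
    else
        # Dry run by default: only print the command.
        echo "$cmd"
    fi
done
```

Run it with RUN=1 to actually launch the jobs. Without it the script only prints the three command lines.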

 

With OMPI 1.5 it doesn't hang (with any of the gather algorithms) and it is much faster (the number of repetitions is much higher):

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 64
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.03         0.02
            1         1000        18.50        18.55        18.53
            2         1000        18.17        18.25        18.22
            4         1000        19.04        19.10        19.07
            8         1000        19.60        19.67        19.64
           16         1000        21.39        21.47        21.43
           32         1000        24.83        24.91        24.87
           64         1000        27.35        27.45        27.40
          128         1000        33.23        33.34        33.29
          256         1000        41.24        41.39        41.32
          512         1000        52.62        52.81        52.71
         1024         1000        73.20        73.46        73.32
         2048         1000       416.36       418.04       417.22
         4096         1000       638.54       640.70       639.65
         8192         1000       506.26       506.97       506.63
        16384         1000       600.63       601.40       601.02
        32768         1000       639.52       640.34       639.93
        65536          640       914.22       916.02       915.13
       131072          320      2287.37      2295.18      2291.35
       262144          160      4041.36      4070.58      4056.27
       524288           80      7292.35      7463.27      7397.14
      1048576           40     13647.15     14107.15     13905.29
      2097152           20     30625.00     32635.45     31815.36
      4194304           10     63543.01     70987.49     68680.48

 

 

* Problem 2: segmentation fault with OMPI 1.4.3/1.5 and IMB gather, np=768

When I run the same command with np=768, I get a segmentation fault:

openmpi-1.4.3/bin/mpirun -np 768 -machinefile voltairenodes -mca btl sm,self,openib -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_gather_algorithm 1 imb/src/IMB-MPI1 gather -npmin 768 -mem 1.6

 

This happens with both OMPI 1.4.3 and 1.5.

 

[compa163:20249] *** Process received signal ***

[compa163:20249] Signal: Segmentation fault (11)

[compa163:20249] Signal code: Address not mapped (1)

[compa163:20249] Failing at address: 0x2aab4a204000

[compa163:20249] [ 0] /lib64/libpthread.so.0 [0x366aa0e7c0]

[compa163:20249] [ 1] /gpfs/asrc/home/voltaire/install//openmpi-1.4.3/lib/libmpi.so.0(ompi_convertor_unpack+0x15f) [0x2b077882282e]

[compa163:20249] [ 2] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_pml_ob1.so [0x2b077b9e1672]

[compa163:20249] [ 3] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_pml_ob1.so [0x2b077b9dd0b6]

[compa163:20249] [ 4] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_btl_sm.so [0x2b077c459d87]

[compa163:20249] [ 5] /gpfs/asrc/home/voltaire/install//openmpi-1.4.3/lib/libopen-pal.so.0(opal_progress+0xbe) [0x2b0778d845b8]

[compa163:20249] [ 6] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_pml_ob1.so [0x2b077b9d6d62]

[compa163:20249] [ 7] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_pml_ob1.so [0x2b077b9d6ba7]

[compa163:20249] [ 8] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_pml_ob1.so [0x2b077b9d6a90]

[compa163:20249] [ 9] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_coll_tuned.so [0x2b077d298dc5]

[compa163:20249] [10] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_coll_tuned.so [0x2b077d2990d3]

[compa163:20249] [11] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_coll_tuned.so [0x2b077d286e9b]

[compa163:20249] [12] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_coll_sync.so [0x2b077d07e96c]

[compa163:20249] [13] /gpfs/asrc/home/voltaire/install//openmpi-1.4.3/lib/libmpi.so.0(PMPI_Gather+0x55e) [0x2b077883ec9a]

[compa163:20249] [14] imb/src/IMB-MPI1(IMB_gather+0xe8) [0x40a088]

[compa163:20249] [15] imb/src/IMB-MPI1(IMB_init_buffers_iter+0x28a) [0x405baa]

[compa163:20249] [16] imb/src/IMB-MPI1(main+0x30f) [0x40362f]

[compa163:20249] [17] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3669e1d994]

[compa163:20249] [18] imb/src/IMB-MPI1 [0x403269]

[compa163:20249] *** End of error message ***
 
 
Any ideas? More debugging tips?
 
Thanks,
Doron
