Subject: [OMPI users] stall with MPI_Scatter and shared memory
From: Willem Vermin (willem_at_[hidden])
Date: 2008-07-02 05:24:51

We noticed that the attached MPI program stalls when run with Open MPI
(version 1.2.6 or openmpi-1.3a1r18785).

compile: mpicc -o scattertest scattertest.c
run: mpiexec -n 4 ./scattertest 10000

This is on a 32-bit Ubuntu system equipped with 1 GB of memory.
A test on a Debian system shows the same result; however, on a machine
with 8 GB of memory, the number 10000 must be increased before a stall
occurs. The program runs fine when the number is lower:

   mpiexec -n 4 ./scattertest 10

or when the sm (shared memory) BTL is disabled:

   mpiexec -n 4 -mca btl ^sm ./scattertest 100000

or when the commented-out MPI_Barrier call is activated.

The same behaviour is observed with MPI_Scatterv and with
MPI_Isend/MPI_Irecv.

Please find attached:

   scattertest.c : the test program
   config.log.bz2 : config.log from configuring openmpi-1.3a1r18785
   ompi_info--all.bz2: output from ompi_info --all
   ifconfig : output of ifconfig



Willem Vermin         tel (31)20 5923054/5923000
SARA, Kruislaan 415   fax (31)20 6683167
1098 SJ Amsterdam     willem_at_[hidden]

eth0 Link encap:Ethernet HWaddr 00:12:3F:2B:5D:77
          inet addr: Bcast: Mask:
          inet6 addr: fe80::212:3fff:fe2b:5d77/64 Scope:Link
          RX packets:3918360 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6003598 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2628497326 (2.4 GB) TX bytes:1968595851 (1.8 GB)

lo Link encap:Local Loopback
          inet addr: Mask:
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:25381363 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25381363 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1022252832 (974.8 MB) TX bytes:1022252832 (974.8 MB)