Open MPI User's Mailing List Archives

Subject: [OMPI users] problem with data transfer in a heterogeneous environment
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-12-14 08:52:03


Hi,

some weeks ago I reported a problem with my matrix multiplication
program in a heterogeneous environment (little-endian and big-endian
machines). The problem occurs in openmpi-1.6.x, openmpi-1.7, and
openmpi-1.9. I have now written a small program that only scatters
the columns of an integer matrix, so that it is easier to see what
goes wrong. I configured Open MPI for a heterogeneous environment.
Adding "-hetero-nodes" and/or "-hetero-apps" on the command line
doesn't change much, as you can see at the end of this email.
Everything works fine if I use only little-endian or only big-endian
machines.

Is it possible to fix the problem? Alternatively, do you know which
file(s) I would have to look at to find it, or which debug switches
would provide more information to solve it?

I used the following command to configure the package on my "Solaris
10 Sparc" system (the commands for my other systems are similar).
Next time I will also add "--without-sctp" to get rid of the failures
on my Linux machines (openSUSE 12.1).

../openmpi-1.9a1r27668/configure --prefix=/usr/local/openmpi-1.9_64_cc \
  --libdir=/usr/local/openmpi-1.9_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.7.0_07/bin/sparcv9 \
  --with-jdk-headers=/usr/local/jdk1.7.0_07/include \
  JAVA_HOME=/usr/local/jdk1.7.0_07 \
  LDFLAGS="-m64" \
  CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  CPPFLAGS="" CXXCPPFLAGS="" \
  C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
  OBJC_INCLUDE_PATH="" OPENMPI_HOME="" \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-opal-multi-threads \
  --enable-mpi-thread-multiple \
  --with-threads=posix \
  --with-hwloc=internal \
  --without-verbs \
  --without-udapl \
  --with-wrapper-cflags=-m64 \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

tyr small_prog 501 ompi_info | grep -e Ident -e Hetero -e "Built on"
            Ident string: 1.9a1r27668
                Built on: Wed Dec 12 09:00:13 CET 2012
   Heterogeneous support: yes
tyr small_prog 502
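
The source of column_int is not included in this message. As a rough
illustration of what "scattering the columns" means for the 4 x 6
matrix printed in the runs in this message, here is a minimal sketch
in plain Python without MPI. It assumes a row-major layout and that
rank j receives column j; the names column_indices and
scatter_columns are hypothetical and are not taken from the real
program.

```python
# Illustrative sketch only; this is not the column_int program.
ROWS, COLS = 4, 6        # matrix shape as printed in the runs in this message
VALUE = 0x12345678       # every matrix element holds this value


def column_indices(rows, cols, j):
    """Flat row-major indices of column j: one element per row, stride = cols."""
    return [i * cols + j for i in range(rows)]


def scatter_columns(flat, rows, cols):
    """Return one column per rank (assumption: rank j gets column j)."""
    return [[flat[k] for k in column_indices(rows, cols, j)]
            for j in range(cols)]


matrix = [VALUE] * (ROWS * COLS)
columns = scatter_columns(matrix, ROWS, COLS)

for rank, col in enumerate(columns):
    print(f"Column of process {rank}:")
    print(" ".join(f"0x{v:x}" for v in col))
```

With a correct transfer every rank should print four times 0x12345678,
which is what the homogeneous runs are reported to do.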

tyr small_prog 488 mpiexec -np 6 -host sunpc0,rs0 column_int

matrix:

0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678

Column of process 1:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 3:
0x56780000 0x12340000 0x5678ffff 0x1234ce71

Column of process 4:
0x56780000 0x12340000 0x5678ffff 0x1234ce71

Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 5:
0x56780000 0x12340000 0x5678ffff 0x1234ce71
tyr small_prog 489
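
As an aid for reading the wrong values received by processes 3 to 5
above, the following sketch (plain Python, not part of column_int;
the helper names are hypothetical) compares them with what a plain
4-byte endianness swap of 0x12345678 would look like. It only
inspects the numbers printed above and makes no claim about where in
Open MPI the conversion goes wrong.

```python
import struct

VALUE = 0x12345678
# Values printed by processes 3, 4 and 5 in the run above.
observed = [0x56780000, 0x12340000, 0x5678ffff, 0x1234ce71]


def full_byte_swap(value):
    """Pack as little-endian 32-bit, then read the same bytes as big-endian."""
    return struct.unpack(">I", struct.pack("<I", value))[0]


def upper_half(value):
    """Upper 16 bits of a 32-bit value."""
    return (value >> 16) & 0xFFFF


print(f"plain byte swap: 0x{full_byte_swap(VALUE):x}")
print("upper halves:   ", " ".join(f"0x{upper_half(v):x}" for v in observed))
```

A plain byte swap would give 0x78563412, which does not appear in the
output. The received words instead carry the 16-bit halves 0x5678 and
0x1234 of the original value in their upper halves, with unrelated
data in the lower halves.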

tyr small_prog 489 mpiexec -np 6 -host rs0,sunpc0 column_int

matrix:

0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678

Column of process 1:

Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 3:
0xffdf1234 0xffff5678 0x401234 0x5678

Column of process 4:
0xffdf1234 0xffff5678 0x401234 0x5678

Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 5:
0xffdf1234 0xffff5678 0x401234 0x5678
tyr small_prog 490

tyr small_prog 491 mpiexec -np 6 -mca btl ^sctp -host rs0,linpc0 column_int

matrix:

0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678

Column of process 1:

Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 3:
0x1234 0x5678 0xf71c1234 0x5678

Column of process 4:
0x1234 0x5678 0xc6011234 0x5678

Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 5:
0x1234 0x5678 0x426f1234 0x5678
tyr small_prog 492

tyr small_prog 492 mpiexec -np 6 -mca btl ^sctp -host linpc0,rs0 column_int

matrix:

0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678

Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 1:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 3:
0x56780000 0x12340000 0x5678ffff 0x1234ce51

Column of process 4:
0x56780000 0x12340000 0x5678ffff 0x1234ce51

Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 5:
0x56780000 0x12340000 0x5678ffff 0x1234ce51
tyr small_prog 493

tyr small_prog 498 mpiexec -np 6 -mca btl ^sctp -hetero-nodes \
  -host linpc0,rs0 column_int

matrix:

0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678

Column of process 1:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 3:
0x56780000 0x12340000 0x5678ffff 0x1234ce31

Column of process 4:
0x56780000 0x12340000 0x5678ffff 0x1234ce31

Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 5:
0x56780000 0x12340000 0x5678ffff 0x1234ce31
tyr small_prog 499

tyr small_prog 499 mpiexec -np 6 -mca btl ^sctp -hetero-nodes \
  -hetero-apps -host linpc0,rs0 column_int

matrix:

0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678

Column of process 1:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 3:
0x56780000 0x12340000 0x5678ffff 0x1234ce11

Column of process 4:
0x56780000 0x12340000 0x5678ffff 0x1234ce11

Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 5:
0x56780000 0x12340000 0x5678ffff 0x1234ce11
tyr small_prog 500

tyr small_prog 500 mpiexec -np 6 -mca btl ^sctp -hetero-apps \
  -host linpc0,rs0 column_int

matrix:

0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678

Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 1:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 3:
0x56780000 0x12340000 0x5678ffff 0x1234ce31

Column of process 4:
0x56780000 0x12340000 0x5678ffff 0x1234ce31

Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678

Column of process 5:
0x56780000 0x12340000 0x5678ffff 0x1234ce31
tyr small_prog 501

Thank you very much in advance for any help.

Kind regards

Siegmar