Hi,
Some weeks ago I reported a problem with my matrix multiplication
program in a heterogeneous environment (little endian and big endian
machines). The problem occurs in openmpi-1.6.x, openmpi-1.7, and
openmpi-1.9. I have now written a small program that only scatters
the columns of an integer matrix, so that it is easier to see what
goes wrong. I configured Open MPI for a heterogeneous environment.
Adding "-hetero-nodes" and/or "-hetero-apps" on the command line
doesn't change much, as you can see at the end of this email.
Everything works fine if I use only little endian or only big endian
machines.

Is it possible to fix the problem? If not, do you know in which
file(s) I would have to look to find it, or which debug switches
would provide more information?
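For reference, the expected result can be stated without MPI: process i should end up with column i of the row-major 4 x 6 matrix shown in the runs below. The following is only a minimal serial sketch of that layout. The function name `extract_column` is hypothetical and is not taken from column_int, whose source is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Copy column `col` of a row-major rows x cols matrix into `out`,
 * which must have room for `rows` ints. This is a serial reference
 * for the data each process is expected to receive, not MPI code. */
void extract_column(const int *matrix, size_t rows, size_t cols,
                    size_t col, int *out)
{
    assert(col < cols);
    for (size_t r = 0; r < rows; ++r) {
        /* Consecutive column elements are one full row apart. */
        out[r] = matrix[r * cols + col];
    }
}
```

Since every matrix element is 0x12345678, any correct column must consist of four copies of that value, whatever the byte order of the receiving host.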
I used the following command to configure the package on my "Solaris
10 Sparc" system (the commands for my other systems are similar).
Next time I will also add "--without-sctp" to get rid of the failures
on my Linux machines (openSUSE 12.1).
../openmpi-1.9a1r27668/configure --prefix=/usr/local/openmpi-1.9_64_cc \
--libdir=/usr/local/openmpi-1.9_64_cc/lib64 \
--with-jdk-bindir=/usr/local/jdk1.7.0_07/bin/sparcv9 \
--with-jdk-headers=/usr/local/jdk1.7.0_07/include \
JAVA_HOME=/usr/local/jdk1.7.0_07 \
LDFLAGS="-m64" \
CC="cc" CXX="CC" FC="f95" \
CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
CPP="cpp" CXXCPP="cpp" \
CPPFLAGS="" CXXCPPFLAGS="" \
C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
OBJC_INCLUDE_PATH="" OPENMPI_HOME="" \
--enable-cxx-exceptions \
--enable-mpi-java \
--enable-heterogeneous \
--enable-opal-multi-threads \
--enable-mpi-thread-multiple \
--with-threads=posix \
--with-hwloc=internal \
--without-verbs \
--without-udapl \
--with-wrapper-cflags=-m64 \
--enable-debug \
|& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
tyr small_prog 501 ompi_info | grep -e Ident -e Hetero -e "Built on"
Ident string: 1.9a1r27668
Built on: Wed Dec 12 09:00:13 CET 2012
Heterogeneous support: yes
tyr small_prog 502
tyr small_prog 488 mpiexec -np 6 -host sunpc0,rs0 column_int
matrix:
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
Column of process 1:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 3:
0x56780000 0x12340000 0x5678ffff 0x1234ce71
Column of process 4:
0x56780000 0x12340000 0x5678ffff 0x1234ce71
Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 5:
0x56780000 0x12340000 0x5678ffff 0x1234ce71
tyr small_prog 489
tyr small_prog 489 mpiexec -np 6 -host rs0,sunpc0 column_int
matrix:
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
Column of process 1:
Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 3:
0xffdf1234 0xffff5678 0x401234 0x5678
Column of process 4:
0xffdf1234 0xffff5678 0x401234 0x5678
Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 5:
0xffdf1234 0xffff5678 0x401234 0x5678
tyr small_prog 490
tyr small_prog 491 mpiexec -np 6 -mca btl ^sctp -host rs0,linpc0 column_int
matrix:
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
Column of process 1:
Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 3:
0x1234 0x5678 0xf71c1234 0x5678
Column of process 4:
0x1234 0x5678 0xc6011234 0x5678
Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 5:
0x1234 0x5678 0x426f1234 0x5678
tyr small_prog 492
tyr small_prog 492 mpiexec -np 6 -mca btl ^sctp -host linpc0,rs0 column_int
matrix:
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 1:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 3:
0x56780000 0x12340000 0x5678ffff 0x1234ce51
Column of process 4:
0x56780000 0x12340000 0x5678ffff 0x1234ce51
Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 5:
0x56780000 0x12340000 0x5678ffff 0x1234ce51
tyr small_prog 493
tyr small_prog 498 mpiexec -np 6 -mca btl ^sctp -hetero-nodes \
-host linpc0,rs0 column_int
matrix:
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
Column of process 1:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 3:
0x56780000 0x12340000 0x5678ffff 0x1234ce31
Column of process 4:
0x56780000 0x12340000 0x5678ffff 0x1234ce31
Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 5:
0x56780000 0x12340000 0x5678ffff 0x1234ce31
tyr small_prog 499
tyr small_prog 499 mpiexec -np 6 -mca btl ^sctp -hetero-nodes \
-hetero-apps -host linpc0,rs0 column_int
matrix:
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
Column of process 1:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 3:
0x56780000 0x12340000 0x5678ffff 0x1234ce11
Column of process 4:
0x56780000 0x12340000 0x5678ffff 0x1234ce11
Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 5:
0x56780000 0x12340000 0x5678ffff 0x1234ce11
tyr small_prog 500
tyr small_prog 500 mpiexec -np 6 -mca btl ^sctp -hetero-apps \
-host linpc0,rs0 column_int
matrix:
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
0x12345678 0x12345678 0x12345678 0x12345678 0x12345678 0x12345678
Column of process 2:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 1:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 3:
0x56780000 0x12340000 0x5678ffff 0x1234ce31
Column of process 4:
0x56780000 0x12340000 0x5678ffff 0x1234ce31
Column of process 0:
0x12345678 0x12345678 0x12345678 0x12345678
Column of process 5:
0x56780000 0x12340000 0x5678ffff 0x1234ce31
tyr small_prog 501
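One observation from the runs above: the corrupted columns are not simply byte-swapped copies of 0x12345678. A full 32-bit byte swap would give 0x78563412, whereas the faulty processes show the 16-bit halves 0x5678 and 0x1234 in the wrong positions, mixed with unrelated bits. A minimal sketch for checking this on a given host follows, assuming 8-bit bytes. All helper names are hypothetical and not part of Open MPI.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 on a little endian host, 0 on a big endian host. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    /* Inspect the byte stored at the lowest address. */
    return *(const unsigned char *)&probe == 1u;
}

/* Reverse the order of the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}

/* Return 1 if `observed` is exactly the byte-swapped form of `expected`. */
int is_plain_byte_swap(uint32_t expected, uint32_t observed)
{
    return observed == swap32(expected);
}

/* Cheap start-up check of the helper itself. */
void byte_order_self_check(void)
{
    assert(swap32(0x12345678u) == 0x78563412u);
    assert(swap32(swap32(0x12345678u)) == 0x12345678u);
}
```

With these helpers, `is_plain_byte_swap(0x12345678u, 0x56780000u)` returns 0, so the first value seen by process 3 is not explained by a missing byte swap alone.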
Thank you very much for any help in advance.
Kind regards
Siegmar