Open MPI User's Mailing List Archives

Subject: [OMPI users] problem with groups and communicators in openmpi-1.6.4rc2
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2013-01-19 04:27:30


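Hi,

I have installed openmpi-1.6.4rc2, and my program data_type_4 prints the
correct result but then aborts with an assertion failure inside
MPI_Finalize, as the output below shows.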

tyr strided_vector 110 ompi_info | grep "Open MPI:"
                Open MPI: 1.6.4rc2r27861
tyr strided_vector 111 mpicc -showme
gcc -I/usr/local/openmpi-1.6.4_64_gcc/include -fexceptions -pthread -m64
-L/usr/local/openmpi-1.6.4_64_gcc/lib64 -lmpi -lm -lkstat -llgrp -lsocket -lnsl
-lrt -lm

tyr strided_vector 112 mpiexec -np 4 data_type_4
Process 2 of 4 running on tyr.informatik.hs-fulda.de
Process 0 of 4 running on tyr.informatik.hs-fulda.de
Process 3 of 4 running on tyr.informatik.hs-fulda.de
Process 1 of 4 running on tyr.informatik.hs-fulda.de

original matrix:

     1 2 3 4 5 6 7 8 9 10
    11 12 13 14 15 16 17 18 19 20
    21 22 23 24 25 26 27 28 29 30
    31 32 33 34 35 36 37 38 39 40
    41 42 43 44 45 46 47 48 49 50
    51 52 53 54 55 56 57 58 59 60

result matrix:
  elements are sqared in columns:
     0 1 2 6 7
  elements are multiplied with 2 in columns:
     3 4 5 8 9

     1 4 9 8 10 12 49 64 18 20
   121 144 169 28 30 32 289 324 38 40
   441 484 529 48 50 52 729 784 58 60
   961 1024 1089 68 70 72 1369 1444 78 80
  1681 1764 1849 88 90 92 2209 2304 98 100
  2601 2704 2809 108 110 112 3249 3364 118 120

Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (comm->c_remote_group))->obj_magic_id, file ../../openmpi-1.6.4rc2r27861/ompi/communicator/comm_init.c, line 412
[tyr:18578] *** Process received signal ***
[tyr:18578] Signal: Abort (6)
[tyr:18578] Signal code: (-1)
Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (comm->c_remote_group))->obj_magic_id, file ../../openmpi-1.6.4rc2r27861/ompi/communicator/comm_init.c, line 412
/export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:opal_backtrace_print+0x20
[tyr:18580] *** Process received signal ***
/export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:0x2c1bc4
[tyr:18580] Signal: Abort (6)
[tyr:18580] Signal code: (-1)
/lib/sparcv9/libc.so.1:0xd88a4
/lib/sparcv9/libc.so.1:0xcc418
/lib/sparcv9/libc.so.1:0xcc624
/lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 6 (ABRT)]
/lib/sparcv9/libc.so.1:abort+0xd0
/lib/sparcv9/libc.so.1:_assert+0x74
/export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:0xa4c58
/export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:0xa2430
/export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:ompi_comm_finalize+0x168
/export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:ompi_mpi_finalize+0xa60
/export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:MPI_Finalize+0x90
/home/fd1026/SunOS/sparc/bin/data_type_4:main+0x588
/home/fd1026/SunOS/sparc/bin/data_type_4:_start+0x7c
[tyr:18578] *** End of error message ***
...

The same program runs without errors under LAM-MPI (even in a
heterogeneous environment with little-endian and big-endian machines),
so the error is probably in Open MPI, although I cannot rule out a bug
in my own program.

tyr strided_vector 125 mpicc -showme
gcc -I/usr/local/lam-6.5.9_64_gcc/include -L/usr/local/lam-6.5.9_64_gcc/lib
-llamf77mpi -lmpi -llam -lsocket -lnsl
tyr strided_vector 126 lamboot -v hosts.lam-mpi

LAM 6.5.9/MPI 2 C++ - Indiana University

Executing hboot on n0 (tyr.informatik.hs-fulda.de - 2 CPUs)...
Executing hboot on n1 (sunpc1.informatik.hs-fulda.de - 4 CPUs)...
topology done

tyr strided_vector 127 mpirun -v app_data_type_4.lam-mpi
22894 data_type_4 running on local
22895 data_type_4 running on n0 (o)
21998 data_type_4 running on n1
22896 data_type_4 running on n0 (o)
Process 1 of 4 running on tyr.informatik.hs-fulda.de
Process 3 of 4 running on tyr.informatik.hs-fulda.de
Process 2 of 4 running on sunpc1
Process 0 of 4 running on tyr.informatik.hs-fulda.de

original matrix:

     1 2 3 4 5 6 7 8 9 10
    11 12 13 14 15 16 17 18 19 20
    21 22 23 24 25 26 27 28 29 30
    31 32 33 34 35 36 37 38 39 40
    41 42 43 44 45 46 47 48 49 50
    51 52 53 54 55 56 57 58 59 60

result matrix:
  elements are sqared in columns:
     0 1 2 6 7
  elements are multiplied with 2 in columns:
     3 4 5 8 9

     1 4 9 8 10 12 49 64 18 20
   121 144 169 28 30 32 289 324 38 40
   441 484 529 48 50 52 729 784 58 60
   961 1024 1089 68 70 72 1369 1444 78 80
  1681 1764 1849 88 90 92 2209 2304 98 100
  2601 2704 2809 108 110 112 3249 3364 118 120

tyr strided_vector 128 lamhalt

LAM 6.5.9/MPI 2 C++ - Indiana University

I would be grateful if somebody could fix the problem. Thank you
very much in advance for any help.

Kind regards

Siegmar