Open MPI User's Mailing List Archives

Subject: [OMPI users] problem with groups and communicators in openmpi-1.9 in Java and C
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2013-01-15 16:28:58


Hi

I have a problem with groups and communicators in openmpi-1.9a1r27787
with Java. I want to multiply two matrices with any number of
processes, where n is the number of rows in matrix "a". If I start
more than n processes, I build a new group for the workers; if I
start at most n processes, I use all of them.
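
The selection rule can be sketched in plain Java without any MPI calls.
This is only an illustration: the class "WorkerRanks", the method
"workerRanks", and the choice of the lowest ranks as workers are
assumptions made for the sketch and are not taken from the program.

```java
import java.util.Arrays;

public class WorkerRanks {
    /**
     * Returns the ranks that would form the worker group.
     * Assumption: when more processes than rows are started,
     * the lowest-numbered ranks become the workers.
     */
    static int[] workerRanks(int numProcs, int rows) {
        int workers = Math.min(numProcs, rows);
        int[] ranks = new int[workers];
        for (int i = 0; i < workers; i++) {
            ranks[i] = i;
        }
        return ranks;
    }

    public static void main(String[] args) {
        int rows = 4; // rows of matrix "a" in the runs shown in this post
        // More processes than rows: only a subset is selected.
        System.out.println(Arrays.toString(workerRanks(6, rows)));
        // At most as many processes as rows: every process is a worker.
        System.out.println(Arrays.toString(workerRanks(4, rows)));
    }
}
```

With 4 rows, both calls select ranks 0 to 3, so with "-np 4" no process
is left outside the worker group.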

My program contains the following code.

...
      /* Create group "groupWorker" */
      groupWorker = groupCommWorld.Incl (group_w_mem);
    }
    else
    {
      /* there are at most as many processes as rows in matrix "a",
       * i.e., we can use the "basic group"
       */
      groupWorker = groupCommWorld;
    }
    /* Create group "groupOther" which demonstrates only how to use
     * another group operation and which has nothing to do in this
     * program.
     */
    groupOther = Group.Difference (groupCommWorld, groupWorker);
    if (groupOther == MPI.GROUP_EMPTY)
    {
      System.out.println ("groupOther is empty.");
    }
    else
    {
      System.out.println ("groupOther is not empty.");
    }

    groupCommWorld.finalize ();
    /* Create communicators for both groups. The communicator is only
     * defined for all processes of the group and it is undefined
     * (MPI.COMM_NULL) for all other processes.
     */
    COMM_WORKER = MPI.COMM_WORLD.Creat (groupWorker);
    COMM_OTHER = MPI.COMM_WORLD.Creat (groupOther);
...

Shouldn't the method be called "MPI.COMM_WORLD.Create" instead of
"MPI.COMM_WORLD.Creat"? In addition, "groupOther" should be empty if I
use "-np 4". Unfortunately it isn't.

tyr java 112 ompi_info | grep "Open MPI:"
                Open MPI: 1.9a1r27787
tyr java 113 mpijavac MatMultWithAnyProc2DarrayIn1DarrayMain.java
tyr java 114 mpiexec -np 4 java MatMultWithAnyProc2DarrayIn1DarrayMain
groupOther is not empty.
[tyr:25128] *** An error occurred in MPI_Comm_create
[tyr:25128] *** reported by process [3288334337,0]
[tyr:25128] *** on communicator MPI_COMM_WORLD
[tyr:25128] *** MPI_ERR_GROUP: invalid group
[tyr:25128] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[tyr:25128] *** and potentially your MPI job)
...

Everything works fine if I use "-np 6". I have removed some lines
so that the output is more readable.

tyr java 115 mpiexec -np 6 java MatMultWithAnyProc2DarrayIn1DarrayMain
groupOther is not empty.

(4,6)-matrix a:

      1.00 2.00 3.00 4.00 5.00 6.00
      7.00 8.00 9.00 10.00 11.00 12.00
     13.00 14.00 15.00 16.00 17.00 18.00
     19.00 20.00 21.00 22.00 23.00 24.00

(6,8)-matrix b:

     48.00 47.00 46.00 45.00 44.00 43.00 42.00 41.00
     40.00 39.00 38.00 37.00 36.00 35.00 34.00 33.00
     32.00 31.00 30.00 29.00 28.00 27.00 26.00 25.00
     24.00 23.00 22.00 21.00 20.00 19.00 18.00 17.00
     16.00 15.00 14.00 13.00 12.00 11.00 10.00 9.00
      8.00 7.00 6.00 5.00 4.00 3.00 2.00 1.00

(4,8)-result-matrix c = a * b:

    448.00 427.00 406.00 385.00 364.00 343.00 322.00 301.00
   1456.00 1399.00 1342.00 1285.00 1228.00 1171.00 1114.00 1057.00
   2464.00 2371.00 2278.00 2185.00 2092.00 1999.00 1906.00 1813.00
   3472.00 3343.00 3214.00 3085.00 2956.00 2827.00 2698.00 2569.00

It seems that I'm not allowed to do

groupWorker = groupCommWorld;
...
groupOther = Group.Difference (groupCommWorld, groupWorker);

or that Group.Difference doesn't return MPI.GROUP_EMPTY.
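
Both possibilities can be illustrated in plain Java. The sketch below
does not use the Open MPI Java bindings: "RankGroup" is a hypothetical
stand-in class, and it assumes that the difference operation returns a
fresh object each time and that freeing a group only marks it as
invalid. Whether the real bindings behave this way is not confirmed
here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class GroupIdentityDemo {
    /** Hypothetical stand-in for an MPI group; NOT the Open MPI Java class. */
    static final class RankGroup {
        static final RankGroup EMPTY = new RankGroup(new LinkedHashSet<>());

        private final Set<Integer> ranks;
        private boolean freed = false;

        RankGroup(Set<Integer> ranks) {
            this.ranks = ranks;
        }

        /** Builds a group containing ranks 0 .. n-1. */
        static RankGroup range(int n) {
            Set<Integer> s = new LinkedHashSet<>();
            for (int i = 0; i < n; i++) {
                s.add(i);
            }
            return new RankGroup(s);
        }

        /** Ranks in a that are not in b; returns a new object (assumption). */
        static RankGroup difference(RankGroup a, RankGroup b) {
            Set<Integer> s = new LinkedHashSet<>(a.ranks);
            s.removeAll(b.ranks);
            return new RankGroup(s);
        }

        int size() { return ranks.size(); }
        void free() { freed = true; }
        boolean isFreed() { return freed; }
    }

    public static void main(String[] args) {
        RankGroup world = RankGroup.range(4);
        RankGroup worker = world; // copies the reference, not the group
        RankGroup other = RankGroup.difference(world, worker);

        // Reference comparison: false, although the group has no members.
        System.out.println("other == EMPTY: " + (other == RankGroup.EMPTY));
        // Content comparison: true.
        System.out.println("other.size() == 0: " + (other.size() == 0));

        // Freeing through one name also affects the alias.
        world.free();
        System.out.println("worker freed too: " + worker.isFreed());
    }
}
```

Under these assumptions, "==" compares object references, so an empty
result is not identical to the "EMPTY" constant, and a group that is
only an alias becomes invalid as soon as the original is freed. A check
based on the group size would not depend on object identity.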

I have a similar program in C that also fails with Open MPI
(I get the same error with openmpi-1.6.4 and openmpi-1.9).

tyr strided_vector 109 ompi_info | grep "Open MPI:"
                Open MPI: 1.6.4a1r27643

tyr strided_vector 108 ompi_info | grep "Open MPI:"
                Open MPI: 1.9a1r27787

tyr strided_vector 108 mpiexec -np 4 data_type_4
Process 0 of 4 running on tyr.informatik.hs-fulda.de
Process 1 of 4 running on tyr.informatik.hs-fulda.de
Process 2 of 4 running on tyr.informatik.hs-fulda.de
Process 3 of 4 running on tyr.informatik.hs-fulda.de

original matrix:

     1 2 3 4 5 6 7 8 9 10
    11 12 13 14 15 16 17 18 19 20
    21 22 23 24 25 26 27 28 29 30
    31 32 33 34 35 36 37 38 39 40
    41 42 43 44 45 46 47 48 49 50
    51 52 53 54 55 56 57 58 59 60

result matrix:
  elements are sqared in columns:
     0 1 2 6 7
  elements are multiplied with 2 in columns:
     3 4 5 8 9

     1 4 9 8 10 12 49 64 18 20
   121 144 169 28 30 32 289 324 38 40
   441 484 529 48 50 52 729 784 58 60
   961 1024 1089 68 70 72 1369 1444 78 80
  1681 1764 1849 88 90 92 2209 2304 98 100
  2601 2704 2809 108 110 112 3249 3364 118 120

Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (comm->c_remote_group)
)->obj_magic_id, file ../../openmpi-1.6.4a1r27643/ompi/communicator/comm_init.c,
 line 412
[tyr:24415] *** Process received signal ***
Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (comm->c_remote_group)
)->obj_magic_id, file ../../openmpi-1.6.4a1r27643/ompi/communicator/comm_init.c,
 line 412
[tyr:24415] Signal: Abort (6)
[tyr:24415] Signal code: (-1)
...

The program works as expected if I use LAM-MPI.

tyr strided_vector 115 lamboot

LAM 6.5.9/MPI 2 C++ - Indiana University

tyr strided_vector 116 mpirun -np 4 data_type_4
Process 0 of 4 running on tyr.informatik.hs-fulda.de
Process 1 of 4 running on tyr.informatik.hs-fulda.de
Process 2 of 4 running on tyr.informatik.hs-fulda.de
Process 3 of 4 running on tyr.informatik.hs-fulda.de

original matrix:

     1 2 3 4 5 6 7 8 9 10
    11 12 13 14 15 16 17 18 19 20
    21 22 23 24 25 26 27 28 29 30
    31 32 33 34 35 36 37 38 39 40
    41 42 43 44 45 46 47 48 49 50
    51 52 53 54 55 56 57 58 59 60

result matrix:
  elements are sqared in columns:
     0 1 2 6 7
  elements are multiplied with 2 in columns:
     3 4 5 8 9

     1 4 9 8 10 12 49 64 18 20
   121 144 169 28 30 32 289 324 38 40
   441 484 529 48 50 52 729 784 58 60
   961 1024 1089 68 70 72 1369 1444 78 80
  1681 1764 1849 88 90 92 2209 2304 98 100
  2601 2704 2809 108 110 112 3249 3364 118 120

tyr strided_vector 117 lamhalt

LAM 6.5.9/MPI 2 C++ - Indiana University

I would be grateful if somebody could fix the problems in Open MPI.
Thank you very much in advance for any help.

Kind regards

Siegmar