Open MPI User's Mailing List Archives

Subject: [OMPI users] matrix multiplication in openmpi-1.9a1r27787 with Java
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2013-01-15 06:23:39


Hi

today I implemented a small Java program to multiply two matrices.
Once again, the program only works well if you simulate a 2-dimensional
array in a 1-dimensional one. The program works on Solaris 10 Sparc
and x86_64. It breaks on Linux x86_64 (openSuSE 12.1). Furthermore, it
breaks if I combine little-endian and big-endian machines.
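
For illustration, here is a minimal sketch of what "simulating a
2-dimensional array in a 1-dimensional one" means: element (i, j) of a
matrix with `cols` columns is stored at index i * cols + j (row-major
order). This is not the program used in the runs below. The class name
FlatMatMult and its methods are hypothetical, and the sketch contains no
MPI calls at all; it only shows the index arithmetic on one process.

```java
// Hypothetical illustration only; not the MatMultWithNproc2DarrayIn1DarrayMain
// program from this post, and deliberately free of any MPI calls.
final class FlatMatMult {

    /**
     * Multiplies a (p x q) by b (q x r). Both inputs and the result are
     * stored row-major in 1-dimensional arrays.
     */
    static double[] multiply(double[] a, double[] b, int p, int q, int r) {
        double[] c = new double[p * r];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < r; j++) {
                double sum = 0.0;
                for (int k = 0; k < q; k++) {
                    // a[i][k] lives at i * q + k, b[k][j] at k * r + j
                    sum += a[i * q + k] * b[k * r + j];
                }
                c[i * r + j] = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int p = 4, q = 6, r = 8;
        double[] a = new double[p * q];
        double[] b = new double[q * r];
        // Same test data as in the output below: a counts up from 1,
        // b counts down from 48.
        for (int i = 0; i < a.length; i++) {
            a[i] = i + 1;
        }
        for (int i = 0; i < b.length; i++) {
            b[i] = b.length - i;
        }
        double[] c = multiply(a, b, p, q, r);
        for (int i = 0; i < p; i++) {
            StringBuilder row = new StringBuilder();
            for (int j = 0; j < r; j++) {
                row.append(String.format("%8.2f", c[i * r + j]));
            }
            System.out.println(row);
        }
    }
}
```

A flat array is one contiguous block of doubles, whereas a Java
double[][] is an array of references to separate row arrays, which is
presumably why the flat layout is easier to hand to a message buffer.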

mpiexec -np 4 -host tyr java MatMultWithNproc2DarrayIn1DarrayMain or
mpiexec -np 4 -host sunpc1 java MatMultWithNproc2DarrayIn1DarrayMain

Process 0 of 4 running on tyr.informatik.hs-fulda.de.
Process 1 of 4 running on tyr.informatik.hs-fulda.de.
Process 2 of 4 running on tyr.informatik.hs-fulda.de.
Process 3 of 4 running on tyr.informatik.hs-fulda.de.

(4,6)-matrix a:

      1.00 2.00 3.00 4.00 5.00 6.00
      7.00 8.00 9.00 10.00 11.00 12.00
     13.00 14.00 15.00 16.00 17.00 18.00
     19.00 20.00 21.00 22.00 23.00 24.00

(6,8)-matrix b:

     48.00 47.00 46.00 45.00 44.00 43.00 42.00 41.00
     40.00 39.00 38.00 37.00 36.00 35.00 34.00 33.00
     32.00 31.00 30.00 29.00 28.00 27.00 26.00 25.00
     24.00 23.00 22.00 21.00 20.00 19.00 18.00 17.00
     16.00 15.00 14.00 13.00 12.00 11.00 10.00 9.00
      8.00 7.00 6.00 5.00 4.00 3.00 2.00 1.00

(4,8)-result-matrix c = a * b:

    448.00 427.00 406.00 385.00 364.00 343.00 322.00 301.00
   1456.00 1399.00 1342.00 1285.00 1228.00 1171.00 1114.00 1057.00
   2464.00 2371.00 2278.00 2185.00 2092.00 1999.00 1906.00 1813.00
   3472.00 3343.00 3214.00 3085.00 2956.00 2827.00 2698.00 2569.00

mpiexec -np 4 -host linpc1 java MatMultWithNproc2DarrayIn1DarrayMain
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
...
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[(null):29256] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
...

mpiexec -np 4 -host tyr,sunpc1 java MatMultWithNproc2DarrayIn1DarrayMain
[tyr:20374] *** An error occurred in MPI_Comm_dup
[tyr:20374] *** reported by process [3921084417,0]
[tyr:20374] *** on communicator MPI_COMM_WORLD
[tyr:20374] *** MPI_ERR_INTERN: internal error
[tyr:20374] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[tyr:20374] *** and potentially your MPI job)
[tyr.informatik.hs-fulda.de:20369] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[tyr.informatik.hs-fulda.de:20369] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
tyr java 270

Any ideas why it breaks? Thank you very much for your help in advance.

Kind regards

Siegmar