
Open MPI User's Mailing List Archives


Subject: [OMPI users] internal error with mpiJava in openmpi-1.9a1r27380
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-10-10 08:42:52


Hi,

I have built openmpi-1.9a1r27380 with Java support and implemented
a small program that sends a "hello" message with Send/Recv.
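The source of MsgSendRecvMain.java is not included in this message. Based only on the output of the runs in this message, the program seems to work like this: every process except rank 0 sends its host name with tag 3 to rank 0, and rank 0 prints the tag, length and text of each greeting. The Java sketch below is a hypothetical, MPI-free illustration of that inferred pattern. Threads and a BlockingQueue stand in for the MPI processes and Send/Recv. The names GreetingsSketch, Message and run are made up. It is not the original program and does not use the Open MPI Java bindings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical, MPI-free sketch of the "greetings" pattern inferred from the
// output: ranks 1..n-1 send their host name with tag 3 to rank 0, which
// reports tag, length and text. Threads + a queue stand in for MPI processes
// and Send/Recv; this is NOT the original MsgSendRecvMain.java.
public class GreetingsSketch {

    // Stand-in for a received message plus its status (sender rank, tag).
    record Message(int source, int tag, String text) {}

    static final int GREETING_TAG = 3; // tag value as printed in the posted output

    // Starts one "sender" thread per host name and returns what rank 0 would print.
    static List<String> run(List<String> hostNames) throws InterruptedException {
        BlockingQueue<Message> toRoot = new ArrayBlockingQueue<>(hostNames.size());
        List<Thread> senders = new ArrayList<>();
        for (int i = 0; i < hostNames.size(); i++) {
            final int rank = i + 1;                  // rank 0 is the receiver
            final String host = hostNames.get(i);
            Thread t = new Thread(() -> toRoot.add(new Message(rank, GREETING_TAG, host)));
            senders.add(t);
            t.start();
        }
        List<String> report = new ArrayList<>();
        report.add("Now " + hostNames.size() + " processes are sending greetings.");
        for (int i = 0; i < hostNames.size(); i++) {
            Message m = toRoot.take();               // receive from whichever sender is first
            report.add("Greetings from process " + m.source() + ":\n"
                    + "  message tag: " + m.tag() + "\n"
                    + "  message length: " + m.text().length() + "\n"
                    + "  message: " + m.text());
        }
        for (Thread t : senders) {
            t.join();
        }
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String line : run(List.of("sunpc0", "sunpc1"))) {
            System.out.println(line);
            System.out.println();
        }
    }
}
```

In this sketch the receiver takes messages in arrival order, so greetings can appear in any order. The runs in this message show the same effect: process 2 is reported before process 1.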

tyr java 164 make
mpijavac -d /home/fd1026/mpi_classfiles MsgSendRecvMain.java
...

Everything works fine if I use Solaris 10 x86_64.

tyr java 165 mpiexec -np 3 -host sunpc0,sunpc1 \
  java -cp $HOME/mpi_classfiles MsgSendRecvMain

Now 2 processes are sending greetings.

Greetings from process 2:
  message tag: 3
  message length: 6
  message: sunpc1

Greetings from process 1:
  message tag: 3
  message length: 6
  message: sunpc0

Everything also works fine if I use Solaris 10 SPARC.

tyr java 166 mpiexec -np 3 -host rs0,rs1 \
  java -cp $HOME/mpi_classfiles MsgSendRecvMain

Now 2 processes are sending greetings.

Greetings from process 2:
  message tag: 3
  message length: 26
  message: rs1.informatik.hs-fulda.de

Greetings from process 1:
  message tag: 3
  message length: 26
  message: rs0.informatik.hs-fulda.de

The program breaks if I use both systems.

tyr java 167 mpiexec -np 3 -host rs0,sunpc0 \
  java -cp $HOME/mpi_classfiles MsgSendRecvMain
[rs0.informatik.hs-fulda.de:9621] *** An error occurred in MPI_Comm_dup
[rs0.informatik.hs-fulda.de:9621] *** reported by process [1976500225,0]
[rs0.informatik.hs-fulda.de:9621] *** on communicator MPI_COMM_WORLD
[rs0.informatik.hs-fulda.de:9621] *** MPI_ERR_INTERN: internal error
[rs0.informatik.hs-fulda.de:9621] *** MPI_ERRORS_ARE_FATAL (processes
   in this communicator will now abort,
[rs0.informatik.hs-fulda.de:9621] *** and potentially your MPI job)
[tyr.informatik.hs-fulda.de:22491] 1 more process has sent help message
   help-mpi-errors.txt / mpi_errors_are_fatal
[tyr.informatik.hs-fulda.de:22491] Set MCA parameter
  "orte_base_help_aggregate" to 0 to see all help / error messages

The program also breaks if I use Linux x86_64.

tyr java 168 mpiexec -np 3 -host linpc0,linpc1 \
  java -cp $HOME/mpi_classfiles MsgSendRecvMain
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
...

*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[linpc0:20277] Local abort before MPI_INIT completed successfully;
  not able to aggregate error messages, and not able to guarantee that
  all other processes were killed!
...

Please let me know if you need more information to track down the problem.
Thank you very much in advance for any help.

Kind regards

Siegmar