Open MPI User's Mailing List Archives


Subject: [OMPI users] problems with mpiJava in openmpi-1.9a1r27362
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-09-26 08:42:03


Hi,

Yesterday I installed openmpi-1.9a1r27362 on Solaris and Linux, and
I have a problem with mpiJava on Linux (openSUSE Linux 12.1, x86_64).

linpc4 mpi_classfiles 104 javac HelloMainWithoutMPI.java
linpc4 mpi_classfiles 105 mpijavac HelloMainWithBarrier.java
linpc4 mpi_classfiles 106 mpijavac -showme
/usr/local/jdk1.7.0_07-64/bin/javac \
  -cp ...:.:/usr/local/openmpi-1.9_64_cc/lib64/mpi.jar
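For reference, HelloMainWithBarrier is essentially the following (a simplified sketch using the mpiJava-style API of the trunk's Java bindings; the exact source may differ in details):

```java
import mpi.*;

public class HelloMainWithBarrier {
    public static void main(String[] args) throws MPIException {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.Rank();
        int size = MPI.COMM_WORLD.Size();
        // Synchronize all processes before and after printing.
        MPI.COMM_WORLD.Barrier();
        System.out.println("Process " + rank + " of " + size
            + " running on " + MPI.Get_processor_name());
        MPI.COMM_WORLD.Barrier();
        MPI.Finalize();
    }
}
```

This matches the "Process 1 of 3 running on ..." output shown further below.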

It works with Java without MPI.

linpc4 mpi_classfiles 107 mpiexec java -cp $HOME/mpi_classfiles \
  HelloMainWithoutMPI
Hello from linpc4.informatik.hs-fulda.de/193.174.26.225
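For reference, HelloMainWithoutMPI is essentially just the following (a sketch; `InetAddress.toString()` yields the "hostname/IP-address" form seen in the output above):

```java
import java.net.InetAddress;

public class HelloMainWithoutMPI {
    static String greeting() {
        String host;
        try {
            // toString() gives "hostname/IP", e.g.
            // "linpc4.informatik.hs-fulda.de/193.174.26.225".
            host = InetAddress.getLocalHost().toString();
        } catch (Exception e) {
            host = "unknown-host";
        }
        return "Hello from " + host;
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```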

It breaks with Java and MPI.

linpc4 mpi_classfiles 108 mpiexec java -cp $HOME/mpi_classfiles \
  HelloMainWithBarrier
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[linpc4:15332] Local abort before MPI_INIT completed successfully; not able to
aggregate error messages, and not able to guarantee that all other processes were
killed!
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus
causing
the job to be terminated. The first process to do so was:

  Process name: [[58875,1],0]
  Exit code: 1
--------------------------------------------------------------------------

I configured with the following command.

../openmpi-1.9a1r27362/configure --prefix=/usr/local/openmpi-1.9_64_cc \
  --libdir=/usr/local/openmpi-1.9_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.7.0_07-64/bin \
  --with-jdk-headers=/usr/local/jdk1.7.0_07-64/include \
  JAVA_HOME=/usr/local/jdk1.7.0_07-64 \
  LDFLAGS="-m64" \
  CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  CPPFLAGS="" CXXCPPFLAGS="" \
  C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
  OBJC_INCLUDE_PATH="" OPENMPI_HOME="" \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-opal-multi-threads \
  --enable-mpi-thread-multiple \
  --with-threads=posix \
  --with-hwloc=internal \
  --without-verbs \
  --without-udapl \
  --with-wrapper-cflags=-m64 \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

It works fine on Solaris machines as long as the hosts belong to the
same kind (Sparc or x86_64).

tyr mpi_classfiles 194 mpiexec -host sunpc0,sunpc1,sunpc4 \
  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
Process 1 of 3 running on sunpc1
Process 2 of 3 running on sunpc4.informatik.hs-fulda.de
Process 0 of 3 running on sunpc0

sunpc4 fd1026 107 mpiexec -host tyr,rs0,rs1 \
  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
Process 1 of 3 running on rs0.informatik.hs-fulda.de
Process 2 of 3 running on rs1.informatik.hs-fulda.de
Process 0 of 3 running on tyr.informatik.hs-fulda.de

It breaks if the hosts are a mix of both architectures.

sunpc4 fd1026 106 mpiexec -host tyr,rs0,sunpc1 \
  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
[rs0.informatik.hs-fulda.de:7718] *** An error occurred in MPI_Comm_dup
[rs0.informatik.hs-fulda.de:7718] *** reported by process [565116929,1]
[rs0.informatik.hs-fulda.de:7718] *** on communicator MPI_COMM_WORLD
[rs0.informatik.hs-fulda.de:7718] *** MPI_ERR_INTERN: internal error
[rs0.informatik.hs-fulda.de:7718] *** MPI_ERRORS_ARE_FATAL (processes
  in this communicator will now abort,
[rs0.informatik.hs-fulda.de:7718] *** and potentially your MPI job)
[sunpc4.informatik.hs-fulda.de:07900] 1 more process has sent help
  message help-mpi-errors.txt / mpi_errors_are_fatal
[sunpc4.informatik.hs-fulda.de:07900] Set MCA parameter
  "orte_base_help_aggregate" to 0 to see all help / error messages

Please let me know if I can provide anything else to help track down
these errors. Thank you very much in advance for any help.

Kind regards

Siegmar