Open MPI User's Mailing List Archives

Subject: [OMPI users] problem with 32-bit mpiJava on openmpi-1.9a1r27361
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-09-24 07:35:46


Hi,

I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
Why doesn't "mpiexec" start a process on my local machine "tyr"?
It is not a Java problem, because I see the same behaviour when
I run "hostname" instead.

tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
Process 0 of 3 running on sunpc4.informatik.hs-fulda.de
Process 1 of 3 running on sunpc4.informatik.hs-fulda.de
Process 2 of 3 running on sunpc1
...

tyr small_prog 142 mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname
sunpc1
sunpc4.informatik.hs-fulda.de
sunpc4.informatik.hs-fulda.de
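
The skipped host can also be found mechanically. The following is only a
sketch, assuming a POSIX shell; "missing_hosts" is a hypothetical helper
(not part of Open MPI) that compares the "-host" list with the names
printed by "hostname", ignoring domain suffixes.

```shell
# Hypothetical helper: list requested hosts that never appear in the output.
#   $1    = comma-separated host list, as passed to "mpiexec -host"
#   stdin = one hostname per line, as printed by "mpiexec ... hostname"
missing_hosts() {
    # Strip domain suffixes so "sunpc4.informatik.hs-fulda.de" matches "sunpc4".
    seen=$(sed 's/\..*$//' | sort -u)
    echo "$1" | tr ',' '\n' | while read -r host; do
        # Print the host only if it is not among the names that were seen.
        echo "$seen" | grep -Fqx -- "$host" || echo "$host"
    done
}

# Example with the output shown above; prints "tyr".
printf 'sunpc1\nsunpc4.informatik.hs-fulda.de\nsunpc4.informatik.hs-fulda.de\n' |
    missing_hosts tyr,sunpc4,sunpc1
```

In a live session the function would be fed directly, for example
"mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname | missing_hosts tyr,sunpc4,sunpc1".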

The command fails if I add a Linux machine (linpc4) to the host list.

tyr java 110 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[linpc4:27369] Local abort before MPI_INIT completed successfully;
  not able to aggregate error messages, and not able to guarantee
  that all other processes were killed!
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [[21095,1],2]
  Exit code: 1
--------------------------------------------------------------------------

tyr java 111 which mpijavac
/usr/local/openmpi-1.9_32_cc/bin/mpijavac
tyr java 112 more /usr/local/openmpi-1.9_32_cc/bin/mpijavac
#!/usr/bin/env perl

# WARNING: DO NOT EDIT THE mpijava.pl FILE AS IT IS GENERATED!
# MAKE ALL CHANGES IN mpijava.pl.in

# Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved.

use strict;

# The main purpose of this wrapper compiler is to check for
# and adjust the Java class path to include the OMPI classes
# in mpi.jar. The user may have specified a class path on
# our cmd line, or it may be in the environment, so we have
# to check for both. We also need to be careful not to
# just override the class path as it probably includes classes
# they need for their application! It also may already include
# the path to mpi.jar, and while it doesn't hurt anything, we
# don't want to include our class path more than once to avoid
# user astonishment

# Let the build system provide us with some critical values
my $my_compiler = "/usr/local/jdk1.7.0_07/bin/javac";
my $ompi_classpath = "/usr/local/openmpi-1.9_32_cc/lib/mpi.jar";

# globals
my $showme_arg = 0;
my $verbose = 0;
my $my_arg;
...

All shared libraries that javac needs are resolved on all three machines.

tyr java 113 ldd /usr/local/jdk1.7.0_07/bin/javac
        libthread.so.1 => /usr/lib/libthread.so.1
        libjli.so => /export2/prog/SunOS_sparc/jdk1.7.0_07/bin/../jre/lib/sparc/jli/libjli.so
        libdl.so.1 => /usr/lib/libdl.so.1
        libc.so.1 => /usr/lib/libc.so.1
        libm.so.2 => /usr/lib/libm.so.2
        /platform/SUNW,A70/lib/libc_psr.so.1
tyr java 114 ssh sunpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
        libthread.so.1 => /usr/lib/libthread.so.1
        libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
        libdl.so.1 => /usr/lib/libdl.so.1
        libc.so.1 => /usr/lib/libc.so.1
        libm.so.2 => /usr/lib/libm.so.2
tyr java 115 ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
        linux-gate.so.1 => (0xffffe000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xf77b2000)
        libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so (0xf779d000)
        libdl.so.2 => /lib/libdl.so.2 (0xf7798000)
        libc.so.6 => /lib/libc.so.6 (0xf762b000)
        /lib/ld-linux.so.2 (0xf77ce000)
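
Instead of reading the ldd listings by eye, the check can be scripted. This
is a minimal sketch, assuming a POSIX shell and that ldd marks unresolved
libraries with the text "not found"; "unresolved_libs" is a hypothetical
helper name.

```shell
# Hypothetical helper: read ldd output on stdin, print every unresolved
# entry, and return non-zero if at least one was found.
unresolved_libs() {
    if grep 'not found'; then
        return 1
    fi
    return 0
}

# Example with part of the linpc4 listing above: prints nothing, then "all resolved".
printf '\tlibdl.so.2 => /lib/libdl.so.2 (0xf7798000)\n\tlibc.so.6 => /lib/libc.so.6 (0xf762b000)\n' |
    unresolved_libs && echo "all resolved"
```

On the real machines the input would come from a command such as
"ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac | unresolved_libs".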

The build logs contain no errors except a failed NFS test
(opal_path_nfs) during "make check".

tyr openmpi-1.9-Linux.x86_64.32_cc 136 ls log.*
log.configure.Linux.x86_64.32_cc log.make-install.Linux.x86_64.32_cc
log.make-check.Linux.x86_64.32_cc log.make.Linux.x86_64.32_cc

tyr openmpi-1.9-Linux.x86_64.32_cc 137 grep "Error 1" log.*
log.make-check.Linux.x86_64.32_cc:make[3]: *** [check-TESTS] Error 1
log.make-check.Linux.x86_64.32_cc:make[1]: *** [check-recursive] Error 1
log.make-check.Linux.x86_64.32_cc:make: *** [check-recursive] Error 1

...
SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 32 failed)
FAIL: opal_path_nfs
========================================================
1 of 2 tests failed
Please report to http://www.open-mpi.org/community/help/
========================================================
make[3]: *** [check-TESTS] Error 1
...

It doesn't help to build the class files on Linux (which should be
independent of the architecture anyway).

tyr java 131 ssh linpc4
linpc4 fd1026 98 cd .../prog/mpi/java
linpc4 java 99 make clean
rm -f /home/fd1026/mpi_classfiles/HelloMainWithBarrier.class \
  /home/fd1026/mpi_classfiles/HelloMainWithoutBarrier.class
linpc4 java 100 make
mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithBarrier.java
mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithoutBarrier.java

linpc4 java 101 mpiexec -np 3 -host linpc4 \
  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
...

Does anybody else have this problem as well? Do you know a solution?
Thank you very much in advance for any help.

Kind regards

Siegmar