
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] problem with 32-bit mpiJava on openmpi-1.9a1r27361
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-09-24 11:04:06


On Sep 24, 2012, at 4:35 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:

> Hi,
>
> I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
> Why doesn't "mpiexec" start a process on my local machine (it
> is not a matter of Java, because I have the same behaviour when
> I use "hostname")?
>
> tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
> java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> Process 0 of 3 running on sunpc4.informatik.hs-fulda.de
> Process 1 of 3 running on sunpc4.informatik.hs-fulda.de
> Process 2 of 3 running on sunpc1
> ...
>
> tyr small_prog 142 mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname
> sunpc1
> sunpc4.informatik.hs-fulda.de
> sunpc4.informatik.hs-fulda.de
>

No idea - it works fine for me. Do you have an environment variable, or an entry in your default MCA param file, that indicates "no_use_local"?
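
One way to look for such a setting is sketched below. It relies on two conventions: MCA parameters set through the environment are named OMPI_MCA_<param>, and the default parameter files are $HOME/.openmpi/mca-params.conf and <prefix>/etc/openmpi-mca-params.conf. The exact parameter name is not given here, so the sketch simply lists every setting whose name contains "local". The helper name find_local_mca_settings is hypothetical, and the prefix passed to it is the one shown for tyr later in this thread; adjust it for your own installation.

```shell
# Hypothetical helper: print MCA settings whose *name* contains "local".
# $1 is the Open MPI installation prefix (an assumption; pass your own).
find_local_mca_settings() {
    # MCA parameters set through the environment are named OMPI_MCA_<param>.
    env | grep -i '^OMPI_MCA_[^=]*local' || true
    # Per-user and system-wide default MCA parameter files, if they exist.
    for f in "$HOME/.openmpi/mca-params.conf" "$1/etc/openmpi-mca-params.conf"; do
        if [ -r "$f" ]; then
            # Match "name = value" lines only; commented-out lines are skipped.
            grep -iE '^[[:space:]]*[A-Za-z0-9_]*local[A-Za-z0-9_]*[[:space:]]*=' "$f" || true
        fi
    done
}

find_local_mca_settings /usr/local/openmpi-1.9_32_cc
```

If this prints nothing, neither the environment nor the default parameter files contain a setting with "local" in its name, and the cause is probably elsewhere.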

>
> The command breaks if I add a Linux machine.

Check that PATH and LD_LIBRARY_PATH on your Linux box are set correctly to point to the corresponding Linux OMPI libs. It looks like that isn't the case. Remember, the Java bindings are just that - bindings that wrap the regular C code. Thus, the underlying OMPI system remains system-dependent, and you must have the appropriate native libraries installed on each machine.
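
A rough way to check this from tyr is sketched below. It asks a non-interactive ssh shell on a given host what PATH and LD_LIBRARY_PATH it sees, then tests whether the expected Open MPI bin and lib directories are among the entries. The helper names path_contains and check_remote_env are hypothetical, and the prefix you pass for linpc4 is an assumption, since the Linux install location does not appear in this thread.

```shell
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated search path $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical helper: report missing entries as seen by a non-interactive
# ssh shell on host $1. $2 is the Open MPI prefix expected on that host
# (an assumption; it may differ per machine).
check_remote_env() {
    host=$1
    prefix=$2
    # Single quotes keep the variables unexpanded until the remote shell runs.
    remote_path=$(ssh "$host" 'echo "$PATH"')
    remote_ldpath=$(ssh "$host" 'echo "$LD_LIBRARY_PATH"')
    path_contains "$remote_path" "$prefix/bin" \
        || echo "$host: $prefix/bin is not in PATH"
    path_contains "$remote_ldpath" "$prefix/lib" \
        || echo "$host: $prefix/lib is not in LD_LIBRARY_PATH"
}
```

For example, `check_remote_env linpc4 <linux-prefix>` prints one line per missing entry and nothing if both directories are present. The check goes through ssh because a remote, non-interactive shell may not read the same startup files as your login shell, so its environment can differ from what you see after logging in to linpc4 directly.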

>
> tyr java 110 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> mca_base_open failed
> --> Returned value -2 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> opal_init failed
> --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: orte_init failed
> --> Returned "Out of resource" (-2) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [linpc4:27369] Local abort before MPI_INIT completed successfully;
> not able to aggregate error messages, and not able to guarantee
> that all other processes were killed!
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status,
> thus causing
> the job to be terminated. The first process to do so was:
>
> Process name: [[21095,1],2]
> Exit code: 1
> --------------------------------------------------------------------------
>
>
> tyr java 111 which mpijavac
> /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> tyr java 112 more /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> #!/usr/bin/env perl
>
> # WARNING: DO NOT EDIT THE mpijava.pl FILE AS IT IS GENERATED!
> # MAKE ALL CHANGES IN mpijava.pl.in
>
> # Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
> # Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved.
>
> use strict;
>
> # The main purpose of this wrapper compiler is to check for
> # and adjust the Java class path to include the OMPI classes
> # in mpi.jar. The user may have specified a class path on
> # our cmd line, or it may be in the environment, so we have
> # to check for both. We also need to be careful not to
> # just override the class path as it probably includes classes
> # they need for their application! It also may already include
> # the path to mpi.jar, and while it doesn't hurt anything, we
> # don't want to include our class path more than once to avoid
> # user astonishment
>
> # Let the build system provide us with some critical values
> my $my_compiler = "/usr/local/jdk1.7.0_07/bin/javac";
> my $ompi_classpath = "/usr/local/openmpi-1.9_32_cc/lib/mpi.jar";
>
> # globals
> my $showme_arg = 0;
> my $verbose = 0;
> my $my_arg;
> ...
>
>
> All libraries are available.
>
> tyr java 113 ldd /usr/local/jdk1.7.0_07/bin/javac
> libthread.so.1 => /usr/lib/libthread.so.1
> libjli.so =>
> /export2/prog/SunOS_sparc/jdk1.7.0_07/bin/../jre/lib/sparc/jli/libjli.so
> libdl.so.1 => /usr/lib/libdl.so.1
> libc.so.1 => /usr/lib/libc.so.1
> libm.so.2 => /usr/lib/libm.so.2
> /platform/SUNW,A70/lib/libc_psr.so.1
> tyr java 114 ssh sunpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> libthread.so.1 => /usr/lib/libthread.so.1
> libjli.so =>
> /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
> libdl.so.1 => /usr/lib/libdl.so.1
> libc.so.1 => /usr/lib/libc.so.1
> libm.so.2 => /usr/lib/libm.so.2
> tyr java 115 ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> linux-gate.so.1 => (0xffffe000)
> libpthread.so.0 => /lib/libpthread.so.0 (0xf77b2000)
> libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
> (0xf779d000)
> libdl.so.2 => /lib/libdl.so.2 (0xf7798000)
> libc.so.6 => /lib/libc.so.6 (0xf762b000)
> /lib/ld-linux.so.2 (0xf77ce000)
>
>
> I don't have any errors in the log files except the error for nfs.
>
> tyr openmpi-1.9-Linux.x86_64.32_cc 136 ls log.*
> log.configure.Linux.x86_64.32_cc log.make-install.Linux.x86_64.32_cc
> log.make-check.Linux.x86_64.32_cc log.make.Linux.x86_64.32_cc
>
> tyr openmpi-1.9-Linux.x86_64.32_cc 137 grep "Error 1" log.*
> log.make-check.Linux.x86_64.32_cc:make[3]: *** [check-TESTS] Error 1
> log.make-check.Linux.x86_64.32_cc:make[1]: *** [check-recursive] Error 1
> log.make-check.Linux.x86_64.32_cc:make: *** [check-recursive] Error 1
>
> ...
> SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 32 failed)
> FAIL: opal_path_nfs
> ========================================================
> 1 of 2 tests failed
> Please report to http://www.open-mpi.org/community/help/
> ========================================================
> make[3]: *** [check-TESTS] Error 1
> ...
>
>
> It doesn't help to build the class files on Linux (which should be
> independent of the architecture anyway).
>
> tyr java 131 ssh linpc4
> linpc4 fd1026 98 cd .../prog/mpi/java
> linpc4 java 99 make clean
> rm -f /home/fd1026/mpi_classfiles/HelloMainWithBarrier.class \
> /home/fd1026/mpi_classfiles/HelloMainWithoutBarrier.class
> linpc4 java 100 make
> mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithBarrier.java
> mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithoutBarrier.java
>
> linpc4 java 101 mpiexec -np 3 -host linpc4 \
> java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> mca_base_open failed
> --> Returned value -2 instead of OPAL_SUCCESS
> ...
>
> Has anybody else this problem as well? Do you know a solution?
> Thank you very much for any help in advance.
>
>
> Kind regards
>
> Siegmar
>