Subject: Re: [OMPI users] problem with 32-bit mpiJava on openmpi-1.9a1r27361
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-09-25 09:45:59


Hi,

> > I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
> > Why doesn't "mpiexec" start a process on my local machine (it
> > is not a matter of Java, because I have the same behaviour when
> > I use "hostname")?
> >
> > tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
> > java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > Process 0 of 3 running on sunpc4.informatik.hs-fulda.de
> > Process 1 of 3 running on sunpc4.informatik.hs-fulda.de
> > Process 2 of 3 running on sunpc1
> > ...
> >
> > tyr small_prog 142 mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname
> > sunpc1
> > sunpc4.informatik.hs-fulda.de
> > sunpc4.informatik.hs-fulda.de
> >
>
> No idea - it works fine for me. Do you have an environment
> variable, or something in your default MCA param file, that
> indicates "no_use_local"?

I have only built and installed Open MPI. There are no relevant
settings in my MCA param file and no MCA environment variables.

tyr hello_1 136 grep local \
  /usr/local/openmpi-1.9_64_cc/etc/openmpi-mca-params.conf
# $sysconf is a directory on a local disk, it is likely that changes
# component_path = /usr/local/lib/openmpi:~/my_openmpi_components

tyr hello_1 143 env | grep -i mca
tyr hello_1 144

> > The command breaks if I add a Linux machine.
>
> Check to ensure that the path and ld_library_path on your linux box
> is being correctly set to point to the corresponding Linux OMPI libs.
> It looks like that isn't the case. Remember, the Java bindings are
> just that - they are bindings that wrap on top of the regular C
> code. Thus, the underlying OMPI system remains system-dependent,
> and you must have the appropriate native libraries installed on
> each machine.

I implemented a small program that displays these values, and they
are wrong when it runs under MPI, but I have no idea why. The two
entries at the beginning of PATH and LD_LIBRARY_PATH are not from our
normal environment, because we append our values to the end of PATH,
LD_LIBRARY_PATH_32, and LD_LIBRARY_PATH_64. Afterwards I set
LD_LIBRARY_PATH to LD_LIBRARY_PATH_64 on a 64-bit Solaris machine, to
LD_LIBRARY_PATH_32 followed by LD_LIBRARY_PATH_64 on a 64-bit Linux
machine, and to LD_LIBRARY_PATH_32 on every 32-bit machine.
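
The relevant part of my program is essentially the following sketch
(simplified for illustration; the attached version sends the values
from the slave tasks as MPI messages, and the class name here is only
an example):

public class PrintEnv {
  public static void main(String[] args) {
    // Print every entry of the interesting variables on its own line,
    // so that unexpected additions at the front become visible.
    String[] vars = {"PATH", "LD_LIBRARY_PATH_32",
                     "LD_LIBRARY_PATH_64", "LD_LIBRARY_PATH",
                     "CLASSPATH"};
    for (String name : vars) {
      System.out.println(name);
      String value = System.getenv(name);
      if (value != null) {
        for (String entry : value.split(":")) {
          System.out.println("    " + entry);
        }
      }
    }
  }
}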

Now 1 slave tasks are sending their environment.

Environment from task 1:
  message type: 3
  msg length: 4622 characters
  message:
    hostname: tyr.informatik.hs-fulda.de
    operating system: SunOS
    release: 5.10
    processor: sun4u
    PATH
                       /usr/local/openmpi-1.9_64_cc/bin (!!!)
                       /usr/local/openmpi-1.9_64_cc/bin (!!!)
                       /usr/local/eclipse-3.6.1
                       ...
                       /usr/local/openmpi-1.9_64_cc/bin (<- from our environment)
    LD_LIBRARY_PATH_32
                       /usr/lib
                       /usr/local/jdk1.7.0_07/jre/lib/sparc
                       ...
                       /usr/local/openmpi-1.9_64_cc/lib (<- from our environment)
    LD_LIBRARY_PATH_64
                       /usr/lib/sparcv9
                       /usr/local/jdk1.7.0_07/jre/lib/sparcv9
                       ...
                       /usr/local/openmpi-1.9_64_cc/lib64 (<- from our environment)
    LD_LIBRARY_PATH
                       /usr/local/openmpi-1.9_64_cc/lib (!!!)
                       /usr/local/openmpi-1.9_64_cc/lib64 (!!!)
                       /usr/lib/sparcv9
                       /usr/local/jdk1.7.0_07/jre/lib/sparcv9
                       ...
                       /usr/local/openmpi-1.9_64_cc/lib64 (<- from our environment)
    CLASSPATH
                       /usr/local/junit4.10
                       /usr/local/junit4.10/junit-4.10.jar
                       //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dcore.jar
                       //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dutils.jar
                       //usr/local/jdk1.7.0_07/j3d/lib/ext/vecmath.jar
                       /usr/local/javacc-5.0/javacc.jar
                       .

Without MPI, the program shows our normal environment.

tyr hello_1 147 diff env_with*
1,7c1
<
<
< Now 1 slave tasks are sending their environment.
<
< Environment from task 1:
< message type: 3
< msg length: 4622 characters

---
> Environment:
14,15d7
<                        /usr/local/openmpi-1.9_64_cc/bin
<                        /usr/local/openmpi-1.9_64_cc/bin
81,82d72
<                        /usr/local/openmpi-1.9_64_cc/lib
<                        /usr/local/openmpi-1.9_64_cc/lib64
tyr hello_1 148

I have attached the programs so that you can check for yourself and
hopefully get the same results. Do you modify PATH and
LD_LIBRARY_PATH?
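
For readers without the attachments, HelloMainWithBarrier is
essentially the following sketch (simplified; the method names are my
assumption from the mpiJava-derived API of this Open MPI version and
may differ slightly in the attached file):

import mpi.*;

public class HelloMainWithBarrier {
  public static void main(String[] args) throws Exception {
    MPI.Init(args);                    // initializes the native Open MPI library
    int rank = MPI.COMM_WORLD.Rank();  // mpiJava-style Rank()/Size()
    int size = MPI.COMM_WORLD.Size();
    String host = java.net.InetAddress.getLocalHost().getHostName();
    System.out.println("Process " + rank + " of " + size
                       + " running on " + host);
    MPI.COMM_WORLD.Barrier();          // synchronize before shutdown
    MPI.Finalize();
  }
}
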
Kind regards
Siegmar
> > tyr java 110 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> >  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > 
> >  mca_base_open failed
> >  --> Returned value -2 instead of OPAL_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > 
> >  opal_init failed
> >  --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems.  This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> > 
> >  ompi_mpi_init: orte_init failed
> >  --> Returned "Out of resource" (-2) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > ***    and potentially your MPI job)
> > [linpc4:27369] Local abort before MPI_INIT completed successfully;
> >  not able to aggregate error messages, and not able to guarantee
> >  that all other processes were killed!
> > -------------------------------------------------------
> > Primary job  terminated normally, but 1 process returned
> > a non-zero exit code.. Per user-direction, the job has been aborted.
> > -------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status,
> >  thus causing
> > the job to be terminated. The first process to do so was:
> > 
> >  Process name: [[21095,1],2]
> >  Exit code:    1
> > --------------------------------------------------------------------------
> > 
> > 
> > tyr java 111 which mpijavac
> > /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > tyr java 112 more /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > #!/usr/bin/env perl
> > 
> > # WARNING: DO NOT EDIT THE mpijava.pl FILE AS IT IS GENERATED!
> > #          MAKE ALL CHANGES IN mpijava.pl.in
> > 
> > # Copyright (c) 2011      Cisco Systems, Inc.  All rights reserved.
> > # Copyright (c) 2012      Oracle and/or its affiliates.  All rights reserved.
> > 
> > use strict;
> > 
> > # The main purpose of this wrapper compiler is to check for
> > # and adjust the Java class path to include the OMPI classes
> > # in mpi.jar. The user may have specified a class path on
> > # our cmd line, or it may be in the environment, so we have
> > # to check for both. We also need to be careful not to
> > # just override the class path as it probably includes classes
> > # they need for their application! It also may already include
> > # the path to mpi.jar, and while it doesn't hurt anything, we
> > # don't want to include our class path more than once to avoid
> > # user astonishment
> > 
> > # Let the build system provide us with some critical values
> > my $my_compiler = "/usr/local/jdk1.7.0_07/bin/javac";
> > my $ompi_classpath = "/usr/local/openmpi-1.9_32_cc/lib/mpi.jar";
> > 
> > # globals
> > my $showme_arg = 0;
> > my $verbose = 0;
> > my $my_arg;
> > ...
> > 
> > 
> > All libraries are available.
> > 
> > tyr java 113 ldd /usr/local/jdk1.7.0_07/bin/javac
> >        libthread.so.1 =>        /usr/lib/libthread.so.1
> >        libjli.so =>     /export2/prog/SunOS_sparc/jdk1.7.0_07/bin/../jre/lib/sparc/jli/libjli.so
> >        libdl.so.1 =>    /usr/lib/libdl.so.1
> >        libc.so.1 =>     /usr/lib/libc.so.1
> >        libm.so.2 =>     /usr/lib/libm.so.2
> >        /platform/SUNW,A70/lib/libc_psr.so.1
> > tyr java 114 ssh sunpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> >        libthread.so.1 =>        /usr/lib/libthread.so.1
> >        libjli.so =>     /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
> >        libdl.so.1 =>    /usr/lib/libdl.so.1
> >        libc.so.1 =>     /usr/lib/libc.so.1
> >        libm.so.2 =>     /usr/lib/libm.so.2
> > tyr java 115 ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> >        linux-gate.so.1 =>  (0xffffe000)
> >        libpthread.so.0 => /lib/libpthread.so.0 (0xf77b2000)
> >        libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so (0xf779d000)
> >        libdl.so.2 => /lib/libdl.so.2 (0xf7798000)
> >        libc.so.6 => /lib/libc.so.6 (0xf762b000)
> >        /lib/ld-linux.so.2 (0xf77ce000)
> > 
> > 
> > I don't have any errors in the log files except the failed nfs test.
> > 
> > tyr openmpi-1.9-Linux.x86_64.32_cc 136 ls log.*
> > log.configure.Linux.x86_64.32_cc   log.make-install.Linux.x86_64.32_cc
> > log.make-check.Linux.x86_64.32_cc  log.make.Linux.x86_64.32_cc
> > 
> > tyr openmpi-1.9-Linux.x86_64.32_cc 137 grep "Error 1" log.*
> > log.make-check.Linux.x86_64.32_cc:make[3]: *** [check-TESTS] Error 1
> > log.make-check.Linux.x86_64.32_cc:make[1]: *** [check-recursive] Error 1
> > log.make-check.Linux.x86_64.32_cc:make: *** [check-recursive] Error 1
> > 
> > ...
> > SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 32 failed)
> > FAIL: opal_path_nfs
> > ========================================================
> > 1 of 2 tests failed
> > Please report to http://www.open-mpi.org/community/help/
> > ========================================================
> > make[3]: *** [check-TESTS] Error 1
> > ...
> > 
> > 
> > It doesn't help to build the class files on Linux (which should be
> > independent of the architecture anyway).
> > 
> > tyr java 131 ssh linpc4
> > linpc4 fd1026 98 cd .../prog/mpi/java
> > linpc4 java 99 make clean
> > rm -f /home/fd1026/mpi_classfiles/HelloMainWithBarrier.class \
> >  /home/fd1026/mpi_classfiles/HelloMainWithoutBarrier.class
> > linpc4 java 100 make
> > mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithBarrier.java
> > mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithoutBarrier.java
> > 
> > linpc4 java 101  mpiexec -np 3 -host linpc4 \
> >  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > 
> >  mca_base_open failed
> >  --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> > 
> > Does anybody else have this problem as well? Do you know a solution?
> > Thank you very much in advance for any help.
> > 
> > 
> > Kind regards
> > 
> > Siegmar
> > 
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
>