Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] problem with 32-bit mpiJava on openmpi-1.9a1r27361
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-09-25 09:45:59


Hi,

> > I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
> > Why doesn't "mpiexec" start a process on my local machine (it
> > is not a matter of Java, because I have the same behaviour when
> > I use "hostname")?
> >
> > tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
> > java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > Process 0 of 3 running on sunpc4.informatik.hs-fulda.de
> > Process 1 of 3 running on sunpc4.informatik.hs-fulda.de
> > Process 2 of 3 running on sunpc1
> > ...
> >
> > tyr small_prog 142 mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname
> > sunpc1
> > sunpc4.informatik.hs-fulda.de
> > sunpc4.informatik.hs-fulda.de
> >
>
> No idea - it works fine for me. Do you have an environmental
> variable, or something in your default MCA param file, that
> indicates "no_use_local"?

I have only built and installed Open MPI. I have no MCA param file of my
own, the default one contains nothing but comments, and I don't set any
MCA environment variables.

tyr hello_1 136 grep local \
  /usr/local/openmpi-1.9_64_cc/etc/openmpi-mca-params.conf
# $sysconf is a directory on a local disk, it is likely that changes
# component_path = /usr/local/lib/openmpi:~/my_openmpi_components

tyr hello_1 143 env | grep -i mca
tyr hello_1 144

> > The command breaks if I add a Linux machine.
>
> Check to ensure that the PATH and LD_LIBRARY_PATH on your Linux box
> are being correctly set to point to the corresponding Linux OMPI libs.
> It looks like that isn't the case. Remember, the Java bindings are
> just that - they are bindings that wrap on top of the regular C
> code. Thus, the underlying OMPI system remains system-dependent,
> and you must have the appropriate native libraries installed on
> each machine.

I implemented a small program that shows these values, and under MPI they
are wrong, but I have no idea why. The two entries at the beginning of
PATH and LD_LIBRARY_PATH cannot come from our normal environment, because
we append the Open MPI directories at the end of PATH, LD_LIBRARY_PATH_32,
and LD_LIBRARY_PATH_64. Afterwards I set LD_LIBRARY_PATH to
LD_LIBRARY_PATH_64 on a 64-bit Solaris machine, to LD_LIBRARY_PATH_32
followed by LD_LIBRARY_PATH_64 on a 64-bit Linux machine, and to
LD_LIBRARY_PATH_32 on every 32-bit machine.
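
The attached programs are a little more elaborate (in the MPI version the
slave tasks send their environment as messages, as the output below shows),
but a stripped-down Java sketch of the same kind of check, without any MPI
calls, would look roughly like this (illustrative only, the class name is
made up and this is not the attached code):

import java.net.InetAddress;

// Hypothetical helper: prints selected environment variables, one
// path entry per line, prefixed with the host name.
public class EnvDump {
    public static void main(String[] args) throws Exception {
        String host = InetAddress.getLocalHost().getHostName();
        String[] names = { "PATH", "LD_LIBRARY_PATH",
                           "LD_LIBRARY_PATH_32", "LD_LIBRARY_PATH_64" };
        for (String name : names) {
            System.out.println(host + ": " + name);
            String value = System.getenv(name);
            if (value == null) {
                System.out.println("    (not set)");
                continue;
            }
            for (String entry : value.split(":")) {
                System.out.println("    " + entry);
            }
        }
    }
}

Running such a class once directly with java and once under mpiexec, and
diffing the two outputs, is essentially the comparison I show below with
the attached programs.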

Now 1 slave tasks are sending their environment.

Environment from task 1:
  message type: 3
  msg length: 4622 characters
  message:
    hostname: tyr.informatik.hs-fulda.de
    operating system: SunOS
    release: 5.10
    processor: sun4u
    PATH
                       /usr/local/openmpi-1.9_64_cc/bin (!!!)
                       /usr/local/openmpi-1.9_64_cc/bin (!!!)
                       /usr/local/eclipse-3.6.1
                       ...
                       /usr/local/openmpi-1.9_64_cc/bin (<- from our environment)
    LD_LIBRARY_PATH_32
                       /usr/lib
                       /usr/local/jdk1.7.0_07/jre/lib/sparc
                       ...
                       /usr/local/openmpi-1.9_64_cc/lib (<- from our environment)
    LD_LIBRARY_PATH_64
                       /usr/lib/sparcv9
                       /usr/local/jdk1.7.0_07/jre/lib/sparcv9
                       ...
                       /usr/local/openmpi-1.9_64_cc/lib64 (<- from our environment)
    LD_LIBRARY_PATH
                       /usr/local/openmpi-1.9_64_cc/lib (!!!)
                       /usr/local/openmpi-1.9_64_cc/lib64 (!!!)
                       /usr/lib/sparcv9
                       /usr/local/jdk1.7.0_07/jre/lib/sparcv9
                       ...
                       /usr/local/openmpi-1.9_64_cc/lib64 (<- from our environment)
    CLASSPATH
                       /usr/local/junit4.10
                       /usr/local/junit4.10/junit-4.10.jar
                       //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dcore.jar
                       //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dutils.jar
                       //usr/local/jdk1.7.0_07/j3d/lib/ext/vecmath.jar
                       /usr/local/javacc-5.0/javacc.jar
                       .

Without MPI the program uses our environment.

tyr hello_1 147 diff env_with*
1,7c1
<
<
< Now 1 slave tasks are sending their environment.
<
< Environment from task 1:
< message type: 3
< msg length: 4622 characters

---
> Environment:
14,15d7
<                        /usr/local/openmpi-1.9_64_cc/bin
<                        /usr/local/openmpi-1.9_64_cc/bin
81,82d72
<                        /usr/local/openmpi-1.9_64_cc/lib
<                        /usr/local/openmpi-1.9_64_cc/lib64
tyr hello_1 148 

I have attached the programs so that you can check for yourself and
hopefully reproduce the results. Do you modify PATH and LD_LIBRARY_PATH?
Kind regards
Siegmar
> > tyr java 110 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> >  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > 
> >  mca_base_open failed
> >  --> Returned value -2 instead of OPAL_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > 
> >  opal_init failed
> >  --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems.  This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> > 
> >  ompi_mpi_init: orte_init failed
> >  --> Returned "Out of resource" (-2) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > ***    and potentially your MPI job)
> > [linpc4:27369] Local abort before MPI_INIT completed successfully;
> >  not able to aggregate error messages, and not able to guarantee
> >  that all other processes were killed!
> > -------------------------------------------------------
> > Primary job  terminated normally, but 1 process returned
> > a non-zero exit code.. Per user-direction, the job has been aborted.
> > -------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status,
> >  thus causing
> > the job to be terminated. The first process to do so was:
> > 
> >  Process name: [[21095,1],2]
> >  Exit code:    1
> > --------------------------------------------------------------------------
> > 
> > 
> > tyr java 111 which mpijavac
> > /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > tyr java 112 more /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > #!/usr/bin/env perl
> > 
> > # WARNING: DO NOT EDIT THE mpijava.pl FILE AS IT IS GENERATED!
> > #          MAKE ALL CHANGES IN mpijava.pl.in
> > 
> > # Copyright (c) 2011      Cisco Systems, Inc.  All rights reserved.
> > # Copyright (c) 2012      Oracle and/or its affiliates.  All rights reserved.
> > 
> > use strict;
> > 
> > # The main purpose of this wrapper compiler is to check for
> > # and adjust the Java class path to include the OMPI classes
> > # in mpi.jar. The user may have specified a class path on
> > # our cmd line, or it may be in the environment, so we have
> > # to check for both. We also need to be careful not to
> > # just override the class path as it probably includes classes
> > # they need for their application! It also may already include
> > # the path to mpi.jar, and while it doesn't hurt anything, we
> > # don't want to include our class path more than once to avoid
> > # user astonishment
> > 
> > # Let the build system provide us with some critical values
> > my $my_compiler = "/usr/local/jdk1.7.0_07/bin/javac";
> > my $ompi_classpath = "/usr/local/openmpi-1.9_32_cc/lib/mpi.jar";
> > 
> > # globals
> > my $showme_arg = 0;
> > my $verbose = 0;
> > my $my_arg;
> > ...
> > 
> > 
> > All libraries are available.
> > 
> > tyr java 113 ldd /usr/local/jdk1.7.0_07/bin/javac
> >        libthread.so.1 =>        /usr/lib/libthread.so.1
> >        libjli.so =>     
> > /export2/prog/SunOS_sparc/jdk1.7.0_07/bin/../jre/lib/sparc/jli/libjli.so
> >        libdl.so.1 =>    /usr/lib/libdl.so.1
> >        libc.so.1 =>     /usr/lib/libc.so.1
> >        libm.so.2 =>     /usr/lib/libm.so.2
> >        /platform/SUNW,A70/lib/libc_psr.so.1
> > tyr java 114 ssh sunpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> >        libthread.so.1 =>        /usr/lib/libthread.so.1
> >        libjli.so =>     
> > /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
> >        libdl.so.1 =>    /usr/lib/libdl.so.1
> >        libc.so.1 =>     /usr/lib/libc.so.1
> >        libm.so.2 =>     /usr/lib/libm.so.2
> > tyr java 115 ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> >        linux-gate.so.1 =>  (0xffffe000)
> >        libpthread.so.0 => /lib/libpthread.so.0 (0xf77b2000)
> >        libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so 
> > (0xf779d000)
> >        libdl.so.2 => /lib/libdl.so.2 (0xf7798000)
> >        libc.so.6 => /lib/libc.so.6 (0xf762b000)
> >        /lib/ld-linux.so.2 (0xf77ce000)
> > 
> > 
> > I don't have any errors in the log files except the error for nfs.
> > 
> > tyr openmpi-1.9-Linux.x86_64.32_cc 136 ls log.*
> > log.configure.Linux.x86_64.32_cc   log.make-install.Linux.x86_64.32_cc
> > log.make-check.Linux.x86_64.32_cc  log.make.Linux.x86_64.32_cc
> > 
> > tyr openmpi-1.9-Linux.x86_64.32_cc 137 grep "Error 1" log.*
> > log.make-check.Linux.x86_64.32_cc:make[3]: *** [check-TESTS] Error 1
> > log.make-check.Linux.x86_64.32_cc:make[1]: *** [check-recursive] Error 1
> > log.make-check.Linux.x86_64.32_cc:make: *** [check-recursive] Error 1
> > 
> > ...
> > SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 32 failed)
> > FAIL: opal_path_nfs
> > ========================================================
> > 1 of 2 tests failed
> > Please report to http://www.open-mpi.org/community/help/
> > ========================================================
> > make[3]: *** [check-TESTS] Error 1
> > ...
> > 
> > 
> > It doesn't help to build the class files on Linux (which should be
> > independent of the architecture anyway).
> > 
> > tyr java 131 ssh linpc4
> > linpc4 fd1026 98 cd .../prog/mpi/java
> > linpc4 java 99 make clean
> > rm -f /home/fd1026/mpi_classfiles/HelloMainWithBarrier.class \
> >  /home/fd1026/mpi_classfiles/HelloMainWithoutBarrier.class
> > linpc4 java 100 make
> > mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithBarrier.java
> > mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithoutBarrier.java
> > 
> > linpc4 java 101  mpiexec -np 3 -host linpc4 \
> >  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > 
> >  mca_base_open failed
> >  --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> > 
> > Does anybody else have this problem as well? Do you know a solution?
> > Thank you very much for any help in advance.
> > 
> > 
> > Kind regards
> > 
> > Siegmar
> > 