Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] problem with 32-bit mpiJava on openmpi-1.9a1r27361
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-09-26 10:16:41


Hi,

> Does the behavior only occur with Java applications, as your subject
> implies? I thought this was a more general behavior based on prior notes?

It is a general problem as you can see in the older email below. I
didn't change the header because I detected this behaviour when I
tried out mpiJava.

> As I said back then, I have no earthly idea why your local machine is being
> ignored, and I cannot replicate that behavior on any system available to me.
>
> What you might try is adding --display-allocation --display-devel-map to
> your cmd line and see what the system thinks it is doing. The first option
> will display what nodes and slots it thinks are available to it, and the
> second will report where it thinks it placed everything.

tyr topo 244 mpiexec -np 3 -host tyr,sunpc4,linpc4 --display-allocation \
  --display-devel-map hostname

====================== ALLOCATED NODES ======================

 Data for node: tyr Launch id: -1 State: 2
        Daemon: [[3909,0],0] Daemon launched: True
        Num slots: 1 Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Num procs: 0 Next node_rank: 0
 Data for node: sunpc4 Launch id: -1 State: 2
        Daemon: [[3909,0],1] Daemon launched: False
        Num slots: 1 Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Num procs: 0 Next node_rank: 0
 Data for node: linpc4 Launch id: -1 State: 2
        Daemon: [[3909,0],2] Daemon launched: False
        Num slots: 1 Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Num procs: 0 Next node_rank: 0

=================================================================

 Mapper requested: NULL Last mapper: round_robin Mapping policy: BYSLOT
   Ranking policy: SLOT Binding policy: NONE[NODE] Cpu set: NULL PPR: NULL
        Num new daemons: 0 New daemon starting vpid INVALID
        Num nodes: 2

 Data for node: sunpc4 Launch id: -1 State: 2
        Daemon: [[3909,0],1] Daemon launched: False
        Num slots: 1 Slots in use: 1 Oversubscribed: TRUE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Num procs: 2 Next node_rank: 2
        Data for proc: [[3909,1],0]
                Pid: 0 Local rank: 0 Node rank: 0 App rank: 0
                State: INITIALIZED Restarts: 0 App_context: 0
                  Locale: 0-1 Binding: NULL[0]
        Data for proc: [[3909,1],1]
                Pid: 0 Local rank: 1 Node rank: 1 App rank: 1
                State: INITIALIZED Restarts: 0 App_context: 0
                  Locale: 0-1 Binding: NULL[0]

 Data for node: linpc4 Launch id: -1 State: 2
        Daemon: [[3909,0],2] Daemon launched: False
        Num slots: 1 Slots in use: 1 Oversubscribed: FALSE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Num procs: 1 Next node_rank: 1
        Data for proc: [[3909,1],2]
                Pid: 0 Local rank: 0 Node rank: 0 App rank: 2
                State: INITIALIZED Restarts: 0 App_context: 0
                  Locale: 0-1 Binding: NULL[0]
linpc4
sunpc4.informatik.hs-fulda.de
sunpc4.informatik.hs-fulda.de

With openmpi-1.6.2, the same command produces the following output.

tyr topo 109 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
  --display-allocation --display-devel-map hostname

====================== ALLOCATED NODES ======================

 Data for node: tyr.informatik.hs-fulda.de Launch id: -1 State: 2
        Num boards: 1 Num sockets/board: 0 Num cores/socket: 0
        Daemon: [[4018,0],0] Daemon launched: True
        Num slots: 1 Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 0 Next node_rank: 0
 Data for node: sunpc4 Launch id: -1 State: 2
        Num boards: 1 Num sockets/board: 0 Num cores/socket: 0
        Daemon: Not defined Daemon launched: False
        Num slots: 1 Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 0 Next node_rank: 0
 Data for node: linpc4 Launch id: -1 State: 2
        Num boards: 1 Num sockets/board: 0 Num cores/socket: 0
        Daemon: Not defined Daemon launched: False
        Num slots: 1 Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 0 Next node_rank: 0

=================================================================

 Map generated by mapping policy: 0400
        Npernode: 0 Oversubscribe allowed: TRUE CPU Lists: FALSE
        Num new daemons: 2 New daemon starting vpid 1
        Num nodes: 3

 Data for node: tyr.informatik.hs-fulda.de Launch id: -1 State: 2
        Num boards: 1 Num sockets/board: 0 Num cores/socket: 0
        Daemon: [[4018,0],0] Daemon launched: True
        Num slots: 1 Slots in use: 1 Oversubscribed: FALSE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 1 Next node_rank: 1
        Data for proc: [[4018,1],0]
                Pid: 0 Local rank: 0 Node rank: 0
                State: 0 Restarts: 0 App_context: 0 Slot list: NULL

 Data for node: sunpc4 Launch id: -1 State: 2
        Num boards: 1 Num sockets/board: 0 Num cores/socket: 0
        Daemon: [[4018,0],1] Daemon launched: False
        Num slots: 1 Slots in use: 1 Oversubscribed: FALSE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 1 Next node_rank: 1
        Data for proc: [[4018,1],1]
                Pid: 0 Local rank: 0 Node rank: 0
                State: 0 Restarts: 0 App_context: 0 Slot list: NULL

 Data for node: linpc4 Launch id: -1 State: 2
        Num boards: 1 Num sockets/board: 0 Num cores/socket: 0
        Daemon: [[4018,0],2] Daemon launched: False
        Num slots: 1 Slots in use: 1 Oversubscribed: FALSE
        Num slots allocated: 1 Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 1 Next node_rank: 1
        Data for proc: [[4018,1],2]
                Pid: 0 Local rank: 0 Node rank: 0
                State: 0 Restarts: 0 App_context: 0 Slot list: NULL
linpc4
sunpc4.informatik.hs-fulda.de
tyr.informatik.hs-fulda.de
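
To compare the two runs without reading the full maps, a small shell helper can report which requested hosts never printed their name. This is only a sketch: "check_hosts" is a hypothetical function name of my own (it is not part of Open MPI), it assumes a POSIX shell, and it assumes that every process prints its hostname on a line of its own, as "hostname" does above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): reads the output of
# "mpiexec ... hostname" on stdin and reports every host given as an
# argument that did not print its name.
check_hosts () {
    # Reduce each reported name to its short form (the text before the
    # first dot), so "sunpc4.informatik.hs-fulda.de" matches "sunpc4".
    seen=$(cut -d. -f1 | sort -u)
    status=0
    for host in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        if ! printf '%s\n' "$seen" | grep -qxF -- "$host"; then
            echo "no process ran on: $host"
            status=1
        fi
    done
    return $status
}
```

For example, "mpiexec -np 3 -host tyr,sunpc4,linpc4 hostname | check_hosts tyr sunpc4 linpc4" would print "no process ran on: tyr" for the three hostnames reported by the 1.9 run above and nothing for those reported by the 1.6.2 run.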

Is the above output helpful? Thank you very much for any help in advance.

Kind regards

Siegmar

> On Wed, Sep 26, 2012 at 4:53 AM, Siegmar Gross <
> Siegmar.Gross_at_[hidden]> wrote:
>
> > Hi,
> >
> > Yesterday I installed openmpi-1.9a1r27362 and I still have a
> > problem with "-host". My local machine is not used if I try
> > to start processes on three hosts.
> >
> > tyr: Solaris 10, Sparc
> > sunpc4: Solaris 10, x86_64
> > linpc4: openSUSE-Linux 12.1, x86_64
> >
> >
> > tyr mpi_classfiles 175 javac HelloMainWithoutMPI.java
> > tyr mpi_classfiles 176 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> > java -cp $HOME/mpi_classfiles HelloMainWithoutMPI
> > Hello from linpc4.informatik.hs-fulda.de/193.174.26.225
> > Hello from sunpc4.informatik.hs-fulda.de/193.174.26.224
> > Hello from sunpc4.informatik.hs-fulda.de/193.174.26.224
> > tyr mpi_classfiles 177 which mpiexec
> > /usr/local/openmpi-1.9_64_cc/bin/mpiexec
> >
> >
> > Everything works fine with openmpi-1.6.2rc5r27346.
> >
> > tyr mpi_classfiles 108 javac HelloMainWithoutMPI.java
> > tyr mpi_classfiles 109 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> > java -cp $HOME/mpi_classfiles HelloMainWithoutMPI
> > Hello from linpc4.informatik.hs-fulda.de/193.174.26.225
> > Hello from sunpc4.informatik.hs-fulda.de/193.174.26.224
> > Hello from tyr.informatik.hs-fulda.de/193.174.24.39
> > tyr mpi_classfiles 110 which mpiexec
> > /usr/local/openmpi-1.6.2_64_cc/bin/mpiexec
> >
> >
> > In my opinion it is a problem with openmpi-1.9. I used the following
> > configure command for Sparc. The commands for the other platforms are
> > similar.
> >
> > ../openmpi-1.9a1r27362/configure --prefix=/usr/local/openmpi-1.9_64_cc \
> > --libdir=/usr/local/openmpi-1.9_64_cc/lib64 \
> > --with-jdk-bindir=/usr/local/jdk1.7.0_07/bin/sparcv9 \
> > --with-jdk-headers=/usr/local/jdk1.7.0_07/include \
> > JAVA_HOME=/usr/local/jdk1.7.0_07 \
> > LDFLAGS="-m64" \
> > CC="cc" CXX="CC" FC="f95" \
> > CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
> > CPP="cpp" CXXCPP="cpp" \
> > CPPFLAGS="" CXXCPPFLAGS="" \
> > C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
> > OBJC_INCLUDE_PATH="" OPENMPI_HOME="" \
> > --enable-cxx-exceptions \
> > --enable-mpi-java \
> > --enable-heterogeneous \
> > --enable-opal-multi-threads \
> > --enable-mpi-thread-multiple \
> > --with-threads=posix \
> > --with-hwloc=internal \
> > --without-verbs \
> > --without-udapl \
> > --with-wrapper-cflags=-m64 \
> > --enable-debug \
> > |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> >
> > Can I provide anything to help track down the problem? Thank you very
> > much for any help in advance.
> >
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >
> > > >>> I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
> > > >>> Why doesn't "mpiexec" start a process on my local machine (it
> > > >>> is not a matter of Java, because I have the same behaviour when
> > > >>> I use "hostname")?
> > > >>>
> > > >>> tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
> > > >>> java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > >>> Process 0 of 3 running on sunpc4.informatik.hs-fulda.de
> > > >>> Process 1 of 3 running on sunpc4.informatik.hs-fulda.de
> > > >>> Process 2 of 3 running on sunpc1
> > > >>> ...
> > > >>>
> > > >>> tyr small_prog 142 mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname
> > > >>> sunpc1
> > > >>> sunpc4.informatik.hs-fulda.de
> > > >>> sunpc4.informatik.hs-fulda.de
> > > >>>
> > > >>
> > > >> No idea - it works fine for me. Do you have an environmental
> > > >> variable, or something in your default MCA param file, that
> > > >> indicates "no_use_local"?
> > > >
> > > > I have only built and installed Open MPI and I have no param file.
> > > > I don't have any MCA environment variables set either.
> > > >
> > > > tyr hello_1 136 grep local \
> > > > /usr/local/openmpi-1.9_64_cc/etc/openmpi-mca-params.conf
> > > > # $sysconf is a directory on a local disk, it is likely that changes
> > > > # component_path = /usr/local/lib/openmpi:~/my_openmpi_components
> > > >
> > > > tyr hello_1 143 env | grep -i mca
> > > > tyr hello_1 144
> > >
> > > No ideas - I can't make it behave that way :-(
> > >
> > > >
> > > >
> > > >>> The command breaks if I add a Linux machine.
> > > >>
> > > >> Check to ensure that the path and ld_library_path on your linux box
> > > >> are being correctly set to point to the corresponding Linux OMPI libs.
> > > >> It looks like that isn't the case. Remember, the Java bindings are
> > > >> just that - they are bindings that wrap on top of the regular C
> > > >> code. Thus, the underlying OMPI system remains system-dependent,
> > > >> and you must have the appropriate native libraries installed on
> > > >> each machine.
> > > >
> > > > I implemented a small program, which shows these values and they
> > > > are wrong for MPI, but I have no idea why. The two entries at the
> > > > beginning from PATH and LD_LIBRARY_PATH are not from our normal
> > > > environment, because I add these values at the end of the environment
> > > > variables PATH, LD_LIBRARY_PATH_32, and LD_LIBRARY_PATH_64. Afterwards
> > > > I set LD_LIBRARY_PATH to LD_LIBRARY_PATH_64 on a 64-bit Solaris
> > > > machine, to LD_LIBRARY_PATH_32 followed by LD_LIBRARY_PATH_64 on a
> > > > 64-bit Linux machine, and to LD_LIBRARY_PATH_32 on every 32-bit
> > > > machine.
> > > >
> > >
> > > I see the problem - our heterogeneous support could use some
> > > improvement, but it'll be awhile before I can get to it.
> > >
> > > What's happening is that we are picking up and propagating the prefix
> > > you specified, prepending it to your path and ld_library_path. Did you
> > > by chance configure with --enable-orterun-prefix-by-default? Or specify
> > > --prefix on your cmd line? Otherwise, it shouldn't be doing this. For
> > > this purpose, you cannot use either of those options.
> > >
> > > Also, you'll need to add --enable-heterogeneous to your configure so
> > > the MPI layer builds the right support, and add --hetero-nodes to your
> > > cmd line.
> > >
> > >
> > > >
> > > > Now 1 slave tasks are sending their environment.
> > > >
> > > > Environment from task 1:
> > > > message type: 3
> > > > msg length: 4622 characters
> > > > message:
> > > > hostname: tyr.informatik.hs-fulda.de
> > > > operating system: SunOS
> > > > release: 5.10
> > > > processor: sun4u
> > > > PATH
> > > > /usr/local/openmpi-1.9_64_cc/bin (!!!)
> > > > /usr/local/openmpi-1.9_64_cc/bin (!!!)
> > > > /usr/local/eclipse-3.6.1
> > > > ...
> > > > /usr/local/openmpi-1.9_64_cc/bin (<- from our environment)
> > > > LD_LIBRARY_PATH_32
> > > > /usr/lib
> > > > /usr/local/jdk1.7.0_07/jre/lib/sparc
> > > > ...
> > > > /usr/local/openmpi-1.9_64_cc/lib (<- from our environment)
> > > > LD_LIBRARY_PATH_64
> > > > /usr/lib/sparcv9
> > > > /usr/local/jdk1.7.0_07/jre/lib/sparcv9
> > > > ...
> > > > /usr/local/openmpi-1.9_64_cc/lib64 (<- from our environment)
> > > > LD_LIBRARY_PATH
> > > > /usr/local/openmpi-1.9_64_cc/lib (!!!)
> > > > /usr/local/openmpi-1.9_64_cc/lib64 (!!!)
> > > > /usr/lib/sparcv9
> > > > /usr/local/jdk1.7.0_07/jre/lib/sparcv9
> > > > ...
> > > > /usr/local/openmpi-1.9_64_cc/lib64 (<- from our environment)
> > > > CLASSPATH
> > > > /usr/local/junit4.10
> > > > /usr/local/junit4.10/junit-4.10.jar
> > > > //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dcore.jar
> > > > //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dutils.jar
> > > > //usr/local/jdk1.7.0_07/j3d/lib/ext/vecmath.jar
> > > > /usr/local/javacc-5.0/javacc.jar
> > > > .
> > > >
> > > >
> > > > Without MPI the program uses our environment.
> > > >
> > > > tyr hello_1 147 diff env_with*
> > > > 1,7c1
> > > > <
> > > > <
> > > > < Now 1 slave tasks are sending their environment.
> > > > <
> > > > < Environment from task 1:
> > > > < message type: 3
> > > > < msg length: 4622 characters
> > > > ---
> > > >> Environment:
> > > > 14,15d7
> > > > < /usr/local/openmpi-1.9_64_cc/bin
> > > > < /usr/local/openmpi-1.9_64_cc/bin
> > > > 81,82d72
> > > > < /usr/local/openmpi-1.9_64_cc/lib
> > > > < /usr/local/openmpi-1.9_64_cc/lib64
> > > > tyr hello_1 148
> > > >
> > > >
> > > > I have attached the programs so that you can check yourself and
> > > > hopefully get the same results. Do you modify PATH and LD_LIBRARY_PATH?
> > > >
> > > >
> > > > Kind regards
> > > >
> > > > Siegmar
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >>> tyr java 110 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> > > >>> java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > >>>
> > --------------------------------------------------------------------------
> > > >>> It looks like opal_init failed for some reason; your parallel
> > process is
> > > >>> likely to abort. There are many reasons that a parallel process can
> > > >>> fail during opal_init; some of which are due to configuration or
> > > >>> environment problems. This failure appears to be an internal
> > failure;
> > > >>> here's some additional information (which may only be relevant to an
> > > >>> Open MPI developer):
> > > >>>
> > > >>> mca_base_open failed
> > > >>> --> Returned value -2 instead of OPAL_SUCCESS
> > > >>>
> > --------------------------------------------------------------------------
> > > >>>
> > --------------------------------------------------------------------------
> > > >>> It looks like orte_init failed for some reason; your parallel
> > process is
> > > >>> likely to abort. There are many reasons that a parallel process can
> > > >>> fail during orte_init; some of which are due to configuration or
> > > >>> environment problems. This failure appears to be an internal
> > failure;
> > > >>> here's some additional information (which may only be relevant to an
> > > >>> Open MPI developer):
> > > >>>
> > > >>> opal_init failed
> > > >>> --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> > > >>>
> > --------------------------------------------------------------------------
> > > >>>
> > --------------------------------------------------------------------------
> > > >>> It looks like MPI_INIT failed for some reason; your parallel process
> > is
> > > >>> likely to abort. There are many reasons that a parallel process can
> > > >>> fail during MPI_INIT; some of which are due to configuration or
> > environment
> > > >>> problems. This failure appears to be an internal failure; here's
> > some
> > > >>> additional information (which may only be relevant to an Open MPI
> > > >>> developer):
> > > >>>
> > > >>> ompi_mpi_init: orte_init failed
> > > >>> --> Returned "Out of resource" (-2) instead of "Success" (0)
> > > >>>
> > --------------------------------------------------------------------------
> > > >>> *** An error occurred in MPI_Init
> > > >>> *** on a NULL communicator
> > > >>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
> > abort,
> > > >>> *** and potentially your MPI job)
> > > >>> [linpc4:27369] Local abort before MPI_INIT completed successfully;
> > > >>> not able to aggregate error messages, and not able to guarantee
> > > >>> that all other processes were killed!
> > > >>> -------------------------------------------------------
> > > >>> Primary job terminated normally, but 1 process returned
> > > >>> a non-zero exit code.. Per user-direction, the job has been aborted.
> > > >>> -------------------------------------------------------
> > > >>>
> > --------------------------------------------------------------------------
> > > >>> mpiexec detected that one or more processes exited with non-zero
> > status,
> > > >>> thus causing
> > > >>> the job to be terminated. The first process to do so was:
> > > >>>
> > > >>> Process name: [[21095,1],2]
> > > >>> Exit code: 1
> > > >>>
> > --------------------------------------------------------------------------
> > > >>>
> > > >>>
> > > >>> tyr java 111 which mpijavac
> > > >>> /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > > >>> tyr java 112 more /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > > >>> #!/usr/bin/env perl
> > > >>>
> > > >>> # WARNING: DO NOT EDIT THE mpijava.pl FILE AS IT IS GENERATED!
> > > >>> # MAKE ALL CHANGES IN mpijava.pl.in
> > > >>>
> > > >>> # Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
> > > > >>> # Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved.
> > > >>>
> > > >>> use strict;
> > > >>>
> > > >>> # The main purpose of this wrapper compiler is to check for
> > > >>> # and adjust the Java class path to include the OMPI classes
> > > >>> # in mpi.jar. The user may have specified a class path on
> > > >>> # our cmd line, or it may be in the environment, so we have
> > > >>> # to check for both. We also need to be careful not to
> > > >>> # just override the class path as it probably includes classes
> > > >>> # they need for their application! It also may already include
> > > >>> # the path to mpi.jar, and while it doesn't hurt anything, we
> > > >>> # don't want to include our class path more than once to avoid
> > > >>> # user astonishment
> > > >>>
> > > >>> # Let the build system provide us with some critical values
> > > >>> my $my_compiler = "/usr/local/jdk1.7.0_07/bin/javac";
> > > >>> my $ompi_classpath = "/usr/local/openmpi-1.9_32_cc/lib/mpi.jar";
> > > >>>
> > > >>> # globals
> > > >>> my $showme_arg = 0;
> > > >>> my $verbose = 0;
> > > >>> my $my_arg;
> > > >>> ...
> > > >>>
> > > >>>
> > > >>> All libraries are available.
> > > >>>
> > > >>> tyr java 113 ldd /usr/local/jdk1.7.0_07/bin/javac
> > > >>> libthread.so.1 => /usr/lib/libthread.so.1
> > > > >>> libjli.so => /export2/prog/SunOS_sparc/jdk1.7.0_07/bin/../jre/lib/sparc/jli/libjli.so
> > > >>> libdl.so.1 => /usr/lib/libdl.so.1
> > > >>> libc.so.1 => /usr/lib/libc.so.1
> > > >>> libm.so.2 => /usr/lib/libm.so.2
> > > >>> /platform/SUNW,A70/lib/libc_psr.so.1
> > > >>> tyr java 114 ssh sunpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> > > >>> libthread.so.1 => /usr/lib/libthread.so.1
> > > > >>> libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
> > > >>> libdl.so.1 => /usr/lib/libdl.so.1
> > > >>> libc.so.1 => /usr/lib/libc.so.1
> > > >>> libm.so.2 => /usr/lib/libm.so.2
> > > >>> tyr java 115 ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> > > >>> linux-gate.so.1 => (0xffffe000)
> > > >>> libpthread.so.0 => /lib/libpthread.so.0 (0xf77b2000)
> > > > >>> libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so (0xf779d000)
> > > >>> libdl.so.2 => /lib/libdl.so.2 (0xf7798000)
> > > >>> libc.so.6 => /lib/libc.so.6 (0xf762b000)
> > > >>> /lib/ld-linux.so.2 (0xf77ce000)
> > > >>>
> > > >>>
> > > >>> I don't have any errors in the log files except the error for nfs.
> > > >>>
> > > >>> tyr openmpi-1.9-Linux.x86_64.32_cc 136 ls log.*
> > > > >>> log.configure.Linux.x86_64.32_cc log.make-install.Linux.x86_64.32_cc
> > > > >>> log.make-check.Linux.x86_64.32_cc log.make.Linux.x86_64.32_cc
> > > >>>
> > > >>> tyr openmpi-1.9-Linux.x86_64.32_cc 137 grep "Error 1" log.*
> > > >>> log.make-check.Linux.x86_64.32_cc:make[3]: *** [check-TESTS] Error 1
> > > > >>> log.make-check.Linux.x86_64.32_cc:make[1]: *** [check-recursive] Error 1
> > > >>> log.make-check.Linux.x86_64.32_cc:make: *** [check-recursive] Error 1
> > > >>>
> > > >>> ...
> > > >>> SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 32 failed)
> > > >>> FAIL: opal_path_nfs
> > > >>> ========================================================
> > > >>> 1 of 2 tests failed
> > > >>> Please report to http://www.open-mpi.org/community/help/
> > > >>> ========================================================
> > > >>> make[3]: *** [check-TESTS] Error 1
> > > >>> ...
> > > >>>
> > > >>>
> > > >>> It doesn't help to build the class files on Linux (which should be
> > > >>> independent of the architecture anyway).
> > > >>>
> > > >>> tyr java 131 ssh linpc4
> > > >>> linpc4 fd1026 98 cd .../prog/mpi/java
> > > >>> linpc4 java 99 make clean
> > > >>> rm -f /home/fd1026/mpi_classfiles/HelloMainWithBarrier.class \
> > > >>> /home/fd1026/mpi_classfiles/HelloMainWithoutBarrier.class
> > > >>> linpc4 java 100 make
> > > >>> mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithBarrier.java
> > > >>> mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithoutBarrier.java
> > > >>>
> > > >>> linpc4 java 101 mpiexec -np 3 -host linpc4 \
> > > >>> java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > >>>
> > --------------------------------------------------------------------------
> > > >>> It looks like opal_init failed for some reason; your parallel
> > process is
> > > >>> likely to abort. There are many reasons that a parallel process can
> > > >>> fail during opal_init; some of which are due to configuration or
> > > >>> environment problems. This failure appears to be an internal
> > failure;
> > > >>> here's some additional information (which may only be relevant to an
> > > >>> Open MPI developer):
> > > >>>
> > > >>> mca_base_open failed
> > > >>> --> Returned value -2 instead of OPAL_SUCCESS
> > > >>> ...
> > > >>>
> > > > >>> Does anybody else have this problem as well? Do you know a solution?
> > > >>> Thank you very much for any help in advance.
> > > >>>
> > > >>>
> > > >>> Kind regards
> > > >>>
> > > >>> Siegmar
> > > >>>
> > > >>> _______________________________________________
> > > >>> users mailing list
> > > >>> users_at_[hidden]
> > > >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > >>
> > > >>
> > > > /* A small MPI program, which delivers some information about its
> > > > * machine, operating system, and some environment variables.
> > > > *
> > > > *
> > > > * Compiling:
> > > > * Store executable(s) into local directory.
> > > > * mpicc -o <program name> <source code file name>
> > > > *
> > > > * Store executable(s) into predefined directories.
> > > > * make
> > > > *
> > > > * Make program(s) automatically on all specified hosts. You must
> > > > * edit the file "make_compile" and specify your host names before
> > > > * you execute it.
> > > > * make_compile
> > > > *
> > > > * Running:
> > > > * LAM-MPI:
> > > > * mpiexec -boot -np <number of processes> <program name>
> > > > * or
> > > > * mpiexec -boot \
> > > > * -host <hostname> -np <number of processes> <program name> : \
> > > > * -host <hostname> -np <number of processes> <program name>
> > > > * or
> > > > * mpiexec -boot [-v] -configfile <application file>
> > > > * or
> > > > * lamboot [-v] [<host file>]
> > > > * mpiexec -np <number of processes> <program name>
> > > > * or
> > > > * mpiexec [-v] -configfile <application file>
> > > > * lamhalt
> > > > *
> > > > * OpenMPI:
> > > > * "host1", "host2", and so on can all have the same name,
> > > > * if you want to start a virtual computer with some virtual
> > > > * cpu's on the local host. The name "localhost" is allowed
> > > > * as well.
> > > > *
> > > > * mpiexec -np <number of processes> <program name>
> > > > * or
> > > > * mpiexec --host <host1,host2,...> \
> > > > * -np <number of processes> <program name>
> > > > * or
> > > > * mpiexec -hostfile <hostfile name> \
> > > > * -np <number of processes> <program name>
> > > > * or
> > > > * mpiexec -app <application file>
> > > > *
> > > > * Cleaning:
> > > > * local computer:
> > > > * rm <program name>
> > > > * or
> > > > * make clean_all
> > > > * on all specified computers (you must edit the file "make_clean_all"
> > > > * and specify your host names before you execute it).
> > > > * make_clean_all
> > > > *
> > > > *
> > > > * File: environ_mpi.c Author: S. Gross
> > > > * Date: 25.09.2012
> > > > *
> > > > */
> > > >
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > > #include <string.h>
> > > > #include <unistd.h>
> > > > #include <sys/utsname.h>
> > > > #include "mpi.h"
> > > >
> > > > #define BUF_SIZE 8192 /* message buffer size */
> > > > #define MAX_TASKS 12 /* max. number of tasks */
> > > > #define SENDTAG 1 /* send message command */
> > > > #define EXITTAG 2 /* termination command */
> > > > #define MSGTAG 3 /* normal message token */
> > > >
> > > > #define ENTASKS -1 /* error: too many tasks */
> > > >
> > > > static void master (void);
> > > > static void slave (void);
> > > >
> > > > int main (int argc, char *argv[])
> > > > {
> > > > int mytid, /* my task id */
> > > > ntasks; /* number of parallel tasks */
> > > >
> > > > MPI_Init (&argc, &argv);
> > > > MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
> > > > MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
> > > >
> > > > if (mytid == 0)
> > > > {
> > > > master ();
> > > > }
> > > > else
> > > > {
> > > > slave ();
> > > > }
> > > > MPI_Finalize ();
> > > > return EXIT_SUCCESS;
> > > > }
> > > >
> > > >
> > > > /* Function for the "master task". The master sends a request to all
> > > > * slaves asking for a message. After receiving and printing the
> > > > * messages he sends all slaves a termination command.
> > > > *
> > > > * input parameters: not necessary
> > > > * output parameters: not available
> > > > * return value: nothing
> > > > * side effects: no side effects
> > > > *
> > > > */
> > > > void master (void)
> > > > {
> > > > int ntasks, /* number of parallel tasks */
> > > > mytid, /* my task id */
> > > > num, /* number of entries */
> > > > i; /* loop variable */
> > > > char buf[BUF_SIZE + 1]; /* message buffer (+1 for '\0') */
> > > > MPI_Status stat; /* message details */
> > > >
> > > > MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
> > > > MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
> > > > if (ntasks > MAX_TASKS)
> > > > {
> > > > fprintf (stderr, "Error: Too many tasks. Try again with at most "
> > > > "%d tasks.\n", MAX_TASKS);
> > > > /* terminate all slave tasks */
> > > > for (i = 1; i < ntasks; ++i)
> > > > {
> > > > MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
> > > > }
> > > > MPI_Finalize ();
> > > > exit (ENTASKS);
> > > > }
> > > > printf ("\n\nNow %d slave tasks are sending their environment.\n\n",
> > > > ntasks - 1);
> > > > /* request messages from slave tasks */
> > > > for (i = 1; i < ntasks; ++i)
> > > > {
> > > > MPI_Send ((char *) NULL, 0, MPI_CHAR, i, SENDTAG, MPI_COMM_WORLD);
> > > > }
> > > > /* wait for messages and print greetings */
> > > > for (i = 1; i < ntasks; ++i)
> > > > {
> > > > MPI_Recv (buf, BUF_SIZE, MPI_CHAR, MPI_ANY_SOURCE,
> > > > MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
> > > > MPI_Get_count (&stat, MPI_CHAR, &num);
> > > > buf[num] = '\0'; /* add missing end-of-string */
> > > > printf ("Environment from task %d:\n"
> > > > " message type: %d\n"
> > > > " msg length: %d characters\n"
> > > > " message: %s\n\n",
> > > > stat.MPI_SOURCE, stat.MPI_TAG, num, buf);
> > > > }
> > > > /* terminate all slave tasks */
> > > > for (i = 1; i < ntasks; ++i)
> > > > {
> > > > MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
> > > > }
> > > > }
> > > >
> > > >
> > > > /* Function for "slave tasks". The slave task sends its hostname,
> > > > * operating system name and release, and processor architecture
> > > > * as a message to the master.
> > > > *
> > > > * input parameters: not necessary
> > > > * output parameters: not available
> > > > * return value: nothing
> > > > * side effects: no side effects
> > > > *
> > > > */
> > > > void slave (void)
> > > > {
> > > > struct utsname sys_info; /* system information */
> > > > int mytid, /* my task id */
> > > > num_env_vars, /* # of environment variables */
> > > > i, /* loop variable */
> > > > more_to_do;
> > > > char buf[BUF_SIZE], /* message buffer */
> > > > *env_vars[] = {"PATH",
> > > > "LD_LIBRARY_PATH_32",
> > > > "LD_LIBRARY_PATH_64",
> > > > "LD_LIBRARY_PATH",
> > > > "CLASSPATH"};
> > > > MPI_Status stat; /* message details */
> > > >
> > > > MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
> > > > num_env_vars = sizeof (env_vars) / sizeof (env_vars[0]);
> > > > more_to_do = 1;
> > > > while (more_to_do == 1)
> > > > {
> > > > /* wait for a message from the master task */
> > > > MPI_Recv (buf, BUF_SIZE, MPI_CHAR, 0, MPI_ANY_TAG,
> > > > MPI_COMM_WORLD, &stat);
> > > > if (stat.MPI_TAG != EXITTAG)
> > > > {
> > > > uname (&sys_info);
> > > > strcpy (buf, "\n hostname: ");
> > > > strncpy (buf + strlen (buf), sys_info.nodename,
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), "\n operating system: ",
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), sys_info.sysname,
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), "\n release: ",
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), sys_info.release,
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), "\n processor: ",
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), sys_info.machine,
> > > > BUF_SIZE - strlen (buf));
> > > > for (i = 0; i < num_env_vars; ++i)
> > > > {
> > > > char *env_val, /* pointer to environment value */
> > > > *delimiter = ":" , /* field delimiter for "strtok" */
> > > > *next_tok; /* next token */
> > > >
> > > > env_val = getenv (env_vars[i]);
> > > > if (env_val != NULL)
> > > > {
> > > > if ((strlen (buf) + strlen (env_vars[i]) + 6) < BUF_SIZE)
> > > > {
> > > > strncpy (buf + strlen (buf), "\n ",
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), env_vars[i],
> > > > BUF_SIZE - strlen (buf));
> > > > }
> > > > else
> > > > {
> > > > fprintf (stderr, "Buffer too small. Couldn't add \"%s\"."
> > > > "\n\n", env_vars[i]);
> > > > }
> > > > /* Get first token in "env_val". "strtok" skips all
> > > > * characters that are contained in the current delimiter
> > > > * string. If it finds a character which is not contained
> > > > * in the delimiter string, it is the start of the first
> > > > * token. Now it searches for the next character which is
> > > > * part of the delimiter string. If it finds one it will
> > > > * overwrite it by a '\0' to terminate the first token.
> > > > * Otherwise the token extends to the end of the string.
> > > > * Subsequent calls of "strtok" use a NULL pointer as first
> > > > * argument and start searching from the saved position
> > > > * after the last token. "strtok" returns NULL if it
> > > > * couldn't find a token.
> > > > */
> > > > next_tok = strtok (env_val, delimiter);
> > > > while (next_tok != NULL)
> > > > {
> > > > if ((strlen (buf) + strlen (next_tok) + 25) < BUF_SIZE)
> > > > {
> > > > strncpy (buf + strlen (buf), "\n ",
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), next_tok,
> > > > BUF_SIZE - strlen (buf));
> > > > }
> > > > else
> > > > {
> > > > fprintf (stderr, "Buffer too small. Couldn't add \"%s\" "
> > > > "to %s.\n\n", next_tok, env_vars[i]);
> > > > }
> > > > /* get next token */
> > > > next_tok = strtok (NULL, delimiter);
> > > > }
> > > > }
> > > > }
> > > > MPI_Send (buf, strlen (buf), MPI_CHAR, stat.MPI_SOURCE,
> > > > MSGTAG, MPI_COMM_WORLD);
> > > > }
> > > > else
> > > > {
> > > > more_to_do = 0; /* terminate */
> > > > }
> > > > }
> > > > }
> > > > /* A small program, which delivers some information about its
> > > > * machine, operating system, and some environment variables.
> > > > *
> > > > *
> > > > * Compiling:
> > > > * Store executable(s) into local directory.
> > > > * (g)cc -o environ_without_mpi environ_without_mpi.c
> > > > *
> > > > * Running:
> > > > * environ_without_mpi
> > > > *
> > > > *
> > > > * File: environ_without_mpi.c Author: S. Gross
> > > > * Date: 25.09.2012
> > > > *
> > > > */
> > > >
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > > #include <string.h>
> > > > #include <unistd.h>
> > > > #include <sys/utsname.h>
> > > >
> > > > #define BUF_SIZE 8192 /* message buffer size */
> > > >
> > > > int main (int argc, char *argv[])
> > > > {
> > > > struct utsname sys_info; /* system information */
> > > > int num_env_vars, /* # of environment variables */
> > > > i; /* loop variable */
> > > > char buf[BUF_SIZE], /* message buffer */
> > > > *env_vars[] = {"PATH",
> > > > "LD_LIBRARY_PATH_32",
> > > > "LD_LIBRARY_PATH_64",
> > > > "LD_LIBRARY_PATH",
> > > > "CLASSPATH"};
> > > >
> > > > num_env_vars = sizeof (env_vars) / sizeof (env_vars[0]);
> > > > uname (&sys_info);
> > > > strcpy (buf, "\n hostname: ");
> > > > strncpy (buf + strlen (buf), sys_info.nodename,
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), "\n operating system: ",
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), sys_info.sysname,
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), "\n release: ",
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), sys_info.release,
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), "\n processor: ",
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), sys_info.machine,
> > > > BUF_SIZE - strlen (buf));
> > > > for (i = 0; i < num_env_vars; ++i)
> > > > {
> > > > char *env_val, /* pointer to environment value */
> > > > *delimiter = ":" , /* field delimiter for "strtok" */
> > > > *next_tok; /* next token */
> > > >
> > > > env_val = getenv (env_vars[i]);
> > > > if (env_val != NULL)
> > > > {
> > > > if ((strlen (buf) + strlen (env_vars[i]) + 6) < BUF_SIZE)
> > > > {
> > > > strncpy (buf + strlen (buf), "\n ",
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), env_vars[i],
> > > > BUF_SIZE - strlen (buf));
> > > > }
> > > > else
> > > > {
> > > > fprintf (stderr, "Buffer too small. Couldn't add \"%s\"."
> > > > "\n\n", env_vars[i]);
> > > > }
> > > > /* Get first token in "env_val". "strtok" skips all
> > > > * characters that are contained in the current delimiter
> > > > * string. If it finds a character which is not contained
> > > > * in the delimiter string, it is the start of the first
> > > > * token. Now it searches for the next character which is
> > > > * part of the delimiter string. If it finds one it will
> > > > * overwrite it by a '\0' to terminate the first token.
> > > > * Otherwise the token extends to the end of the string.
> > > > * Subsequent calls of "strtok" use a NULL pointer as first
> > > > * argument and start searching from the saved position
> > > > * after the last token. "strtok" returns NULL if it
> > > > * couldn't find a token.
> > > > */
> > > > next_tok = strtok (env_val, delimiter);
> > > > while (next_tok != NULL)
> > > > {
> > > > if ((strlen (buf) + strlen (next_tok) + 25) < BUF_SIZE)
> > > > {
> > > > strncpy (buf + strlen (buf), "\n ",
> > > > BUF_SIZE - strlen (buf));
> > > > strncpy (buf + strlen (buf), next_tok,
> > > > BUF_SIZE - strlen (buf));
> > > > }
> > > > else
> > > > {
> > > > fprintf (stderr, "Buffer too small. Couldn't add \"%s\" "
> > > > "to %s.\n\n", next_tok, env_vars[i]);
> > > > }
> > > > /* get next token */
> > > > next_tok = strtok (NULL, delimiter);
> > > > }
> > > > }
> > > > }
> > > > printf ("Environment:\n"
> > > > " message: %s\n\n", buf);
> > > > return EXIT_SUCCESS;
> > > > }
> > >
> > >
> >
> >
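
A side note on the two quoted programs: both pass the pointer returned by
"getenv" directly to "strtok", which overwrites every ':' with '\0' inside
the process environment itself. POSIX says that an application shall not
modify that string. The following is only a minimal sketch of one possible
alternative, not the code from the original post. The helper name
"append_path_list" and its return convention are made up for illustration.
It tokenizes a private copy of the value and uses "snprintf" (C99) instead
of the "strlen"/"strncpy" chain, so the buffer always stays terminated.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the quoted programs).
 * Appends "\n  <name>" and then one "\n    <token>" line for every
 * ':'-separated field of "value" to the string already stored in "buf".
 * "buf" must hold a terminated string shorter than "buf_size".
 * The tokens are cut from a private copy, so "value" is never modified.
 * Returns 0 on success and -1 if the copy could not be allocated or
 * the buffer was too small (the output is then truncated).
 */
int append_path_list (char *buf, size_t buf_size,
                      const char *name, const char *value)
{
  size_t used;                          /* characters already in buf    */
  char   *copy,                         /* private copy for "strtok"    */
         *tok;                          /* current token                */
  int    n,                             /* result of "snprintf"         */
         rc = 0;                        /* return code                  */

  assert (buf != NULL && name != NULL && value != NULL);
  used = strlen (buf);
  copy = malloc (strlen (value) + 1);
  if (copy == NULL)
  {
    return -1;
  }
  strcpy (copy, value);

  /* "snprintf" returns the length it wanted to write, so a result
   * that is not smaller than the remaining space means truncation
   */
  n = snprintf (buf + used, buf_size - used, "\n  %s", name);
  if ((n < 0) || ((size_t) n >= buf_size - used))
  {
    rc = -1;
  }
  else
  {
    used += (size_t) n;
  }
  for (tok = strtok (copy, ":"); (rc == 0) && (tok != NULL);
       tok = strtok (NULL, ":"))
  {
    n = snprintf (buf + used, buf_size - used, "\n    %s", tok);
    if ((n < 0) || ((size_t) n >= buf_size - used))
    {
      rc = -1;
    }
    else
    {
      used += (size_t) n;
    }
  }
  free (copy);
  return rc;
}
```

In the loop over "env_vars" the call would look something like
"append_path_list (buf, BUF_SIZE, env_vars[i], env_val)", with a message
on stderr if it returns -1. The indentation strings are placeholders and
can be changed to match the original output layout.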