Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] broadcasting basic data items in Java
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-12-22 02:53:24


Hi

> Interesting. My best guess is that the OMPI libraries aren't being
> found, though I'm a little surprised because the error message
> indicates an inability to malloc - but it's possible the message
> isn't accurate.
>
> One thing stands out - I see you compiled your program with "javac".
> I suspect that is the source of the trouble - you really need to use
> the Java wrapper compiler "mpijavac" so all the libs get absorbed
> and/or linked correctly.

No, I only compiled the first two programs (which don't use any MPI
methods) with javac. The MPI program "InitFinalizeMain.java" was
compiled with "mpijavac" (I use a script file and a GNUmakefile).

linpc1 java 102 make_classfiles
...
=========== linpc1 ===========
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles InitFinalizeMain.java
...
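
For reference, a minimal sketch of what a program like
"InitFinalizeMain.java" could look like (the real source was attached
to my earlier message; this sketch only assumes the mpi.MPI and
mpi.MPIException classes of the Open MPI Java bindings that mpijavac
compiles against):

import mpi.*;

public class InitFinalizeMain
{
  public static void main (String args[]) throws MPIException
  {
    MPI.Init (args);        /* the reported opal_init failure happens here */
    System.out.println ("Init and Finalize completed.");
    MPI.Finalize ();
  }
}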

The other programs also work if I compile them with "mpijavac":

linpc1 prog 107 mpijavac MemAllocMain.java
linpc1 prog 108 mpiexec java -cp `pwd` MemAllocMain
Type something ("quit" terminates program): dgdas
Received input: dgdas
Converted to upper case: DGDAS
Type something ("quit" terminates program): quit
Received input: quit
Converted to upper case: QUIT
linpc1 prog 109
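
Since I didn't include the source of "MemAllocMain.java" above, here is
a rough sketch of what it does (it uses no MPI methods; the buffer size
BUFLEN and the variable names are assumptions, not the original code):

import java.io.*;

public class MemAllocMain
{
  static final int BUFLEN = 255;        /* assumed buffer size */

  public static void main (String args[]) throws IOException
  {
    BufferedReader stdin =
      new BufferedReader (new InputStreamReader (System.in));
    String line;
    do
    {
      System.out.print ("Type something (\"quit\" terminates program): ");
      line = stdin.readLine ();
      if (line == null)                 /* end of input */
        break;
      int len = Math.min (line.length (), BUFLEN);
      /* allocate a new buffer for the (possibly truncated) string */
      String string = new String (line.substring (0, len));
      System.out.println ("Received input: " + string);
      System.out.println ("Converted to upper case: " + string.toUpperCase ());
    } while (!line.equals ("quit"));
  }
}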

My environment should be valid as well. LD_LIBRARY_PATH lists the
directories for 32-bit libraries first and then the directories for
64-bit libraries. I have split the long lines of the path variables
so that they are easier to read.

linpc1 java 111 mpiexec java EnvironVarMain

Operating system: Linux Processor architecture: x86_64

  CLASSPATH:
/usr/local/junit4.10:
/usr/local/junit4.10/junit-4.10.jar:
//usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dcore.jar:
//usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dutils.jar:
//usr/local/jdk1.7.0_07-64/j3d/lib/ext/vecmath.jar:
/usr/local/javacc-5.0/javacc.jar:
.:
/home/fd1026/Linux/x86_64/mpi_classfiles

  LD_LIBRARY_PATH:
/usr/lib:
...
/usr/lib64:
/usr/local/jdk1.7.0_07-64/jre/lib/amd64:
/usr/local/gcc-4.7.1/lib64:
/usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1:
/usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1:
/usr/local/lib64:
/usr/local/ssl/lib64:
/usr/lib64:
/usr/X11R6/lib64:
/usr/local/openmpi-1.9_64_cc/lib64:
/home/fd1026/Linux/x86_64/lib64
linpc1 java 112
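
The output above comes from "EnvironVarMain.java", which is essentially
nothing more than a few System.getProperty() and System.getenv() calls,
roughly along these lines (the exact way the path lists are split onto
separate lines is an assumption, not the original code):

public class EnvironVarMain
{
  public static void main (String args[])
  {
    System.out.println ("Operating system: " + System.getProperty ("os.name")
                        + "  Processor architecture: "
                        + System.getProperty ("os.arch"));
    /* print one path entry per line; assumes both variables are set */
    System.out.println ("  CLASSPATH:\n"
                        + System.getenv ("CLASSPATH").replace (":", ":\n"));
    System.out.println ("  LD_LIBRARY_PATH:\n"
                        + System.getenv ("LD_LIBRARY_PATH").replace (":", ":\n"));
  }
}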

Can I provide any other information to help solve this problem?

Kind regards

Siegmar

> On Dec 21, 2012, at 9:46 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>
> > Hi
> >
> >> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
> >> appears to be in the Java side of things. For whatever reason, your
> >> Java VM is refusing to allow a malloc to succeed. I suspect it has
> >> something to do with its setup, but I'm not enough of a Java person
> >> to point you to the problem.
> >>
> >> Is it possible that the program was compiled against a different
> >> (perhaps incompatible) version of Java?
> >
> > No, I don't think so. A small Java program without MPI methods works.
> >
> > linpc1 bin 122 which mpicc
> > /usr/local/openmpi-1.9_64_cc/bin/mpicc
> > linpc1 bin 123 pwd
> > /usr/local/openmpi-1.9_64_cc/bin
> > linpc1 bin 124 grep jdk *
> > mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> > mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> > linpc1 bin 125 which java
> > /usr/local/jdk1.7.0_07-64/bin/java
> > linpc1 bin 126
> >
> >
> > linpc1 prog 110 javac MiniProgMain.java
> > linpc1 prog 111 java MiniProgMain
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> > linpc1 prog 112 mpiexec java MiniProgMain
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> > linpc1 prog 113 mpiexec -np 2 java MiniProgMain
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> >
> >
> > A small program which allocates a buffer for a new string.
> > ...
> > stringBUFLEN = new String (string.substring (0, len));
> > ...
> >
> > linpc1 prog 115 javac MemAllocMain.java
> > linpc1 prog 116 java MemAllocMain
> > Type something ("quit" terminates program): ffghhfhh
> > Received input: ffghhfhh
> > Converted to upper case: FFGHHFHH
> > Type something ("quit" terminates program): quit
> > Received input: quit
> > Converted to upper case: QUIT
> >
> > linpc1 prog 117 mpiexec java MemAllocMain
> > Type something ("quit" terminates program): fbhshnhjs
> > Received input: fbhshnhjs
> > Converted to upper case: FBHSHNHJS
> > Type something ("quit" terminates program): quit
> > Received input: quit
> > Converted to upper case: QUIT
> > linpc1 prog 118
> >
> > I'm not sure if this is of any help, but the problem starts with
> > MPI methods. The following program calls just the Init() and
> > Finalize() method.
> >
> > tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > mca_base_open failed
> > --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> >
> >
> > Hopefully somebody will have an idea what goes wrong on my Linux
> > system. Thank you very much for any help in advance.
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >> Just shooting in the dark here - I suspect you'll have to ask someone
> >> more knowledgeable on JVMs.
> >>
> >>
> >> On Dec 21, 2012, at 7:32 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
> >>
> >>> Hi
> >>>
> >>>> I can't speak to the other issues, but for these - it looks like
> >>>> something isn't right in the system. Could be an incompatibility
> >>>> with Suse 12.1.
> >>>>
> >>>> What the errors are saying is that malloc is failing when used at
> >>>> a very early stage in starting the process. Can you run even a
> >>>> C-based MPI "hello" program?
> >>>
> >>> Yes. I have implemented more or less the same program in C and Java.
> >>>
> >>> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
> >>> Process 0 of 2 running on linpc0
> >>> Process 1 of 2 running on linpc1
> >>>
> >>> Now 1 slave tasks are sending greetings.
> >>>
> >>> Greetings from task 1:
> >>> message type: 3
> >>> msg length: 132 characters
> >>> message:
> >>> hostname: linpc1
> >>> operating system: Linux
> >>> release: 3.1.10-1.16-desktop
> >>> processor: x86_64
> >>>
> >>>
> >>> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
> >>> --------------------------------------------------------------------------
> >>> It looks like opal_init failed for some reason; your parallel process is
> >>> likely to abort. There are many reasons that a parallel process can
> >>> fail during opal_init; some of which are due to configuration or
> >>> environment problems. This failure appears to be an internal failure;
> >>> here's some additional information (which may only be relevant to an
> >>> Open MPI developer):
> >>>
> >>> mca_base_open failed
> >>> --> Returned value -2 instead of OPAL_SUCCESS
> >>> ...
> >>>
> >>>
> >>> Thank you very much for any help in advance.
> >>>
> >>> Kind regards
> >>>
> >>> Siegmar
> >>>
> >>>
> >>>
> >>>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
> >>>>
> >>>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
> >>>>>
> >>>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> >>>>> --------------------------------------------------------------------------
> >>>>> It looks like opal_init failed for some reason; your parallel process is
> >>>>> likely to abort. There are many reasons that a parallel process can
> >>>>> fail during opal_init; some of which are due to configuration or
> >>>>> environment problems. This failure appears to be an internal failure;
> >>>>> here's some additional information (which may only be relevant to an
> >>>>> Open MPI developer):
> >>>>>
> >>>>> mca_base_open failed
> >>>>> --> Returned value -2 instead of OPAL_SUCCESS
> >>>>> ...
> >>>>> ompi_mpi_init: orte_init failed
> >>>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
> >>>>> --------------------------------------------------------------------------
> >>>>> *** An error occurred in MPI_Init
> >>>>> *** on a NULL communicator
> >>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> >>>>> *** and potentially your MPI job)
> >>>>> [(null):10586] Local abort before MPI_INIT completed successfully; not able
> >>>>> to aggregate error messages, and not able to guarantee that all other
> >>>>> processes were killed!
> >>>>> -------------------------------------------------------
> >>>>> Primary job terminated normally, but 1 process returned
> >>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
> >>>>> -------------------------------------------------------
> >>>>> --------------------------------------------------------------------------
> >>>>> mpiexec detected that one or more processes exited with non-zero status, thus
> >>>>> causing the job to be terminated. The first process to do so was:
> >>>>>
> >>>>> Process name: [[16706,1],1]
> >>>>> Exit code: 1
> >>>>> --------------------------------------------------------------------------
> >>>>>
> >>>>>
> >>>>>
> >>>>> I use a valid environment on all machines. The problem occurs as well
> >>>>> when I compile and run the program directly on the Linux system.
> >>>>>
> >>>>> linpc1 java 101 mpijavac BcastIntMain.java
> >>>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> >>>>> --------------------------------------------------------------------------
> >>>>> It looks like opal_init failed for some reason; your parallel process is
> >>>>> likely to abort. There are many reasons that a parallel process can
> >>>>> fail during opal_init; some of which are due to configuration or
> >>>>> environment problems. This failure appears to be an internal failure;
> >>>>> here's some additional information (which may only be relevant to an
> >>>>> Open MPI developer):
> >>>>>
> >>>>> mca_base_open failed
> >>>>> --> Returned value -2 instead of OPAL_SUCCESS
> >>>>
> >>>
> >>
> >>
> > <InitFinalizeMain.java>
>
>