
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] broadcasting basic data items in Java
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-12-24 09:31:54


I can confirm that the first program fails (bcast a single int).

I'm trying to understand how the implementation works, but this may take some time (due to the holidays, etc.).
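
For reference, a minimal sketch of the kind of single-int broadcast in question. The class name is made up, and the exact method names (Bcast/Rank here, bcast/getRank in newer bindings) vary between versions of the Open MPI Java bindings, so treat this as an assumption rather than Siegmar's actual BcastIntMain.java:

    import mpi.MPI;

    public class BcastIntSketch {       // hypothetical name, not the original source
        public static void main(String[] args) throws Exception {
            MPI.Init(args);
            int rank = MPI.COMM_WORLD.Rank();   // getRank() in newer bindings

            // The Java bindings broadcast from an array, not a bare int:
            // rank 0 fills the slot, all ranks receive it.
            int[] value = new int[1];
            if (rank == 0) {
                value[0] = 42;
            }
            MPI.COMM_WORLD.Bcast(value, 0, 1, MPI.INT, 0);  // bcast(value, 1, MPI.INT, 0) in newer bindings

            System.out.println("rank " + rank + ": value = " + value[0]);
            MPI.Finalize();
        }
    }

Such a program would be compiled with "mpijavac" and started with "mpiexec ... java ...", the same way the programs below are run.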

On Dec 22, 2012, at 2:53 AM, Siegmar Gross wrote:

> Hi
>
>> Interesting. My best guess is that the OMPI libraries aren't being
>> found, though I'm a little surprised because the error message
>> indicates an inability to malloc - but it's possible the message
>> isn't accurate.
>>
>> One thing stands out - I see you compiled your program with "javac".
>> I suspect that is the source of the trouble - you really need to use
>> the Java wrapper compiler "mpijavac" so all the libs get absorbed
>> and/or linked correctly.
>
> No, I only compiled the first two programs (which don't use any MPI
> methods) with javac. The MPI program "InitFinalizeMain.java" was
> compiled with "mpijavac" (I use a script file and a GNUmakefile).
>
> linpc1 java 102 make_classfiles
> ...
> =========== linpc1 ===========
> Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> Warning: No xauth data; using fake authentication data for X11 forwarding.
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles InitFinalizeMain.java
> ...
>
>
> The other programs also work if I compile them with "mpijavac":
>
> linpc1 prog 107 mpijavac MemAllocMain.java
> linpc1 prog 108 mpiexec java -cp `pwd` MemAllocMain
> Type something ("quit" terminates program): dgdas
> Received input: dgdas
> Converted to upper case: DGDAS
> Type something ("quit" terminates program): quit
> Received input: quit
> Converted to upper case: QUIT
> linpc1 prog 109
>
>
> My environment should be valid as well. LD_LIBRARY_PATH lists the
> directories for the 32-bit libraries first, followed by the directories
> for the 64-bit libraries. I have split the long values of the PATH
> variables across lines so that they are easier to read.
>
> linpc1 java 111 mpiexec java EnvironVarMain
>
> Operating system:       Linux
> Processor architecture: x86_64
>
> CLASSPATH:
> /usr/local/junit4.10:
> /usr/local/junit4.10/junit-4.10.jar:
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dcore.jar:
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dutils.jar:
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/vecmath.jar:
> /usr/local/javacc-5.0/javacc.jar:
> .:
> /home/fd1026/Linux/x86_64/mpi_classfiles
>
> LD_LIBRARY_PATH:
> /usr/lib:
> ...
> /usr/lib64:
> /usr/local/jdk1.7.0_07-64/jre/lib/amd64:
> /usr/local/gcc-4.7.1/lib64:
> /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1:
> /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1:
> /usr/local/lib64:
> /usr/local/ssl/lib64:
> /usr/lib64:
> /usr/X11R6/lib64:
> /usr/local/openmpi-1.9_64_cc/lib64:
> /home/fd1026/Linux/x86_64/lib64
> linpc1 java 112
>
> Can I provide any other information to help solve this problem?
>
>
> Kind regards
>
> Siegmar
>
>
>> On Dec 21, 2012, at 9:46 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>>
>>> Hi
>>>
>>>> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
>>>> appears to be in the Java side of things. For whatever reason, your
>>>> Java VM is refusing to allow a malloc to succeed. I suspect it has
>>>> something to do with its setup, but I'm not enough of a Java person
>>>> to point you to the problem.
>>>>
>>>> Is it possible that the program was compiled against a different
>>>> (perhaps incompatible) version of Java?
>>>
>>> No, I don't think so. A small Java program without MPI methods works.
>>>
>>> linpc1 bin 122 which mpicc
>>> /usr/local/openmpi-1.9_64_cc/bin/mpicc
>>> linpc1 bin 123 pwd
>>> /usr/local/openmpi-1.9_64_cc/bin
>>> linpc1 bin 124 grep jdk *
>>> mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
>>> mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
>>> linpc1 bin 125 which java
>>> /usr/local/jdk1.7.0_07-64/bin/java
>>> linpc1 bin 126
>>>
>>>
>>> linpc1 prog 110 javac MiniProgMain.java
>>> linpc1 prog 111 java MiniProgMain
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> linpc1 prog 112 mpiexec java MiniProgMain
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> linpc1 prog 113 mpiexec -np 2 java MiniProgMain
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
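
(The MiniProgMain source itself is not included in the thread; assuming it is just a plain loop, a sketch that produces the output above would be:

    public class MiniProgMain {
        public static void main(String[] args) {
            // Plain Java, no MPI calls: every process that mpiexec starts
            // simply prints its own five messages.
            for (int i = 0; i < 5; i++) {
                System.out.println("Message " + i);
            }
        }
    }

With "mpiexec -np 2" each of the two processes prints its five lines, which is why the block of messages appears twice above.)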
>>>
>>>
>>> A small program which allocates a buffer for a new string:
>>> ...
>>> stringBUFLEN = new String (string.substring (0, len));
>>> ...
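
(Only the excerpt above is quoted; a self-contained sketch of such a read/allocate/uppercase loop, written here under the assumption that it roughly matches MemAllocMain, would be:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class MemAllocSketch {           // hypothetical name, not the original source
        static final int BUFLEN = 255;      // assumed buffer limit

        public static void main(String[] args) throws Exception {
            BufferedReader in =
                new BufferedReader(new InputStreamReader(System.in));
            String string;
            do {
                System.out.print("Type something (\"quit\" terminates program): ");
                string = in.readLine();
                if (string == null) {
                    break;                              // end of input
                }
                int len = Math.min(string.length(), BUFLEN);
                // allocate a new buffer for the (possibly truncated) input
                string = new String(string.substring(0, len));
                System.out.println("Received input: " + string);
                System.out.println("Converted to upper case: " + string.toUpperCase());
            } while (!string.equals("quit"));
        }
    }

The point of the test is only that plain Java allocation works fine under mpiexec, as the runs below show.)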
>>>
>>> linpc1 prog 115 javac MemAllocMain.java
>>> linpc1 prog 116 java MemAllocMain
>>> Type something ("quit" terminates program): ffghhfhh
>>> Received input: ffghhfhh
>>> Converted to upper case: FFGHHFHH
>>> Type something ("quit" terminates program): quit
>>> Received input: quit
>>> Converted to upper case: QUIT
>>>
>>> linpc1 prog 117 mpiexec java MemAllocMain
>>> Type something ("quit" terminates program): fbhshnhjs
>>> Received input: fbhshnhjs
>>> Converted to upper case: FBHSHNHJS
>>> Type something ("quit" terminates program): quit
>>> Received input: quit
>>> Converted to upper case: QUIT
>>> linpc1 prog 118
>>>
>>> I'm not sure if this is of any help, but the problem starts as soon as
>>> MPI methods are involved. The following program calls just the Init()
>>> and Finalize() methods.
>>>
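
(The attached InitFinalizeMain.java is not reproduced in the archive; a minimal sketch of an Init/Finalize-only program, with the usual caveat that exact method names differ between versions of the Open MPI Java bindings, would be roughly the following. The class name is made up.)

    import mpi.MPI;

    public class InitFinalizeSketch {
        public static void main(String[] args) throws Exception {
            MPI.Init(args);    // the opal_init / mca_base_open failure below happens during this call
            MPI.Finalize();
        }
    }
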
>>> tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> mca_base_open failed
>>> --> Returned value -2 instead of OPAL_SUCCESS
>>> ...
>>>
>>>
>>> Hopefully somebody has an idea of what is going wrong on my Linux
>>> system. Thank you very much in advance for any help.
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>>
>>>> Just shooting in the dark here - I suspect you'll have to ask someone
>>>> more knowledgeable on JVMs.
>>>>
>>>>
>>>> On Dec 21, 2012, at 7:32 AM, Siegmar Gross
>>> <Siegmar.Gross_at_[hidden]> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>>> I can't speak to the other issues, but for these - it looks like
>>>>>> something isn't right in the system. Could be an incompatibility
>>>>>> with Suse 12.1.
>>>>>>
>>>>>> What the errors are saying is that malloc is failing when used at
>>>>>> a very early stage in starting the process. Can you run even a
>>>>>> C-based MPI "hello" program?
>>>>>
>>>>> Yes. I have implemented more or less the same program in C and Java.
>>>>>
>>>>> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
>>>>> Process 0 of 2 running on linpc0
>>>>> Process 1 of 2 running on linpc1
>>>>>
>>>>> Now 1 slave tasks are sending greetings.
>>>>>
>>>>> Greetings from task 1:
>>>>> message type: 3
>>>>> msg length: 132 characters
>>>>> message:
>>>>> hostname: linpc1
>>>>> operating system: Linux
>>>>> release: 3.1.10-1.16-desktop
>>>>> processor: x86_64
>>>>>
>>>>>
>>>>> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
>>>>> --------------------------------------------------------------------------
>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during opal_init; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>> mca_base_open failed
>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>> ...
>>>>>
>>>>>
>>>>> Thank you very much in advance for any help.
>>>>>
>>>>> Kind regards
>>>>>
>>>>> Siegmar
>>>>>
>>>>>
>>>>>
>>>>>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross
>>>>> <Siegmar.Gross_at_[hidden]> wrote:
>>>>>>
>>>>>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
>>>>>>>
>>>>>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during opal_init; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>>
>>>>>>> mca_base_open failed
>>>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>>>> ...
>>>>>>> ompi_mpi_init: orte_init failed
>>>>>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
>>>>>>> --------------------------------------------------------------------------
>>>>>>> *** An error occurred in MPI_Init
>>>>>>> *** on a NULL communicator
>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>>>> *** and potentially your MPI job)
>>>>>>> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
>>>>>>> aggregate error messages, and not able to guarantee that all other processes
>>>>>>> were killed!
>>>>>>> -------------------------------------------------------
>>>>>>> Primary job terminated normally, but 1 process returned
>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>>>> -------------------------------------------------------
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpiexec detected that one or more processes exited with non-zero status, thus causing
>>>>>>> the job to be terminated. The first process to do so was:
>>>>>>>
>>>>>>> Process name: [[16706,1],1]
>>>>>>> Exit code: 1
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I use a valid environment on all machines. The problem also occurs
>>>>>>> when I compile and run the program directly on the Linux system.
>>>>>>>
>>>>>>> linpc1 java 101 mpijavac BcastIntMain.java
>>>>>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during opal_init; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>>
>>>>>>> mca_base_open failed
>>>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>>>
>>>>>
>>>>
>>>>
>>> <InitFinalizeMain.java>
>>
>>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/