
Subject: Re: [OMPI users] broadcasting basic data items in Java
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-12-21 16:12:55


Interesting. My best guess is that the OMPI libraries aren't being found, though I'm a little surprised because the error message indicates an inability to malloc - but it's possible the message isn't accurate.

One thing stands out - I see you compiled your program with "javac". I suspect that is the source of the trouble - you really need to use the Java wrapper compiler "mpijavac" so that all of the required Open MPI libraries are picked up and linked correctly.
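
Something along these lines (a rough sketch - adjust -np, hosts, and classpath
to your setup) should do it:

  mpijavac InitFinalizeMain.java
  mpiexec -np 2 java -cp `pwd` InitFinalizeMain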

On Dec 21, 2012, at 9:46 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:

> Hi
>
>> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
>> appears to be in the Java side of things. For whatever reason, your
>> Java VM is refusing to allow a malloc to succeed. I suspect it has
>> something to do with its setup, but I'm not enough of a Java person
>> to point you to the problem.
>>
>> Is it possible that the program was compiled against a different
>> (perhaps incompatible) version of Java?
>
> No, I don't think so. A small Java program without MPI methods works.
>
> linpc1 bin 122 which mpicc
> /usr/local/openmpi-1.9_64_cc/bin/mpicc
> linpc1 bin 123 pwd
> /usr/local/openmpi-1.9_64_cc/bin
> linpc1 bin 124 grep jdk *
> mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> linpc1 bin 125 which java
> /usr/local/jdk1.7.0_07-64/bin/java
> linpc1 bin 126
>
>
> linpc1 prog 110 javac MiniProgMain.java
> linpc1 prog 111 java MiniProgMain
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> linpc1 prog 112 mpiexec java MiniProgMain
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> linpc1 prog 113 mpiexec -np 2 java MiniProgMain
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
>
>
> A small program which allocates a buffer for a new string also works.
> ...
> stringBUFLEN = new String (string.substring (0, len));
> ...
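>
> In outline it looks roughly like this (a minimal sketch - names other
> than the ones shown above are only illustrative):
>
> import java.io.BufferedReader;
> import java.io.InputStreamReader;
>
> public class MemAllocMain
> {
>   public static void main (String args[]) throws Exception
>   {
>     BufferedReader stdin =
>       new BufferedReader (new InputStreamReader (System.in));
>     String string;
>     do
>     {
>       System.out.print ("Type something (\"quit\" terminates program): ");
>       string = stdin.readLine ();
>       int len = string.length ();
>       /* allocate a new buffer for the (sub)string */
>       String buffer = new String (string.substring (0, len));
>       System.out.println ("Received input:          " + buffer);
>       System.out.println ("Converted to upper case: " + buffer.toUpperCase ());
>     } while (!string.equals ("quit"));
>   }
> }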
>
> linpc1 prog 115 javac MemAllocMain.java
> linpc1 prog 116 java MemAllocMain
> Type something ("quit" terminates program): ffghhfhh
> Received input: ffghhfhh
> Converted to upper case: FFGHHFHH
> Type something ("quit" terminates program): quit
> Received input: quit
> Converted to upper case: QUIT
>
> linpc1 prog 117 mpiexec java MemAllocMain
> Type something ("quit" terminates program): fbhshnhjs
> Received input: fbhshnhjs
> Converted to upper case: FBHSHNHJS
> Type something ("quit" terminates program): quit
> Received input: quit
> Converted to upper case: QUIT
> linpc1 prog 118
>
> I'm not sure if this is of any help, but the problem starts with
> MPI methods. The following program calls just the Init() and
> Finalize() method.
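>
> In essence the program is just the following (a minimal sketch - the
> complete file is attached to this mail):
>
> import mpi.*;
>
> public class InitFinalizeMain
> {
>   public static void main (String args[]) throws Exception
>   {
>     MPI.Init (args);            /* set up the MPI environment     */
>     MPI.Finalize ();            /* and shut it down again at once */
>   }
> }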
>
> tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> mca_base_open failed
> --> Returned value -2 instead of OPAL_SUCCESS
> ...
>
>
> Hopefully somebody will have an idea what goes wrong on my Linux
> system. Thank you very much for any help in advance.
>
> Kind regards
>
> Siegmar
>
>
>> Just shooting in the dark here - I suspect you'll have to ask someone
>> more knowledgeable on JVMs.
>>
>>
>> On Dec 21, 2012, at 7:32 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>>
>>> Hi
>>>
>>>> I can't speak to the other issues, but for these - it looks like
>>>> something isn't right in the system. Could be an incompatibility
>>>> with Suse 12.1.
>>>>
>>>> What the errors are saying is that malloc is failing when used at
>>>> a very early stage in starting the process. Can you run even a
>>>> C-based MPI "hello" program?
>>>
>>> Yes. I have implemented more or less the same program in C and Java.
>>>
>>> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
>>> Process 0 of 2 running on linpc0
>>> Process 1 of 2 running on linpc1
>>>
>>> Now 1 slave tasks are sending greetings.
>>>
>>> Greetings from task 1:
>>> message type: 3
>>> msg length: 132 characters
>>> message:
>>> hostname: linpc1
>>> operating system: Linux
>>> release: 3.1.10-1.16-desktop
>>> processor: x86_64
>>>
>>>
>>> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> mca_base_open failed
>>> --> Returned value -2 instead of OPAL_SUCCESS
>>> ...
>>>
>>>
>>> Thank you very much for any help in advance.
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>>
>>>
>>>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>>>>
>>>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
>>>>>
>>>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
>>>>> --------------------------------------------------------------------------
>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during opal_init; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>> mca_base_open failed
>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>> ...
>>>>> ompi_mpi_init: orte_init failed
>>>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
>>>>> --------------------------------------------------------------------------
>>>>> *** An error occurred in MPI_Init
>>>>> *** on a NULL communicator
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> *** and potentially your MPI job)
>>>>> [(null):10586] Local abort before MPI_INIT completed successfully; not able
>>>>> to aggregate error messages, and not able to guarantee that all other
>>>>> processes were killed!
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process returned
>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpiexec detected that one or more processes exited with non-zero status,
>>>>> thus causing the job to be terminated. The first process to do so was:
>>>>>
>>>>> Process name: [[16706,1],1]
>>>>> Exit code: 1
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>> I use a valid environment on all machines. The problem occurs as well
>>>>> when I compile and run the program directly on the Linux system.
>>>>>
>>>>> linpc1 java 101 mpijavac BcastIntMain.java
>>>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
>>>>> --------------------------------------------------------------------------
>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during opal_init; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>> mca_base_open failed
>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>
>>>
>>
>>
> <InitFinalizeMain.java>