Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] broadcasting basic data items in Java
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-12-21 10:54:23


Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue appears to be in the Java side of things. For whatever reason, your Java VM is refusing to allow a malloc to succeed. I suspect it has something to do with its setup, but I'm not enough of a Java person to point you to the problem.

Is it possible that the program was compiled against a different (perhaps incompatible) version of Java?

Just shooting in the dark here - I suspect you'll have to ask someone more knowledgeable on JVMs.
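
One thing that might help narrow it down: a stripped-down Java program that does nothing but initialize and finalize MPI would show whether MPI.Init itself trips the malloc failure, independent of any broadcast code. This is only a sketch against the mpiJava-style API the trunk Java bindings use; the class name InitOnly is just illustrative:

    import mpi.*;

    // Minimal check: does MPI.Init (and the opal_init underneath it)
    // already fail, before any communication is attempted?
    public class InitOnly {
        public static void main(String[] args) throws Exception {
            args = MPI.Init(args);   // opal_init/orte_init run inside this call
            int rank = MPI.COMM_WORLD.Rank();
            int size = MPI.COMM_WORLD.Size();
            System.out.println("Process " + rank + " of " + size);
            MPI.Finalize();
        }
    }

Compiled with mpijavac InitOnly.java and run the same way as your other tests (mpiexec -np 2 -host linpc0,linpc1 java InitOnly), it should fail identically if the problem really is in the JVM/OPAL startup rather than in the broadcast code.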

On Dec 21, 2012, at 7:32 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:

> Hi
>
>> I can't speak to the other issues, but for these - it looks like
>> something isn't right in the system. Could be an incompatibility
>> with Suse 12.1.
>>
>> What the errors are saying is that malloc is failing when used at
>> a very early stage in starting the process. Can you run even a
>> C-based MPI "hello" program?
>
> Yes. I have implemented more or less the same program in C and Java.
>
> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
> Process 0 of 2 running on linpc0
> Process 1 of 2 running on linpc1
>
> Now 1 slave tasks are sending greetings.
>
> Greetings from task 1:
> message type: 3
> msg length: 132 characters
> message:
> hostname: linpc1
> operating system: Linux
> release: 3.1.10-1.16-desktop
> processor: x86_64
>
>
> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> mca_base_open failed
> --> Returned value -2 instead of OPAL_SUCCESS
> ...
>
>
> Thank you very much for any help in advance.
>
> Kind regards
>
> Siegmar
>
>
>
>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>>
>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
>>>
>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> mca_base_open failed
>>> --> Returned value -2 instead of OPAL_SUCCESS
>>> ...
>>> ompi_mpi_init: orte_init failed
>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> *** and potentially your MPI job)
>>> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
>>> aggregate error messages, and not able to guarantee that all other processes
>>> were killed!
>>> -------------------------------------------------------
>>> Primary job terminated normally, but 1 process returned
>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpiexec detected that one or more processes exited with non-zero status, thus causing
>>> the job to be terminated. The first process to do so was:
>>>
>>> Process name: [[16706,1],1]
>>> Exit code: 1
>>> --------------------------------------------------------------------------
>>>
>>>
>>>
>>> I use a valid environment on all machines. The problem occurs as well
>>> when I compile and run the program directly on the Linux system.
>>>
>>> linpc1 java 101 mpijavac BcastIntMain.java
>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> mca_base_open failed
>>> --> Returned value -2 instead of OPAL_SUCCESS
>>
>