Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] broadcasting basic data items in Java
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-12-21 10:09:04


I can't speak to the other issues, but for these it looks like something isn't right in the system. It could be an incompatibility with openSUSE 12.1.

What the errors are saying is that malloc is failing at a very early stage of process startup. Can you run even a C-based MPI "hello" program?
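
Something along these lines would do as a minimal test (the file name hello_c.c and the exact commands are only examples):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      int rank, size;

      /* If malloc is failing during opal_init, this call should abort
         just like your Java runs do. */
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("Hello from rank %d of %d\n", rank, size);
      MPI_Finalize();
      return 0;
  }

Compile and launch it the same way you launched the Java program, for example:

  mpicc hello_c.c -o hello_c
  mpiexec -np 2 -host linpc0,linpc1 ./hello_c

If that also fails in MPI_Init, the problem is likely in the base Open MPI installation on those machines rather than in the Java bindings.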

On Dec 21, 2012, at 1:41 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:

> The program breaks if I use two Linux.x86_64 machines (openSUSE 12.1).
>
> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> mca_base_open failed
> --> Returned value -2 instead of OPAL_SUCCESS
> ...
> ompi_mpi_init: orte_init failed
> --> Returned "Out of resource" (-2) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
> aggregate error messages, and not able to guarantee that all other processes
> were killed!
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status, thus
> causing the job to be terminated. The first process to do so was:
>
> Process name: [[16706,1],1]
> Exit code: 1
> --------------------------------------------------------------------------
>
>
>
> I use a valid environment on all machines. The problem also occurs
> when I compile and run the program directly on the Linux system.
>
> linpc1 java 101 mpijavac BcastIntMain.java
> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> mca_base_open failed
> --> Returned value -2 instead of OPAL_SUCCESS