Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] broadcasting basic data items in Java
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-12-21 10:32:24


Hi

> I can't speak to the other issues, but for these - it looks like
> something isn't right in the system. Could be an incompatibility
> with Suse 12.1.
>
> What the errors are saying is that malloc is failing when used at
> a very early stage in starting the process. Can you run even a
> C-based MPI "hello" program?

Yes. I have implemented more or less the same program in C and Java.

tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
Process 0 of 2 running on linpc0
Process 1 of 2 running on linpc1

Now 1 slave tasks are sending greetings.

Greetings from task 1:
  message type: 3
  msg length: 132 characters
  message:
    hostname: linpc1
    operating system: Linux
    release: 3.1.10-1.16-desktop
    processor: x86_64
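
For reference, the Java test is essentially the counterpart of the C program
above. A minimal sketch of that kind of program, written against the
mpiJava-style API of the current Java bindings (this is not the exact source of
HelloMainWithBarrier, so treat the names and details as an approximation):

import mpi.*;

public class HelloMainWithBarrier {
  public static void main(String[] args) throws MPIException {
    MPI.Init(args);                        // initialize the MPI environment
    int rank = MPI.COMM_WORLD.Rank();      // rank of this process
    int size = MPI.COMM_WORLD.Size();      // total number of processes
    MPI.COMM_WORLD.Barrier();              // synchronize all processes
    System.out.println("Process " + rank + " of " + size);
    MPI.Finalize();
  }
}

Launching it the same way as the C program fails: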

tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
...

Thank you very much for any help in advance.

Kind regards

Siegmar

> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>
> > The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
> >
> > linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > mca_base_open failed
> > --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> > ompi_mpi_init: orte_init failed
> > --> Returned "Out of resource" (-2) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > *** and potentially your MPI job)
> > [(null):10586] Local abort before MPI_INIT completed successfully; not able to
> > aggregate error messages, and not able to guarantee that all other processes
> > were killed!
> > -------------------------------------------------------
> > Primary job terminated normally, but 1 process returned
> > a non-zero exit code.. Per user-direction, the job has been aborted.
> > -------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status,
> > thus causing the job to be terminated. The first process to do so was:
> >
> > Process name: [[16706,1],1]
> > Exit code: 1
> > --------------------------------------------------------------------------
> >
> >
> >
> > I use a valid environment on all machines. The problem occurs as well
> > when I compile and run the program directly on the Linux system.
> >
> > linpc1 java 101 mpijavac BcastIntMain.java
> > linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > mca_base_open failed
> > --> Returned value -2 instead of OPAL_SUCCESS
>