Subject: Re: [OMPI users] broadcasting basic data items in Java
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-12-21 10:32:24


Hi

> I can't speak to the other issues, but for these - it looks like
> something isn't right in the system. Could be an incompatibility
> with Suse 12.1.
>
> What the errors are saying is that malloc is failing when used at
> a very early stage in starting the process. Can you run even a
> C-based MPI "hello" program?

Yes. I have implemented more or less the same program in C and Java.

tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
Process 0 of 2 running on linpc0
Process 1 of 2 running on linpc1

Now 1 slave tasks are sending greetings.

Greetings from task 1:
  message type: 3
  msg length: 132 characters
  message:
    hostname: linpc1
    operating system: Linux
    release: 3.1.10-1.16-desktop
    processor: x86_64

tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
...
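
For completeness: the Java program is just a hello with a barrier. Its exact
source is not included here, but a minimal sketch of the same shape, assuming
mpiJava-style method names (MPI.Init, MPI.COMM_WORLD.Rank()/Size()/Barrier()),
looks like this:

import mpi.*;

public class HelloMainWithBarrier {
    // Minimal hello with a barrier; method names assume the mpiJava-style API.
    public static void main(String[] args) throws Exception {
        MPI.Init(args);                   // startup is where opal_init fails above
        int rank = MPI.COMM_WORLD.Rank();
        int size = MPI.COMM_WORLD.Size();

        MPI.COMM_WORLD.Barrier();         // synchronize before printing
        System.out.println("Process " + rank + " of " + size + " running on "
            + java.net.InetAddress.getLocalHost().getHostName());

        MPI.Finalize();
    }
}

The C version and a Java version of this shape do the same thing; only the
Java run fails, and it fails before MPI.Init returns.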

Thank you very much in advance for any help.

Kind regards

Siegmar

> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>
> > The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
> >
> > linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > mca_base_open failed
> > --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> > ompi_mpi_init: orte_init failed
> > --> Returned "Out of resource" (-2) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > *** and potentially your MPI job)
> > [(null):10586] Local abort before MPI_INIT completed successfully; not able to
> > aggregate error messages, and not able to guarantee that all other processes
> > were killed!
> > -------------------------------------------------------
> > Primary job terminated normally, but 1 process returned
> > a non-zero exit code.. Per user-direction, the job has been aborted.
> > -------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status, thus
> > causing the job to be terminated. The first process to do so was:
> >
> > Process name: [[16706,1],1]
> > Exit code: 1
> > --------------------------------------------------------------------------
> >
> >
> >
> > I use a valid environment on all machines. The problem also occurs
> > when I compile and run the program directly on the Linux system.
> >
> > linpc1 java 101 mpijavac BcastIntMain.java
> > linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > mca_base_open failed
> > --> Returned value -2 instead of OPAL_SUCCESS
>
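
The broadcast test is equally small. A minimal sketch of a BcastIntMain-style
program (not the exact source, which is not part of this thread), again
assuming the mpiJava-style Bcast(buffer, offset, count, datatype, root)
signature:

import mpi.*;

public class BcastIntMain {
    // Broadcast a single int from rank 0 to all ranks.
    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.Rank();

        int[] value = new int[1];
        if (rank == 0) {
            value[0] = 1234567;       // arbitrary value filled in by the root
        }
        MPI.COMM_WORLD.Bcast(value, 0, 1, MPI.INT, 0);

        System.out.println("Process " + rank + " received " + value[0]);
        MPI.Finalize();
    }
}

Compiled with mpijavac and started with mpiexec exactly as in the quoted
commands; on the two openSUSE 12.1 machines the job aborts in opal_init, so
the broadcast itself is never reached.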