Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] broadcasting basic data items in Java
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-12-21 12:46:37


Hi

> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
> appears to be in the Java side of things. For whatever reason, your
> Java VM is refusing to allow a malloc to succeed. I suspect it has
> something to do with its setup, but I'm not enough of a Java person
> to point you to the problem.
>
> Is it possible that the program was compiled against a different
> (perhaps incompatible) version of Java?

No, I don't think so. A small Java program without MPI methods works.

linpc1 bin 122 which mpicc
/usr/local/openmpi-1.9_64_cc/bin/mpicc
linpc1 bin 123 pwd
/usr/local/openmpi-1.9_64_cc/bin
linpc1 bin 124 grep jdk *
mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
linpc1 bin 125 which java
/usr/local/jdk1.7.0_07-64/bin/java
linpc1 bin 126
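Ralph's question about mixed Java versions can also be checked on the compiled classes themselves. A .class file starts with the magic number 0xCAFEBABE, followed by a minor and a major format version; javac from JDK 1.7 writes major version 51 by default. The following is a minimal sketch that prints these numbers for the files named on the command line. The class name ClassVersionMain is hypothetical and not part of Open MPI.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Prints the class-file format version of compiled .class files.
 * javac from JDK 1.7 emits major version 51 unless told otherwise.
 */
public class ClassVersionMain {

    /** Reads magic number, minor and major version; returns {major, minor}. */
    static int[] readVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file (bad magic number)");
        }
        int minor = data.readUnsignedShort();  // u2 minor_version
        int major = data.readUnsignedShort();  // u2 major_version
        return new int[] {major, minor};
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ClassVersionMain File.class ...");
            return;
        }
        for (String path : args) {
            // try-with-resources closes the file even if parsing fails
            try (InputStream in = new FileInputStream(path)) {
                int[] v = readVersion(in);
                System.out.println(path + ": major " + v[0] + ", minor " + v[1]);
            }
        }
    }
}
```

For example, `java ClassVersionMain BcastIntArrayMain.class` would be expected to report major version 51 if the class was built by the jdk1.7.0_07 javac without a -target option.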

linpc1 prog 110 javac MiniProgMain.java
linpc1 prog 111 java MiniProgMain
Message 0
Message 1
Message 2
Message 3
Message 4
linpc1 prog 112 mpiexec java MiniProgMain
Message 0
Message 1
Message 2
Message 3
Message 4
linpc1 prog 113 mpiexec -np 2 java MiniProgMain
Message 0
Message 1
Message 2
Message 3
Message 4
Message 0
Message 1
Message 2
Message 3
Message 4
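Because mpiexec may start a different JVM than the interactive shell, especially on a remote host such as linpc0, it can help to print which JVM actually runs each process. The following stdlib-only sketch reports the host name together with the java.version, java.home and os.arch system properties of the running JVM. The class name JvmInfoMain is hypothetical. With JDK 7, java.home usually points to the jre subdirectory of the JDK rather than to the JDK directory itself.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Prints which JVM executes this class, e.g. when launched by mpiexec on several hosts. */
public class JvmInfoMain {

    /** Builds a one-line report of the host name and the running JVM's version and location. */
    static String describe() {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            host = "unknown-host";  // name resolution failed; still report the JVM details
        }
        return host
            + ": java.version=" + System.getProperty("java.version")
            + ", java.home=" + System.getProperty("java.home")
            + ", os.arch=" + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Launched as `mpiexec -np 2 -host linpc0,linpc1 java JvmInfoMain` (with the class reachable on both hosts), it prints one line per process, so a JVM mismatch between the two machines would be easy to spot.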

A small program that allocates a buffer for a new string:
...
stringBUFLEN = new String (string.substring (0, len));
...
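Only the allocation line of that program is shown. A hypothetical stand-alone version that reproduces the interaction in the transcript below, without any MPI calls, might look like the following sketch. The class name MemAllocSketch and the limit BUFLEN = 80 are assumptions, not the author's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;

/**
 * Pure-Java allocation test: each input line is copied into a freshly
 * allocated String (at most BUFLEN characters) and echoed in upper case.
 */
public class MemAllocSketch {
    static final int BUFLEN = 80;  // hypothetical maximum buffer length

    /** Copies at most BUFLEN characters of line into a new String object. */
    static String copyToBuffer(String line) {
        int len = Math.min(line.length(), BUFLEN);
        return new String(line.substring(0, len));  // always allocates a new object on the heap
    }

    /** Handles lines until "quit" (which is echoed too) or end of input; returns lines handled. */
    static int run(Reader in, PrintStream out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int count = 0;
        while (true) {
            out.print("Type something (\"quit\" terminates program): ");
            String line = reader.readLine();
            if (line == null) {
                break;  // end of input
            }
            String buffer = copyToBuffer(line);
            out.println("Received input: " + buffer);
            out.println("Converted to upper case: " + buffer.toUpperCase());
            count++;
            if (buffer.equals("quit")) {
                break;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        run(new InputStreamReader(System.in), System.out);
    }
}
```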

linpc1 prog 115 javac MemAllocMain.java
linpc1 prog 116 java MemAllocMain
Type something ("quit" terminates program): ffghhfhh
Received input: ffghhfhh
Converted to upper case: FFGHHFHH
Type something ("quit" terminates program): quit
Received input: quit
Converted to upper case: QUIT

linpc1 prog 117 mpiexec java MemAllocMain
Type something ("quit" terminates program): fbhshnhjs
Received input: fbhshnhjs
Converted to upper case: FBHSHNHJS
Type something ("quit" terminates program): quit
Received input: quit
Converted to upper case: QUIT
linpc1 prog 118

I'm not sure whether this is of any help, but the problem starts as soon
as MPI methods are called. The following program calls only the Init()
and Finalize() methods.

tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
...

Hopefully somebody has an idea what is going wrong on my Linux
system. Thank you very much in advance for any help.

Kind regards

Siegmar

 
> Just shooting in the dark here - I suspect you'll have to ask someone
> more knowledgeable on JVMs.
>
>
> On Dec 21, 2012, at 7:32 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>
> > Hi
> >
> >> I can't speak to the other issues, but for these - it looks like
> >> something isn't right in the system. Could be an incompatibility
> >> with Suse 12.1.
> >>
> >> What the errors are saying is that malloc is failing when used at
> >> a very early stage in starting the process. Can you run even a
> >> C-based MPI "hello" program?
> >
> > Yes. I have implemented more or less the same program in C and Java.
> >
> > tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
> > Process 0 of 2 running on linpc0
> > Process 1 of 2 running on linpc1
> >
> > Now 1 slave tasks are sending greetings.
> >
> > Greetings from task 1:
> > message type: 3
> > msg length: 132 characters
> > message:
> > hostname: linpc1
> > operating system: Linux
> > release: 3.1.10-1.16-desktop
> > processor: x86_64
> >
> >
> > tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > mca_base_open failed
> > --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> >
> >
> > Thank you very much for any help in advance.
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >
> >> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
> >>
> >>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
> >>>
> >>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> >>> --------------------------------------------------------------------------
> >>> It looks like opal_init failed for some reason; your parallel process is
> >>> likely to abort. There are many reasons that a parallel process can
> >>> fail during opal_init; some of which are due to configuration or
> >>> environment problems. This failure appears to be an internal failure;
> >>> here's some additional information (which may only be relevant to an
> >>> Open MPI developer):
> >>>
> >>> mca_base_open failed
> >>> --> Returned value -2 instead of OPAL_SUCCESS
> >>> ...
> >>> ompi_mpi_init: orte_init failed
> >>> --> Returned "Out of resource" (-2) instead of "Success" (0)
> >>> --------------------------------------------------------------------------
> >>> *** An error occurred in MPI_Init
> >>> *** on a NULL communicator
> >>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> >>> *** and potentially your MPI job)
> >>> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
> >>> aggregate error messages, and not able to guarantee that all other processes
> >>> were killed!
> >>> -------------------------------------------------------
> >>> Primary job terminated normally, but 1 process returned
> >>> a non-zero exit code.. Per user-direction, the job has been aborted.
> >>> -------------------------------------------------------
> >>> --------------------------------------------------------------------------
> >>> mpiexec detected that one or more processes exited with non-zero status, thus causing
> >>> the job to be terminated. The first process to do so was:
> >>>
> >>> Process name: [[16706,1],1]
> >>> Exit code: 1
> >>> --------------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>> I use a valid environment on all machines. The problem occurs as well
> >>> when I compile and run the program directly on the Linux system.
> >>>
> >>> linpc1 java 101 mpijavac BcastIntMain.java
> >>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> >>> --------------------------------------------------------------------------
> >>> It looks like opal_init failed for some reason; your parallel process is
> >>> likely to abort. There are many reasons that a parallel process can
> >>> fail during opal_init; some of which are due to configuration or
> >>> environment problems. This failure appears to be an internal failure;
> >>> here's some additional information (which may only be relevant to an
> >>> Open MPI developer):
> >>>
> >>> mca_base_open failed
> >>> --> Returned value -2 instead of OPAL_SUCCESS
> >>
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>