I can't speak to the other issues, but for these it looks like something isn't right on the system. It could be an incompatibility with openSUSE 12.1.

What the errors are saying is that malloc is failing at a very early stage of process startup. Can you run even a C-based MPI "hello" program?
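For reference, a minimal C MPI "hello" test would look something like this (the hostnames are taken from your mpiexec command; the file name hello_c.c is just an example):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      int rank, size;

      /* MPI_Init is where the opal_init failure in your output occurred */
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("Hello from rank %d of %d\n", rank, size);
      MPI_Finalize();
      return 0;
  }

Compile and launch it the same way you launched the Java program:

  mpicc hello_c.c -o hello_c
  mpiexec -np 2 -host linpc0,linpc1 ./hello_c

If that also fails in opal_init, the problem is in the base installation or environment rather than in the Java bindings.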


On Dec 21, 2012, at 1:41 AM, Siegmar Gross <Siegmar.Gross@informatik.hs-fulda.de> wrote:

The program fails if I use two Linux.x86_64 machines (openSUSE 12.1).

linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 mca_base_open failed
 --> Returned value -2 instead of OPAL_SUCCESS
...
 ompi_mpi_init: orte_init failed
 --> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[(null):10586] Local abort before MPI_INIT completed successfully; not able to 
aggregate error messages, and not able to guarantee that all other processes 
were killed!
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

 Process name: [[16706,1],1]
 Exit code:    1
--------------------------------------------------------------------------



I use a valid environment on all machines. The problem also occurs when I compile and run the program directly on the Linux system.

linpc1 java 101 mpijavac BcastIntMain.java 
linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 mca_base_open failed
 --> Returned value -2 instead of OPAL_SUCCESS