
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] internal error with mpiJava in openmpi-1.9a1r27380
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-10-10 10:37:18


I haven't tried heterogeneous apps with the Java code yet, so that could well
not work. At the least, I would expect you to need to compile your Java app
against the corresponding OMPI install on each architecture, and to ensure the
right one gets run on each node. Even though it's a Java app, the classes
need to get linked against the proper OMPI code for that node.

As for Linux-only operation: it works fine for me. Did you remember to (a)
build mpiexec on those Linux machines (as opposed to using the Solaris
version), and (b) recompile your app against that OMPI installation?
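One way to check both points, sketched here with the Linux host names from
the report below (paths and setup are assumptions about your environment):

```shell
# Confirm each Linux node resolves its own Open MPI build,
# not a Solaris install shared via NFS (adjust hosts to your setup).
for h in linpc0 linpc1; do
    ssh "$h" 'which mpiexec mpijavac; ompi_info --version'
done
```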

On Wed, Oct 10, 2012 at 5:42 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:

> Hi,
>
> I have built openmpi-1.9a1r27380 with Java support and implemented
> a small program that sends some kind of "hello" with Send/Recv.
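
For reference, a minimal sketch of what such a Send/Recv "hello" program
might look like with the mpiJava-style bindings of that era. The original
source is not included in the report, so the class layout and method usage
below are assumptions, not the actual MsgSendRecvMain:

```java
import mpi.*;
import java.net.InetAddress;

public class MsgSendRecvMain {
    static final int TAG = 3;  // matches the "message tag: 3" output below

    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.Rank();
        if (rank != 0) {
            // Non-root ranks send their hostname to rank 0.
            char[] host = InetAddress.getLocalHost()
                                     .getHostName().toCharArray();
            MPI.COMM_WORLD.Send(host, 0, host.length, MPI.CHAR, 0, TAG);
        } else {
            // Rank 0 receives one greeting from every other rank.
            char[] buf = new char[256];
            for (int i = 1; i < MPI.COMM_WORLD.Size(); i++) {
                Status st = MPI.COMM_WORLD.Recv(buf, 0, buf.length, MPI.CHAR,
                                                MPI.ANY_SOURCE, TAG);
                int len = st.Get_count(MPI.CHAR);
                System.out.println("Greetings from process " + st.source + ":");
                System.out.println("  message tag:    " + st.tag);
                System.out.println("  message length: " + len);
                System.out.println("  message:        "
                                   + new String(buf, 0, len));
            }
        }
        MPI.Finalize();
    }
}
```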
>
> tyr java 164 make
> mpijavac -d /home/fd1026/mpi_classfiles MsgSendRecvMain.java
> ...
>
> Everything works fine if I use Solaris 10 x86_64.
>
> tyr java 165 mpiexec -np 3 -host sunpc0,sunpc1 \
> java -cp $HOME/mpi_classfiles MsgSendRecvMain
>
> Now 2 processes are sending greetings.
>
> Greetings from process 2:
> message tag: 3
> message length: 6
> message: sunpc1
>
> Greetings from process 1:
> message tag: 3
> message length: 6
> message: sunpc0
>
>
> Everything works fine if I use Solaris 10 Sparc.
>
> tyr java 166 mpiexec -np 3 -host rs0,rs1 \
> java -cp $HOME/mpi_classfiles MsgSendRecvMain
>
> Now 2 processes are sending greetings.
>
> Greetings from process 2:
> message tag: 3
> message length: 26
> message: rs1.informatik.hs-fulda.de
>
> Greetings from process 1:
> message tag: 3
> message length: 26
> message: rs0.informatik.hs-fulda.de
>
>
> The program breaks if I use both systems.
>
> tyr java 167 mpiexec -np 3 -host rs0,sunpc0 \
> java -cp $HOME/mpi_classfiles MsgSendRecvMain
> [rs0.informatik.hs-fulda.de:9621] *** An error occurred in MPI_Comm_dup
> [rs0.informatik.hs-fulda.de:9621] *** reported by process [1976500225,0]
> [rs0.informatik.hs-fulda.de:9621] *** on communicator MPI_COMM_WORLD
> [rs0.informatik.hs-fulda.de:9621] *** MPI_ERR_INTERN: internal error
> [rs0.informatik.hs-fulda.de:9621] *** MPI_ERRORS_ARE_FATAL (processes
> in this communicator will now abort,
> [rs0.informatik.hs-fulda.de:9621] *** and potentially your MPI job)
> [tyr.informatik.hs-fulda.de:22491] 1 more process has sent help message
> help-mpi-errors.txt / mpi_errors_are_fatal
> [tyr.informatik.hs-fulda.de:22491] Set MCA parameter
> "orte_base_help_aggregate" to 0 to see all help / error messages
>
>
> The program breaks if I use Linux x86_64.
>
> tyr java 168 mpiexec -np 3 -host linpc0,linpc1 \
> java -cp $HOME/mpi_classfiles MsgSendRecvMain
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> mca_base_open failed
> --> Returned value -2 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
> ...
>
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [linpc0:20277] Local abort before MPI_INIT completed successfully;
> not able to aggregate error messages, and not able to guarantee that
> all other processes were killed!
> ...
>
>
> Please let me know if you need more information to track the problem.
> Thank you very much for any help in advance.
>
>
> Kind regards
>
> Siegmar
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>