Hi,
I have built openmpi-1.9a1r27380 with Java support and implemented
a small program that sends a "hello" message with Send/Recv.
tyr java 164 make
mpijavac -d /home/fd1026/mpi_classfiles MsgSendRecvMain.java
...
Everything works fine if I use Solaris 10 x86_64.
tyr java 165 mpiexec -np 3 -host sunpc0,sunpc1 \
java -cp $HOME/mpi_classfiles MsgSendRecvMain
Now 2 processes are sending greetings.
Greetings from process 2:
message tag: 3
message length: 6
message: sunpc1
Greetings from process 1:
message tag: 3
message length: 6
message: sunpc0
Everything also works fine if I use Solaris 10 Sparc.
tyr java 166 mpiexec -np 3 -host rs0,rs1 \
java -cp $HOME/mpi_classfiles MsgSendRecvMain
Now 2 processes are sending greetings.
Greetings from process 2:
message tag: 3
message length: 26
message: rs1.informatik.hs-fulda.de
Greetings from process 1:
message tag: 3
message length: 26
message: rs0.informatik.hs-fulda.de
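In both successful runs the reported message length appears to be the number of characters in the sending host's name (6 for sunpc1, 26 for rs1.informatik.hs-fulda.de). The following plain-Java sketch reproduces only that greeting format. It is hypothetical: the class GreetingFormat and the method formatGreeting are illustrative names, not part of MsgSendRecvMain or the Open MPI Java bindings, and the MPI Send/Recv calls are left out entirely.

```java
// Hypothetical sketch: the source of MsgSendRecvMain is not shown, so this
// only mirrors the greeting lines printed in the successful runs above.
// No MPI calls are made; in the real program the receiver would take the
// tag, length and text from the received message and its status.
final class GreetingFormat {

    // Assumed tag value, copied from the "message tag: 3" lines above.
    static final int GREETING_TAG = 3;

    // Builds the block printed by the receiving process for one greeting.
    // Here the length is the number of chars in the sender's host name;
    // the real program presumably reports the received element count.
    static String formatGreeting(int sourceRank, int tag, String hostName) {
        return "Greetings from process " + sourceRank + ":\n"
                + "message tag: " + tag + "\n"
                + "message length: " + hostName.length() + "\n"
                + "message: " + hostName;
    }

    public static void main(String[] args) {
        // Host names taken from the two successful runs above.
        System.out.println(formatGreeting(2, GREETING_TAG, "sunpc1"));
        System.out.println(formatGreeting(2, GREETING_TAG, "rs1.informatik.hs-fulda.de"));
    }
}
```

The short x86_64 host name and the fully qualified Sparc host name therefore produce different lengths, which matches the two transcripts.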
The program breaks if I use both systems together (rs0 and sunpc0).
tyr java 167 mpiexec -np 3 -host rs0,sunpc0 \
java -cp $HOME/mpi_classfiles MsgSendRecvMain
[rs0.informatik.hs-fulda.de:9621] *** An error occurred in MPI_Comm_dup
[rs0.informatik.hs-fulda.de:9621] *** reported by process [1976500225,0]
[rs0.informatik.hs-fulda.de:9621] *** on communicator MPI_COMM_WORLD
[rs0.informatik.hs-fulda.de:9621] *** MPI_ERR_INTERN: internal error
[rs0.informatik.hs-fulda.de:9621] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[rs0.informatik.hs-fulda.de:9621] *** and potentially your MPI job)
[tyr.informatik.hs-fulda.de:22491] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[tyr.informatik.hs-fulda.de:22491] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
The program also breaks if I use only Linux x86_64 machines.
tyr java 168 mpiexec -np 3 -host linpc0,linpc1 \
java -cp $HOME/mpi_classfiles MsgSendRecvMain
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
mca_base_open failed
--> Returned value -2 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
...
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[linpc0:20277] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
...
Please let me know if you need more information to track down the problem.
Thank you very much in advance for any help.
Kind regards
Siegmar