Yes. The executables start running and then fail with the error mentioned in my first message, i.e.:
./mpirun -hostfile machines executable
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 15617 on
node sibar.pch.univie.ac.at exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[2] Stack Traceback:
  [0] CmiAbort+0x25 [0x8366f3e]
  [1] namd [0x830d4cd]
  [2] CmiHandleMessage+0x22 [0x8367c20]
  [3] CsdScheduleForever+0x67 [0x8367dd2]
  [4] CsdScheduler+0x12 [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21 [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53 [0x80fa0f5]
  [7] main+0x2e [0x80f65b6]
  [8] __libc_start_main+0xd3 [0x31cde3]
  [9] __gxx_personality_v0+0x101 [0x80f3405]
[3] Stack Traceback:
  [0] CmiAbort+0x25 [0x8366f3e]
  [1] namd [0x830d4cd]
  [2] CmiHandleMessage+0x22 [0x8367c20]
  [3] CsdScheduleForever+0x67 [0x8367dd2]
  [4] CsdScheduler+0x12 [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21 [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53 [0x80fa0f5]
  [7] main+0x2e [0x80f65b6]
  [8] __libc_start_main+0xd3 [0x137de3]
  [9] __gxx_personality_v0+0x101 [0x80f3405]
Running on MPI version: 2.1 multi-thread support: MPI_THREAD_SINGLE (max supported: MPI_THREAD_SINGLE)
cpu topology info is being gathered.
2 unique compute nodes detected.
------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: Internal Error: Unknown-msg-type. Contact Developers.
------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: Internal Error: Unknown-msg-type. Contact Developers.
[studpc01.xxx.xxx.xx:15615] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[studpc01.xxx.xxx.xx:15615] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[studpc21.xx.xx.xx][[6986,1],0][btl_tcp_frag.c:124:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev failed: Connection reset by peer (104)
[studpc21.xx.xx.xx][[6986,1],0][btl_tcp_frag.c:124:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev failed: Connection reset by peer (104)
Yes, I put the 64-bit executable on one machine (studpc21) and the 32-bit executable on the other machine (studpc01), under the same name. But I don't know whether each machine is actually using its own executable. How can I check this?
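One way to check (a sketch, assuming Python 3 is installed on both machines; the script name elf_class.py is hypothetical) is to let each host report which file the executable name resolves to on its PATH and whether that file is a 32-bit or 64-bit ELF binary. The ELF header stores this in byte 4 (EI_CLASS: 1 = 32-bit, 2 = 64-bit), right after the 0x7f 'E' 'L' 'F' magic bytes.

```python
#!/usr/bin/env python3
"""Hypothetical helper (elf_class.py): report which binary a program name
resolves to on this host and whether it is a 32-bit or 64-bit ELF file."""
import shutil
import socket
import sys

# ELF identification: magic bytes 0x7f 'E' 'L' 'F', then EI_CLASS at offset 4.
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: 32, 2: 64}  # ELFCLASS32 = 1, ELFCLASS64 = 2


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if it is not a recognised ELF file."""
    with open(path, "rb") as f:
        ident = f.read(5)
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(ident[4])


def describe(name):
    """Resolve `name` on this host's PATH (falling back to the name itself) and describe it."""
    path = shutil.which(name) or name
    try:
        bits = elf_class(path)
    except OSError as exc:
        return f"{socket.gethostname()}: {name}: cannot read ({exc})"
    kind = f"{bits}-bit ELF" if bits else "not an ELF executable"
    return f"{socket.gethostname()}: {path}: {kind}"


if __name__ == "__main__":
    # One output line per program name given on the command line.
    for name in sys.argv[1:]:
        print(describe(name))
```

Started through the same hostfile, e.g. "./mpirun -hostfile machines -npernode 1 python3 elf_class.py executable" (assuming this Open MPI version supports -npernode; otherwise run the script on each machine by hand), it should print one line per host. If studpc01 does not report 32-bit and studpc21 does not report 64-bit, the wrong binary is being picked up. Note that the PATH seen by processes started by mpirun can differ from the one in an interactive login shell.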
Can I use the option "./mpirun -hetero" to specify the machines? The jobs run fine on each machine individually, but not when both machines are used together.
I hope this gives some hint toward a solution.