Hello
I have a program that has always worked fine, but I'm trying it on a new cluster and it fails when I run it on more than one machine.
That is, if I run it on a single host, on either machine, everything works fine:
radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 ../test parcorto.txt
But when I execute
radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 -machinefile /home/radic/mfile ../test parcorto.txt
I get this error:
mpirun has exited due to process rank 0 with PID 2132 on
node santacruz exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
Even when the machinefile (mfile) contains only one machine, the program fails.
This is the current content:
radic@santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
santacruz
chubut
I've debugged the program, and the error occurs after process 0 does an
MPI_Recv(&nomproc,lennomproc,MPI_CHAR,i,tag,MPI_COMM_WORLD,&Stat);
from the remote process.
I've run several tests, which I'll describe:
1) Changing the order of the hosts in the machinefile
radic@santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
chubut
santacruz
In that case, I get this error:
[chubut:2194] *** An error occurred in MPI_Recv
[chubut:2194] *** on communicator MPI_COMM_WORLD
[chubut:2194] *** MPI_ERR_TRUNCATE: message truncated
[chubut:2194] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
and then
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 2194 on
node chubut exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
2) I get the same error when launching from host chubut instead of santacruz.
3) Simple MPI programs such as MPI_Hello (a hello-world test) work fine, but I suppose that is because they are very simple programs:
radic@santacruz:~/gaps$ mpirun -np 3 -machinefile /home/radic/mfile MPI_Hello
Hola Mundo Hola Marce 1
Hola Mundo Hola Marce 0
Hola Mundo Hola Marce 2
This is the information you ask for when reporting run-time problems.
a) radic@santacruz:~$ mpirun -version
mpirun (Open MPI) 1.4.1
b) I'm using Ubuntu 10.04. I installed the packages with apt-get install, so I don't have a config.log.
c) The output of ompi_info --all is in the file ompi_info.zip
d) These are PATH and LD_LIBRARY_PATH
radic@santacruz:~$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
radic@santacruz:~$ echo $LD_LIBRARY_PATH
Thank you very much.
Marcela.