
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] runtime error
From: Marcela Castro León (mcastrol_at_[hidden])
Date: 2011-02-10 06:13:10


Hello

> I have a program that has always worked fine, but I'm trying it on a new cluster
> and it fails when I run it on more than one machine.
> If I run it locally on each host, everything works fine:
> radic_at_santacruz:~/gaps/caso3-i1$ mpirun -np 3 ../test parcorto.txt
>
> But when I execute
> radic_at_santacruz:~/gaps/caso3-i1$ mpirun -np 3 -machinefile /home/radic/mfile ../test parcorto.txt
>
> I get this error:
>
> mpirun has exited due to process rank 0 with PID 2132 on
> node santacruz exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
> Even when the machinefile (mfile) had only one machine, the program failed.
> This is the current content:
>
> radic_at_santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
> santacruz
> chubut
>
> I've debugged the program, and the error occurs after proc0 does an
> MPI_Recv(&nomproc,lennomproc,MPI_CHAR,i,tag,MPI_COMM_WORLD,&Stat);
> from the remote process.
>
> I've run several tests, which I'll describe:
>
> 1) Changed the order of the hosts in the machinefile:
> radic_at_santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
> chubut
> santacruz
>
> In that case, I get this error:
> [chubut:2194] *** An error occurred in MPI_Recv
> [chubut:2194] *** on communicator MPI_COMM_WORLD
> [chubut:2194] *** MPI_ERR_TRUNCATE: message truncated
> [chubut:2194] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> and then
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 2194 on
> node chubut exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
> 2) I get the same error when running from host chubut instead of santacruz.
> 3) Simple MPI programs like MPI_Hello work fine, but I
> suppose that is because they are very simple programs.
>
> radic_at_santacruz:~/gaps$ mpirun -np 3 -machinefile /home/radic/mfile MPI_Hello
> Hola Mundo Hola Marce 1
> Hola Mundo Hola Marce 0
> Hola Mundo Hola Marce 2
>
>
> This is the information you ask for when reporting a runtime problem:
> a) radic_at_santacruz:~$ mpirun -version
> mpirun (Open MPI) 1.4.1
> b) I'm using Ubuntu 10.04. I installed the packages with apt-get
> install, so I don't have a config.log.
> c) The output of ompi_info --all is in the file ompi_info.zip
> d) These are PATH and LD_LIBRARY_PATH:
> radic_at_santacruz:~$ echo $PATH
> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
> radic_at_santacruz:~$ echo $LD_LIBRARY_PATH
>
>
> Thank you very much.
>
> Marcela.
>