
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] runtime error
From: Marcela Castro León (mcastrol_at_[hidden])
Date: 2011-02-11 06:11:15


Hello:

I have the same version of Ubuntu, 10.04, on both machines. The original
version was Ubuntu Server 9.1 (64-bit), and I upgraded both of them to 10.04.
Yesterday I updated and upgraded them to the same level again, but I got
the same error after that.
The machines are exactly the same, HP Compaq with Intel Core i5.

Anyway, I've compared the versions of Open MPI and gcc, and they are the same
too: 1.4.1-2 and 4.4.4.3 respectively. I'm attaching the output of dpkg -l on
the two systems.

I would appreciate any help in solving this.
Thank you.

Marcela.
2011/2/10 Jeff Squyres <jsquyres_at_[hidden]>

> I typically see these kinds of errors when there's an Open MPI version
> mismatch between the nodes, and/or if there are slightly different flavors
> of Linux installed on each node (i.e., you're technically in a heterogeneous
> situation, but you're trying to run a single application binary). Can you
> verify:
>
> 1. that you have exactly the same version of Open MPI installed on all
> nodes? (and that your application was compiled against that exact version)
>
> 2. that you have exactly the same OS/update level installed on all nodes
> (e.g., same versions of glibc, etc.)
>
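The two checks above can be sketched as a shell loop (the hostnames are taken from this thread; this is only a sketch, assuming passwordless ssh to each node, not a command from the original mail):

```shell
# Sketch: compare Open MPI and glibc versions across the nodes.
# ldd --version reports the glibc version on glibc-based systems.
for h in santacruz chubut; do
  echo "== $h =="
  ssh "$h" 'mpirun --version 2>&1 | head -1; ldd --version | head -1'
done
```

If the two hosts print different version lines, that mismatch is the first thing to fix before debugging the application itself.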
>
> On Feb 10, 2011, at 3:13 AM, Marcela Castro León wrote:
>
> > Hello
> > I have a program that always worked fine, but I'm trying it on a new
> cluster and it fails when I execute it on more than one machine.
> > I mean, if I run it alone on each host, everything works fine.
> > radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 ../test parcorto.txt
> >
> > But when I execute
> > radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 -machinefile
> /home/radic/mfile ../test parcorto.txt
> >
> > I get this error:
> >
> > mpirun has exited due to process rank 0 with PID 2132 on
> > node santacruz exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> >
> --------------------------------------------------------------------------
> >
> > Even when the machinefile (mfile) had only one machine, the program failed.
> > This is the current content:
> >
> > radic@santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
> > santacruz
> > chubut
> >
> > I've debugged the program, and the error occurs after proc0 does an
> > MPI_Recv(&nomproc,lennomproc,MPI_CHAR,i,tag,MPI_COMM_WORLD,&Stat);
> > from the remote process.
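A receive like the one above hits MPI_ERR_TRUNCATE when the incoming message is longer than the posted count (here, lennomproc). A minimal sketch of the usual defense is to post the full buffer size and read the actual length from the status afterwards (hypothetical names, not the program from this thread):

```c
/* Sketch: rank 0 collects each rank's processor name without risking
 * MPI_ERR_TRUNCATE, by receiving with the maximum buffer size and
 * querying the real length via MPI_Get_count. Compile with mpicc. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char name[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(name, &len);

    if (rank == 0) {
        char buf[MPI_MAX_PROCESSOR_NAME];
        MPI_Status stat;
        for (int i = 1; i < nprocs; i++) {
            /* Post the full buffer size; the sender may send fewer
             * bytes, but never more than we can hold. */
            MPI_Recv(buf, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, i, 0,
                     MPI_COMM_WORLD, &stat);
            int got;
            MPI_Get_count(&stat, MPI_CHAR, &got);
            printf("rank %d runs on %.*s\n", i, got, buf);
        }
    } else {
        MPI_Send(name, len + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```

If the receive count is instead a fixed value agreed on one node, a heterogeneous pair of installations (or different hostname lengths) can make the sender's message exceed it, which matches the truncation error reported below.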
> >
> > I've done several tests, which I'll mention:
> >
> > 1) Change the order on machinefile
> > radic@santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
> > chubut
> > santacruz
> >
> > In that case, I get this error:
> > [chubut:2194] *** An error occurred in MPI_Recv
> > [chubut:2194] *** on communicator MPI_COMM_WORLD
> > [chubut:2194] *** MPI_ERR_TRUNCATE: message truncated
> > [chubut:2194] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > and then
> >
> --------------------------------------------------------------------------
> > mpirun has exited due to process rank 0 with PID 2194 on
> > node chubut exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> >
> --------------------------------------------------------------------------
> >
> > 2) I got the same error executing on host chubut instead of santacruz.
> > 3) Simple MPI programs like an MPI hello world work fine, but I
> suppose those are very simple programs.
> >
> > radic@santacruz:~/gaps$ mpirun -np 3 -machinefile /home/radic/mfile
> MPI_Hello
> > Hola Mundo Hola Marce 1
> > Hola Mundo Hola Marce 0
> > Hola Mundo Hola Marce 2
> >
> >
> > This is the information you ask for about runtime problems:
> > a) radic@santacruz:~$ mpirun -version
> > mpirun (Open MPI) 1.4.1
> > b) I'm using Ubuntu 10.04. I installed the packages using apt-get
> install, so I don't have a config.log.
> > c) The output of ompi_info --all is in the attached file ompi_info.zip
> > d) These are PATH and LD_LIBRARY_PATH
> > radic@santacruz:~$ echo $PATH
> > /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
> > radic@santacruz:~$ echo $LD_LIBRARY_PATH
> >
> >
> > Thank you very much.
> >
> > Marcela.
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>