Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] runtime error
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-02-14 13:13:42


What happens if you try to mpirun a non-MPI program, like "date" or "hostname"?
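
Since the dpkg -l listings from both machines are at hand, it may also be worth diffing them mechanically rather than by eye. Below is a minimal sketch, assuming each listing was saved to a file; the function names and the file names santacruz.dpkg and chubut.dpkg are hypothetical and do not come from the original report.

```shell
#!/bin/sh
# Sketch: list packages whose installed version differs between two
# saved "dpkg -l" listings (one per node). All names here are illustrative.

# Print "name version" for every installed ("ii") package in a listing.
pkg_versions() {
    awk '$1 == "ii" { print $2, $3 }' "$1" | LC_ALL=C sort
}

# Print the entries that appear in only one of the two listings.
pkg_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    pkg_versions "$1" > "$a"
    pkg_versions "$2" > "$b"
    # comm -3 suppresses lines common to both files; entries found only
    # in the second listing are printed with a leading tab.
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Hypothetical usage, after running  dpkg -l > "$(hostname).dpkg"  on each node:
#   pkg_diff santacruz.dpkg chubut.dpkg
```

Empty output would mean that the two nodes report identical package versions. Any line printed names a package that is missing or at a different version on one side.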

On Feb 11, 2011, at 6:14 AM, Marcela Castro León wrote:

> Excuse me, I forgot the attachment.
>
> 2011/2/11 Marcela Castro León <mcastrol_at_[hidden]>
> Hello:
>
> I have the same version of Ubuntu, 10.04, on both machines. The original version was Ubuntu Server 9.1 (64-bit), and I upgraded both of them to 10.04.
> Yesterday I updated and upgraded both to the same level again, but I still get the same error.
> The machines are exactly the same: HP Compaq with Intel Core i5.
>
> Anyway, I've compared the versions of openmpi and gcc, and they are the same too: 1.4.1-2 and 4.4.4.3 respectively. I'm attaching the output of dpkg -l on the two systems.
>
> I would greatly appreciate any help in solving this.
> Thank you.
>
> Marcela.
> 2011/2/10 Jeff Squyres <jsquyres_at_[hidden]>
>
> I typically see these kinds of errors when there's an Open MPI version mismatch between the nodes, and/or if there are slightly different flavors of Linux installed on each node (i.e., you're technically in a heterogeneous situation, but you're trying to run a single application binary). Can you verify:
>
> 1. that you have exactly the same version of Open MPI installed on all nodes? (and that your application was compiled against that exact version)
>
> 2. that you have exactly the same OS/update level installed on all nodes (e.g., same versions of glibc, etc.)
>
>
> On Feb 10, 2011, at 3:13 AM, Marcela Castro León wrote:
>
> > Hello
> > I have a program that has always worked fine, but I'm trying it on a new cluster and it fails when I execute it on more than one machine.
> > I mean, if I execute it alone on each host, everything works fine:
> > radic_at_santacruz:~/gaps/caso3-i1$ mpirun -np 3 ../test parcorto.txt
> >
> > But when I execute
> > radic_at_santacruz:~/gaps/caso3-i1$ mpirun -np 3 -machinefile /home/radic/mfile ../test parcorto.txt
> >
> > I get this error:
> >
> > mpirun has exited due to process rank 0 with PID 2132 on
> > node santacruz exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> > --------------------------------------------------------------------------
> >
> > Even when the machinefile (mfile) had only one machine, the program failed.
> > This is the current content:
> >
> > radic_at_santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
> > santacruz
> > chubut
> >
> > I've debugged the program, and the error occurs after proc0 does an
> > MPI_Recv(&nomproc,lennomproc,MPI_CHAR,i,tag,MPI_COMM_WORLD,&Stat);
> > from the remote process.
> >
> > I've run several tests, which I'll describe:
> >
> > 1) Changed the order in the machinefile:
> > radic_at_santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
> > chubut
> > santacruz
> >
> > In that case, I get this error:
> > [chubut:2194] *** An error occurred in MPI_Recv
> > [chubut:2194] *** on communicator MPI_COMM_WORLD
> > [chubut:2194] *** MPI_ERR_TRUNCATE: message truncated
> > [chubut:2194] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > and then
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 0 with PID 2194 on
> > node chubut exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> > --------------------------------------------------------------------------
> >
> > 2) I get the same error when executing on host chubut instead of santacruz.
> > 3) Simple MPI programs like MPI_Hello world work fine, but I suppose that is because they are very simple programs.
> >
> > radic_at_santacruz:~/gaps$ mpirun -np 3 -machinefile /home/radic/mfile MPI_Hello
> > Hola Mundo Hola Marce 1
> > Hola Mundo Hola Marce 0
> > Hola Mundo Hola Marce 2
> >
> >
> > This is the information you ask for with runtime problems:
> > a) radic_at_santacruz:~$ mpirun -version
> > mpirun (Open MPI) 1.4.1
> > b) I'm using Ubuntu 10.04. I installed the packages using apt-get install, so I don't have a config.log.
> > c) The output of ompi_info --all is in the file ompi_info.zip
> > d) These are PATH and LD_LIBRARY_PATH
> > radic_at_santacruz:~$ echo $PATH
> > /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
> > radic_at_santacruz:~$ echo $LD_LIBRARY_PATH
> >
> >
> > Thank you very much.
> >
> > Marcela.
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> <scgcc><scompi><chgcc><chompi>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/