Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] runtime error
From: Marcela Castro León (mcastrol_at_[hidden])
Date: 2011-02-11 06:14:15


Excuse me. I forgot the attachments.

2011/2/11 Marcela Castro León <mcastrol_at_[hidden]>

> Hello:
>
> I have the same version of Ubuntu, 10.04, on both. The original version was
> Ubuntu Server 9.10 (64-bit), and I upgraded both of them to 10.04.
> Yesterday I updated and upgraded them to the same level again, but I got
> the same error after that.
> The machines are exactly the same: HP Compaq with an Intel Core i5.
>
> Anyway, I've compared the versions of Open MPI and gcc, and they are the
> same too: 1.4.1-2 and 4.4.4.3 respectively. I'm attaching the output of
> dpkg -l on the two systems.
>
> I would greatly appreciate any help in solving this.
> Thank you.
>
> Marcela.
> 2011/2/10 Jeff Squyres <jsquyres_at_[hidden]>
>
> I typically see these kinds of errors when there's an Open MPI version
>> mismatch between the nodes, and/or if there are slightly different flavors
>> of Linux installed on each node (i.e., you're technically in a heterogeneous
>> situation, but you're trying to run a single application binary). Can you
>> verify:
>>
>> 1. that you have exactly the same version of Open MPI installed on all
>> nodes? (and that your application was compiled against that exact version)
>>
>> 2. that you have exactly the same OS/update level installed on all nodes
>> (e.g., same versions of glibc, etc.)
>>
>>
>> On Feb 10, 2011, at 3:13 AM, Marcela Castro León wrote:
>>
>> > Hello
>> > I have a program that has always worked fine, but I'm trying it on a new
>> cluster and it fails when I execute it on more than one machine.
>> > I mean, if I execute it alone on each host, everything works fine:
>> > radic_at_santacruz:~/gaps/caso3-i1$ mpirun -np 3 ../test parcorto.txt
>> >
>> > But when I execute
>> > radic_at_santacruz:~/gaps/caso3-i1$ mpirun -np 3 -machinefile
>> /home/radic/mfile ../test parcorto.txt
>> >
>> > I get this error:
>> >
>> > mpirun has exited due to process rank 0 with PID 2132 on
>> > node santacruz exiting without calling "finalize". This may
>> > have caused other processes in the application to be
>> > terminated by signals sent by mpirun (as reported here).
>> >
>> --------------------------------------------------------------------------
>> >
>> > Even when the machinefile (mfile) had only one machine, the program failed.
>> > This is the current content:
>> >
>> > radic_at_santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
>> > santacruz
>> > chubut
>> >
>> > I've debugged the program, and the error occurs after proc 0 does an
>> > MPI_Recv(&nomproc,lennomproc,MPI_CHAR,i,tag,MPI_COMM_WORLD,&Stat);
>> > from the remote process.
>> >
>> > I've done several tests, which I'll describe:
>> >
>> > 1) Change the order on machinefile
>> > radic_at_santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
>> > chubut
>> > santacruz
>> >
>> > In that case, I get this error:
>> > [chubut:2194] *** An error occurred in MPI_Recv
>> > [chubut:2194] *** on communicator MPI_COMM_WORLD
>> > [chubut:2194] *** MPI_ERR_TRUNCATE: message truncated
>> > [chubut:2194] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> > and then
>> >
>> --------------------------------------------------------------------------
>> > mpirun has exited due to process rank 0 with PID 2194 on
>> > node chubut exiting without calling "finalize". This may
>> > have caused other processes in the application to be
>> > terminated by signals sent by mpirun (as reported here).
>> >
>> --------------------------------------------------------------------------
>> >
>> > 2) I got the same error executing on host chubut instead of santacruz.
>> > 3) Simple MPI programs like an MPI hello world work fine, but I
>> suppose those are very simple programs.
>> >
>> > radic_at_santacruz:~/gaps$ mpirun -np 3 -machinefile /home/radic/mfile
>> MPI_Hello
>> > Hola Mundo Hola Marce 1
>> > Hola Mundo Hola Marce 0
>> > Hola Mundo Hola Marce 2
>> >
>> >
>> > This is the information you asked for about the runtime problem:
>> > a) radic_at_santacruz:~$ mpirun -version
>> > mpirun (Open MPI) 1.4.1
>> > b) I'm using Ubuntu 10.04. I installed the packages using apt-get
>> install, so I don't have a config.log.
>> > c) The ompi_info --all is on the file ompi_info.zip
>> > d) These are PATH and LD_LIBRARY_PATH
>> > radic_at_santacruz:~$ echo $PATH
>> > /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
>> > radic_at_santacruz:~$ echo $LD_LIBRARY_PATH
>> >
>> >
>> > Thank you very much.
>> >
>> > Marcela.
>> >
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>




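Jeff's two checks can be done mechanically by diffing the package lists attached below (scompi and chompi are the `dpkg -l` outputs from santacruz and chubut). A sketch with fabricated sample lines standing in for the real attachments:

```shell
# Fabricated stand-ins for the attached `dpkg -l` dumps; in practice you
# would diff the real scompi (santacruz) and chompi (chubut) files.
cat > /tmp/scompi <<'EOF'
ii  openmpi-bin  1.4.1-2            high performance message passing library
ii  libc6        2.11.1-0ubuntu7    Embedded GNU C Library
EOF
cat > /tmp/chompi <<'EOF'
ii  openmpi-bin  1.4.1-2            high performance message passing library
ii  libc6        2.11.1-0ubuntu7.8  Embedded GNU C Library
EOF

# Every `<`/`>` line is a package whose version differs between the nodes;
# for this kind of failure, glibc and the openmpi packages are the first suspects.
diff /tmp/scompi /tmp/chompi | grep '^[<>]' || echo "no differences"
```

In this fabricated example the glibc versions differ, which is exactly the heterogeneity Jeff's point 2 warns about even when the Open MPI packages match.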
  • application/octet-stream attachment: scgcc

  • application/octet-stream attachment: scompi

  • application/octet-stream attachment: chgcc

  • application/octet-stream attachment: chompi