Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ORTE_ERROR_LOG: Data unpack had inadequate space in file gpr_replica_cmd_processor.c at line 361
From: Ralph H Castain (rhc_at_[hidden])
Date: 2007-12-14 13:43:43


You can always run locally because that doesn't start up a new daemon - hence,
there are no communications involved, and it is those communications that
trigger the error message.

Check the remote nodes (and your path on those nodes) to make sure that the
Open MPI version you would pick up is the same as the one on your head node.
I know you believe you are running with the same version, but you can be
surprised - people remove the other source, for example, but forget to
remove the libraries and binaries. Or their path, when we ssh the daemons,
points to a place where a different version is installed (remember, the path
is often different for a login vs ssh).
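
One way to run that check is sketched below. It is only a sketch under
assumptions: passwordless ssh to every node, a machinefile whose first field
on each line is the hostname, and an ompi_info that prints a line of the form
"Open MPI: <version>". The function names ompi_version, report_versions and
count_versions are made up for this example and are not part of Open MPI.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Open MPI): show which Open MPI a
# non-interactive ssh session picks up on each host in a machinefile.

# Read ompi_info output on stdin and print only the version string.
# Assumes a line that looks like "Open MPI: 1.2.4" (leading blanks allowed).
ompi_version() {
    sed -n 's/^ *Open MPI: *//p'
}

# Print "<host> <version> <path to mpirun>" for every host in the machinefile.
# Assumes passwordless ssh; the first field of each line is the hostname.
report_versions() {
    awk 'NF && $1 !~ /^#/ {print $1}' "$1" | sort -u | while read -r host; do
        # ssh -n keeps ssh from swallowing the rest of the host list on stdin.
        ver=$(ssh -n "$host" ompi_info 2>/dev/null | ompi_version)
        path=$(ssh -n "$host" which mpirun 2>/dev/null)
        echo "$host ${ver:-not-found} ${path:-not-found}"
    done
}

# Read report_versions output on stdin and print how many distinct
# versions were seen; anything other than 1 means a mismatch.
count_versions() {
    awk '{print $2}' | sort -u | awk 'END {print NR}'
}
```

With the machinefile from your mpirun command, "report_versions m8" lists what
each node finds over ssh, and "report_versions m8 | count_versions" should
print 1 if every node sees the same version. Compare that with what
"ompi_info" and "which mpirun" report in your login shell on the head node.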

What environment are you operating in - are you using rsh to launch on the
remote nodes? Are the remote nodes the same architecture as the head node?

Ralph

On 12/14/07 9:59 AM, "Qiang Xu" <qxu2_at_[hidden]> wrote:

> Ralph:
>
> I did first install OpenMPI-1.2.3 and got the same error message.
> ORTE_ERROR_LOG: Data unpack had inadequate space in file dss/dss_unpack.c at
> line 90
> ORTE_ERROR_LOG: Data unpack had inadequate space in file
> gpr_replica_cmd_processor.c at line 361
>
> And after reading the mailing list, I upgraded to OpenMPI-1.2.4.
> I removed OpenMPI-1.2.3, but it still shows the same error message.
>
> Now I also upgraded to gcc4.1.1, so gfortran is the fortran compiler.
>
> ./configure --prefix=/home/qiang/OpenMPI-1.2.4/ CC=gcc F77=gfortran F90=gfortran
>
> [qiang_at_grid11 ~]$ gcc -v
> Using built-in specs.
> Target: i386-redhat-linux
> Configured with:
> ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info
> --enable-shared --enable-threads=posix --enable-checking=release
> --with-system-zlib
> --enable-__cxa_atexit --disable-libunwind-exceptions
> --with-gxx-include-dir=/usr/include/c++/3.4.3
> --enable-libgcj-multifile --enable-languages=c,c++,java,f95
> --enable-java-awt=gtk
> --disable-dssi --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre
> --with-cpu=generic
> --host=i386-redhat-linux
> Thread model: posix
> gcc version 4.1.1 20070105 (Red Hat 4.1.1-53)
>
> Still the problem is there. But I can run the NAS benchmark locally without
> specifying the machinefile.
> [qiang_at_compute-0-1 bin]$ mpirun -n 4 mg.B.4
>
>
> NAS Parallel Benchmarks 2.3 -- MG Benchmark
>
> No input file. Using compiled defaults
> Size: 256x256x256 (class B)
> Iterations: 20
> Number of processes: 4
>
> Initialization time: 6.783 seconds
>
> Benchmark completed
> VERIFICATION SUCCESSFUL
> L2 Norm is 0.180056440136E-05
> Error is 0.351679609371E-16
>
>
> MG Benchmark Completed.
> Class = B
> Size = 256x256x256
> Iterations = 20
> Time in seconds = 47.19
> Total processes = 4
> Compiled procs = 4
> Mop/s total = 412.43
> Mop/s/process = 103.11
> Operation type = floating point
> Verification = SUCCESSFUL
> Version = 2.3
> Compile date = 13 Dec 2007
>
> Compile options:
> MPIF77 = mpif77
> FLINK = mpif77
> FMPI_LIB = -L~/MyMPI/lib -lmpi_f77
> FMPI_INC = -I~/MyMPI/include
> FFLAGS = -O3
> FLINKFLAGS = (none)
> RAND = (none)
>
>
> Please send the results of this run to:
>
> NPB Development Team
> Internet: npb_at_[hidden]
>
> If email is not available, send this to:
>
> MS T27A-1
> NASA Ames Research Center
> Moffett Field, CA 94035-1000
>
> Fax: 415-604-3957
>
>
> If I try to use multiple nodes, I get the error messages:
> ORTE_ERROR_LOG: Data unpack had inadequate space in file dss/dss_unpack.c at
> line 90
> ORTE_ERROR_LOG: Data unpack had inadequate space in file
> gpr_replica_cmd_processor.c at line 361
>
> But only OpenMPI-1.2.4 is installed. Did I miss something?
>
> Qiang
>
>
>
>
>
>
> ----- Original Message -----
> From: "Ralph H Castain" <rhc_at_[hidden]>
> To: <users_at_[hidden]>; "Qiang Xu" <Qiang.Xu_at_[hidden]>
> Sent: Friday, December 14, 2007 7:34 AM
> Subject: Re: [OMPI users] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file gpr_replica_cmd_processor.c at line 361
>
>
>> Hi Qiang
>>
>> This error message usually indicates that you have more than one Open MPI
>> installation around, and that the backend nodes are picking up a different
>> version than mpirun is using. Check to make sure that you have a consistent
>> version across all the nodes.
>>
>> I also noted you were building with --enable-threads. As you've probably
>> seen on our discussion lists, remember that Open MPI isn't really thread
>> safe yet. I don't think that is the problem here, but wanted to be sure you
>> were aware of the potential for problems.
>>
>> Ralph
>>
>>
>>
>> On 12/13/07 5:31 PM, "Qiang Xu" <Qiang.Xu_at_[hidden]> wrote:
>>
>>> I installed OpenMPI-1.2.4 on our cluster.
>>> Here is the compute node info:
>>>
>>> [qiang_at_compute-0-1 ~]$ uname -a
>>> Linux compute-0-1.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 00:17:26 CDT 2006 i686 i686 i386 GNU/Linux
>>> [qiang_at_compute-0-1 bin]$ gcc -v
>>> Reading specs from /usr/lib/gcc/i386-redhat-linux/3.4.6/specs
>>> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
>>> --infodir=/usr/share/info --enable-shared --enable-threads=posix
>>> --disable-checking --with-system-zlib --enable-__cxa_atexit
>>> --disable-libunwind-exceptions --enable-java-awt=gtk
>>> --host=i386-redhat-linux
>>> Thread model: posix
>>> gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)
>>>
>>> Then I compiled the NAS benchmarks; I got some warnings, but the build went through.
>>> [qiang_at_compute-0-1 NPB2.3-MPI]$ make suite
>>> make[1]: Entering directory `/home/qiang/NPB2.3/NPB2.3-MPI'
>>> =========================================
>>> = NAS Parallel Benchmarks 2.3 =
>>> = MPI/F77/C =
>>> =========================================
>>>
>>> cd MG; make NPROCS=16 CLASS=B
>>> make[2]: Entering directory `/home/qiang/NPB2.3/NPB2.3-MPI/MG'
>>> make[3]: Entering directory `/home/qiang/NPB2.3/NPB2.3-MPI/sys'
>>> cc -g -o setparams setparams.c
>>> make[3]: Leaving directory `/home/qiang/NPB2.3/NPB2.3-MPI/sys'
>>> ../sys/setparams mg 16 B
>>> make.def modified. Rebuilding npbparams.h just in case
>>> rm -f npbparams.h
>>> ../sys/setparams mg 16 B
>>> mpif77 -c -I~/MyMPI/include mg.f
>>> mg.f: In subroutine `zran3':
>>> mg.f:1001: warning:
>>> call mpi_allreduce(rnmu,ss,1,dp_type,
>>> 1
>>> mg.f:2115: (continued):
>>> call mpi_allreduce(jg(0,i,1), jg_temp,4,MPI_INTEGER,
>>> 2
>>> Argument #1 of `mpi_allreduce' is one type at (2) but is some other type at
>>> (1) [info -f g77 M GLOBALS]
>>> mg.f:1001: warning:
>>> call mpi_allreduce(rnmu,ss,1,dp_type,
>>> 1
>>> mg.f:2115: (continued):
>>> call mpi_allreduce(jg(0,i,1), jg_temp,4,MPI_INTEGER,
>>> 2
>>> Argument #2 of `mpi_allreduce' is one type at (2) but is some other type at
>>> (1) [info -f g77 M GLOBALS]
>>> mg.f:1001: warning:
>>> call mpi_allreduce(rnmu,ss,1,dp_type,
>>> 1
>>> mg.f:2139: (continued):
>>> call mpi_allreduce(jg(0,i,0), jg_temp,4,MPI_INTEGER,
>>> 2
>>> Argument #1 of `mpi_allreduce' is one type at (2) but is some other type at
>>> (1) [info -f g77 M GLOBALS]
>>> mg.f:1001: warning:
>>> call mpi_allreduce(rnmu,ss,1,dp_type,
>>> 1
>>> mg.f:2139: (continued):
>>> call mpi_allreduce(jg(0,i,0), jg_temp,4,MPI_INTEGER,
>>> 2
>>> Argument #2 of `mpi_allreduce' is one type at (2) but is some other type at
>>> (1) [info -f g77 M GLOBALS]
>>> cd ../common; mpif77 -c -I~/MyMPI/include print_results.f
>>> cd ../common; mpif77 -c -I~/MyMPI/include randdp.f
>>> cd ../common; mpif77 -c -I~/MyMPI/include timers.f
>>> mpif77 -o ../bin/mg.B.16 mg.o ../common/print_results.o
>>> ../common/randdp.o
>>> ../common/timers.o -L~/MyMPI/lib -lmpi_f77
>>> make[2]: Leaving directory `/home/qiang/NPB2.3/NPB2.3-MPI/MG'
>>> make[1]: Leaving directory `/home/qiang/NPB2.3/NPB2.3-MPI'
>>> make[1]: Entering directory `/home/qiang/NPB2.3/NPB2.3-MPI'
>>> But when I tried to run it, I got the following error messages:
>>> [qiang_at_compute-0-1 bin]$ mpirun -machinefile m8 -n 16 mg.C.16
>>> [compute-0-1.local:11144] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate
>>> space in file dss/dss_unpack.c at line 90
>>> [compute-0-1.local:11144] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate
>>> space in file gpr_replica_cmd_processor.c at line 361
>>> I found some info on the mailing list, but it doesn't help in my case.
>>> Could anyone give me some advice? Or do I have to upgrade the GNU compiler?
>>>
>>> Thanks.
>>>
>>> Qiang
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>>
>>
>
>