Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Prototypes for Fortran MPI_ commands using 64-bit indexing
From: Jim Parker (jimparker96313_at_[hidden])
Date: 2013-10-31 13:58:41


Some additional info that may point toward a solution: calls to MPI_SEND do
not cause memory corruption; only calls to MPI_RECV do. Since the main
difference is that MPI_RECV takes a "status" array and MPI_SEND does not,
this suggests to me that something is wrong with the status argument.
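
For reference, here is a minimal sketch of the receive pattern in question
(the program and variable names are just illustrative). The key point is that
status must be an INTEGER array of size MPI_STATUS_SIZE taken from the same
mpif.h as the -fdefault-integer-8 build; if the application and the library
disagree on the size of INTEGER, MPI_RECV can overrun status and clobber
neighboring locals such as rank:

  program recv_check
    implicit none
    include 'mpif.h'                      ! header from the -fdefault-integer-8 build
    integer :: rank, nprocs, ierr         ! default INTEGERs are 8 bytes under -fdefault-integer-8
    integer :: status(MPI_STATUS_SIZE)    ! must be an array of MPI_STATUS_SIZE, not a scalar
    integer :: buf

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

    if (rank == 0) then
       buf = 42
       call MPI_SEND(buf, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
    else if (rank == 1) then
       call MPI_RECV(buf, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr)
       print *, 'rank', rank, 'received', buf   ! rank should still be 1 here
    end if

    call MPI_FINALIZE(ierr)
  end program recv_check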

Also, I can run a C version of the helloWorld program with no errors.
However, C int is only 4 bytes there. To send 8-byte integers, I declare
tempInt as long int and pass MPI_LONG as the datatype.
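
The Fortran analogue of that workaround, assuming the build provides the
optional MPI_INTEGER8 datatype (Open MPI does), would be to give tempInt an
explicit 8-byte kind and pass MPI_INTEGER8; that should work with or without
-fdefault-integer-8. A rough sketch (names are illustrative):

  program send_i8
    implicit none
    include 'mpif.h'
    integer, parameter :: i8 = selected_int_kind(18)   ! 8-byte integer kind
    integer :: rank, ierr
    integer :: status(MPI_STATUS_SIZE)
    integer(i8) :: tempInt

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

    if (rank == 0) then
       tempInt = 12345678901_i8                        ! does not fit in 4 bytes
       call MPI_SEND(tempInt, 1, MPI_INTEGER8, 1, 0, MPI_COMM_WORLD, ierr)
    else if (rank == 1) then
       call MPI_RECV(tempInt, 1, MPI_INTEGER8, 0, 0, MPI_COMM_WORLD, status, ierr)
    end if

    call MPI_FINALIZE(ierr)
  end program send_i8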

@Jeff,
  I got a copy of the openmpi conf.log. See attached.

Cheers,
--Jim

On Wed, Oct 30, 2013 at 10:55 PM, Jim Parker <jimparker96313_at_[hidden]> wrote:

> Ok, all, where to begin...
>
> Perhaps I should start with the most pressing issue for me: I need 64-bit
> indexing.
>
> @Martin,
> you indicated that even if I get this up and running, the MPI library
> still uses signed 32-bit ints to count (your term), or index (my term), the
> recv-buffer lengths. More concretely, in a call to
> MPI_Allgatherv(sendbuf, sendcount, MPI_INTEGER, recvbuf, recvcounts,
> displs, MPI_INTEGER, MPI_COMM_WORLD, mpierr), the values of sendcount,
> recvcounts, and displs must fit in 32-bit integers, not 64-bit. Actually,
> all I need is for displs to hold 64-bit values...
> If this is true, then compiling OpenMPI this way is not a solution; I'll
> have to restructure my code to gather data in chunks that fit in 31-bit
> counts...
> Not that it matters, but I'm not using DIRAC; my code is a custom program
> for circuit analysis.
>
> @Jeff,
> Interesting; your run shows a different error than mine. You have
> problems with the passed variable tempInt, which would make sense for
> the reasons you gave. However, my problem is that the local variable
> "rank" gets overwritten by memory corruption after MPI_RECV is called.
>
> Re: config.log. I will try to have the admin guy recompile tomorrow and
> see if I can get the log for you.
>
> BTW, I'm using the gcc 4.7.2 compiler suite on a Rocks 5.4 HPC cluster. I
> use the options -m64 and -fdefault-integer-8.
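>
> As a quick sanity check that -fdefault-integer-8 is actually in effect for
> my own code, a tiny program like this (the name is illustrative) should
> report 8 bytes when compiled with the flag and 4 bytes without it:
>
>   program intsize
>     implicit none
>     integer :: i
>     print *, 'default INTEGER is', bit_size(i)/8, 'bytes'
>   end program intsize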
>
> Cheers,
> --Jim
>
>
>
> On Wed, Oct 30, 2013 at 7:36 PM, Martin Siegert <siegert_at_[hidden]> wrote:
>
>> Hi Jim,
>>
>> I have quite a bit of experience with compiling openmpi for dirac.
>> Here is what I use to configure openmpi:
>>
>> ./configure --prefix=$instdir \
>> --disable-silent-rules \
>> --enable-mpirun-prefix-by-default \
>> --with-threads=posix \
>> --enable-cxx-exceptions \
>> --with-tm=$torquedir \
>> --with-wrapper-ldflags="-Wl,-rpath,${instdir}/lib" \
>> --with-openib \
>> --with-hwloc=$hwlocdir \
>> CC=gcc \
>> CXX=g++ \
>> FC="$FC" \
>> F77="$FC" \
>> CFLAGS="-O3" \
>> CXXFLAGS="-O3" \
>> FFLAGS="-O3 $I8FLAG" \
>> FCFLAGS="-O3 $I8FLAG"
>>
>> You need to set FC to either ifort or gfortran (those are the two
>> compilers that I have used) and set I8FLAG to -fdefault-integer-8 for
>> gfortran or -i8 for ifort.
>> Set torquedir to the directory where torque is installed ($torquedir/lib
>> must contain libtorque.so) if you are running jobs under torque;
>> otherwise remove the --with-tm=... line.
>> Set hwlocdir to the directory where you have hwloc installed. You may not
>> need the --with-hwloc=... option because openmpi comes with its own hwloc
>> (I don't have experience with that because we install hwloc
>> independently).
>> Set instdir to the directory where you want to install openmpi.
>> You may or may not need the --with-openib option, depending on whether
>> you have an Infiniband interconnect.
>>
>> After configure/make/make install, the version compiled this way can be
>> used with dirac without changing the dirac source code.
>> (There is one caveat: you should make sure that all "count" variables
>> in MPI calls in dirac are smaller than 2^31 - 1. I have run into a few
>> cases where that is not so; the problem can be overcome by replacing the
>> MPI_Allreduce calls in dirac with a wrapper that calls MPI_Allreduce
>> repeatedly; a rough sketch of such a wrapper is at the end of this
>> message.) This is what I use to set up dirac:
>>
>> export PATH=$instdir/bin:$PATH
>> ./setup --prefix=$diracinstdir \
>> --fc=mpif90 \
>> --cc=mpicc \
>> --int64 \
>> --explicit-libs="-lmkl_intel_ilp64 -lmkl_sequential -lmkl_core"
>>
>> where $instdir is the directory where you installed openmpi from above.
>>
>> I would never use an openmpi version compiled this way for anything other
>> than dirac though. I am not saying that it cannot work (at a minimum
>> you need to compile Fortran programs with the appropriate I8FLAG),
>> but it is an unnecessary complication: I have not encountered a piece
>> of software other than dirac that requires this.
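>>
>> In case it is useful, here is a rough sketch of the kind of wrapper I
>> mean (the name big_allreduce is just illustrative; this version assumes
>> double precision data and MPI_SUM, and keeps each call's count
>> comfortably below 2^31 - 1):
>>
>>   subroutine big_allreduce(sendbuf, recvbuf, n, comm, ierr)
>>     implicit none
>>     include 'mpif.h'
>>     integer, intent(in)  :: n, comm          ! n may exceed 2**31 - 1 with -i8
>>     double precision, intent(in)  :: sendbuf(n)
>>     double precision, intent(out) :: recvbuf(n)
>>     integer, intent(out) :: ierr
>>     integer :: offset, chunk
>>     integer, parameter :: maxchunk = 100000000   ! well below 2**31 - 1
>>
>>     offset = 1
>>     do while (offset <= n)
>>        chunk = min(maxchunk, n - offset + 1)
>>        ! each call passes a count whose value fits in a signed 32-bit int
>>        call MPI_ALLREDUCE(sendbuf(offset), recvbuf(offset), chunk, &
>>                           MPI_DOUBLE_PRECISION, MPI_SUM, comm, ierr)
>>        if (ierr /= MPI_SUCCESS) return
>>        offset = offset + chunk
>>     end do
>>   end subroutine big_allreduce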
>>
>> Cheers,
>> Martin
>>
>> --
>> Martin Siegert
>> Head, Research Computing
>> WestGrid/ComputeCanada Site Lead
>> Simon Fraser University
>> Burnaby, British Columbia
>> Canada
>>
>> On Wed, Oct 30, 2013 at 06:00:56PM -0500, Jim Parker wrote:
>> >
>> > Jeff,
>> > Here's what I know:
>> > 1. Checked FAQs. Done
>> > 2. Version 1.6.5
>> > 3. config.log file has been removed by the sysadmin...
>> > 4. ompi_info -a from the head node is attached as headnode.out
>> > 5. N/A
>> > 6. compute node info is attached as compute-x-yy.out
>> > 7. As discussed, local variables are being overwritten after calls to
>> > MPI_RECV from Fortran code
>> > 8. ifconfig output from the head node and compute nodes is attached as
>> > *-ifconfig.out
>> > Cheers,
>> > --Jim
>> >
>> > On Wed, Oct 30, 2013 at 5:29 PM, Jeff Squyres (jsquyres)
>> > <jsquyres_at_[hidden]> wrote:
>> >
>> > Can you send the information listed here:
>> > http://www.open-mpi.org/community/help/
>> >
>> > On Oct 30, 2013, at 6:22 PM, Jim Parker <jimparker96313_at_[hidden]>
>> > wrote:
>> > > Jeff and Ralph,
>> > > Ok, I downshifted to a helloWorld example (attached); bottom line:
>> > > after I hit the MPI_Recv call, my local variable (rank) gets borked.
>> > >
>> > > I have compiled with -m64 -fdefault-integer-8 and have even
>> > > assigned kind=8 to the integers (which would be the preferred
>> > > method in my case).
>> > >
>> > > Your help is appreciated.
>> > >
>> > > Cheers,
>> > > --Jim
>> > >
>> > >
>> > >
>> > > On Wed, Oct 30, 2013 at 4:49 PM, Jeff Squyres (jsquyres)
>> > > <jsquyres_at_[hidden]> wrote:
>> > > On Oct 30, 2013, at 4:35 PM, Jim Parker <jimparker96313_at_[hidden]>
>> > > wrote:
>> > >
>> > > > I have recently built a cluster that uses the 64-bit indexing
>> > > > feature of OpenMPI following the directions at
>> > > >
>> > > > http://wiki.chem.vu.nl/dirac/index.php/How_to_build_MPI_libraries_for_64-bit_integers
>> > >
>> > > That should be correct (i.e., passing -i8 in FFLAGS and FCFLAGS for
>> > > OMPI 1.6.x).
>> > >
>> > > > My question is: what are the new prototypes for the MPI calls?
>> > > > Specifically,
>> > > > MPI_RECV
>> > > > MPI_Allgatherv
>> > >
>> > > They're the same as they've always been.
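>> > >
>> > > For reference, the Fortran bindings (as given in the MPI standard) are
>> > > unchanged; the only thing -i8 changes is how wide those INTEGERs are:
>> > >
>> > >   MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)
>> > >     <type> BUF(*)
>> > >     INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
>> > >     INTEGER STATUS(MPI_STATUS_SIZE), IERROR
>> > >
>> > >   MPI_ALLGATHERV(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNTS,
>> > >                  DISPLS, RECVTYPE, COMM, IERROR)
>> > >     <type> SENDBUF(*), RECVBUF(*)
>> > >     INTEGER SENDCOUNT, SENDTYPE, RECVCOUNTS(*), DISPLS(*)
>> > >     INTEGER RECVTYPE, COMM, IERROR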
>> > >
>> > > The magic is that the -i8 flag tells the compiler "make all Fortran
>> > > INTEGERs be 8 bytes, not (the default) 4." So Ralph's answer was
>> > > correct in that all the MPI parameters are INTEGERs -- but you can
>> > > tell the compiler that all INTEGERs are 8 bytes, not 4, and therefore
>> > > get "large" integers.
>> > >
>> > > Note that this means that you need to compile your application with
>> > > -i8, too. That will make *your* INTEGERs also be 8 bytes, and then
>> > > you'll match what Open MPI is doing.
>> > >
>> > > > I'm curious because some of my local variables get killed (set to
>> > > > null) upon my first call to MPI_RECV. Typically, this is due (in
>> > > > Fortran) to someone not setting the 'status' variable to an
>> > > > appropriate array size.
>> > >
>> > > If you didn't compile your application with -i8, this could well be
>> > > because your application is treating INTEGERs as 4 bytes, but OMPI is
>> > > treating INTEGERs as 8 bytes. Nothing good can come from that.
>> > >
>> > > If you *did* compile your application with -i8 and you're seeing this
>> > > kind of wonkiness, we should dig deeper and see what's going on.
>> > >
>> > > > My review of mpif.h and mpi.h seems to indicate that the functions
>> > > > are defined as C int types and therefore, I assume, the coercion
>> > > > during the compile makes the library support 64-bit indexing, i.e.,
>> > > > int -> long int.
>> > >
>> > > FWIW: We actually define a type MPI_Fint; its actual type is
>> > > determined by configure (int or long int, IIRC). When your Fortran
>> > > code calls C, we use the MPI_Fint type for parameters, and so it will
>> > > be either a 4- or 8-byte integer type.
>> > >
>> > > --
>> > > Jeff Squyres
>> > > jsquyres_at_[hidden]
>> > > For corporate legal information go to:
>> > > http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >
>> > > <mpi-test-64bit.tar.bz2>
>> >