Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks
From: Ronald Cohen (rhcohen_at_[hidden])
Date: 2014-01-17 12:39:24


Thanks, I've just gotten an email with some suggestions (and a promise of
more help) from the HDF5 support team. I will report back here, as it may
be of interest to others trying to build HDF5 on Mavericks.

On Fri, Jan 17, 2014 at 9:08 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> Afraid I have no idea, but hopefully someone else here with experience
> with HDF5 can chime in?
>
>
> On Jan 17, 2014, at 9:03 AM, Ronald Cohen <rhcohen_at_[hidden]> wrote:
>
> Still a timely response, thank you. The particular problem I noted
> hasn't recurred; for reasons I will explain shortly I had to rebuild
> openmpi again, and this time Sample_mpio.c compiled and ran successfully
> from the start.
>
> But now my problem is getting parallel HDF5 to run. In my first
> attempt to build HDF5, it failed in the load stage because of unsatisfied
> externals from openmpi, and I deduced the problem was having built openmpi
> with --disable-static. So I rebuilt with --enable-static and
> --disable-dlopen (emulating a successful openmpi + hdf5 combination I had
> built on Snow Leopard). Once again openmpi passed its make checks, and
> as noted above the Sample_mpio.c test compiled and ran fine. And the
> parallel hdf5 configure and make steps ran successfully. But when I ran
> make check for hdf5, the serial tests passed but none of the parallel tests
> did. Over a million test failures! Error messages like:
>
> Proc 0: *** MPIO File size range test...
> --------------------------------
> MPI_Offset is signed 8 bytes integeral type
> MPIO GB file write test MPItest.h5
> MPIO GB file write test MPItest.h5
> MPIO GB file write test MPItest.h5
> MPIO GB file write test MPItest.h5
> MPIO GB file write test MPItest.h5
> MPIO GB file write test MPItest.h5
> MPIO GB file read test MPItest.h5
> MPIO GB file read test MPItest.h5
> MPIO GB file read test MPItest.h5
> MPIO GB file read test MPItest.h5
> proc 3: found data error at [2141192192+0], expect -6, got 5
> proc 3: found data error at [2141192192+1], expect -6, got 5
>
> And the specifics (which errors are reported, on which processor, at which
> location, and the total number of errors) change if I rerun make check.
>
> I've sent configure, make and make check logs to the HDF5 help desk but
> haven't gotten a response.
>
> I am now configuring openmpi (still 1.7.4rc1) with:
>
> ./configure --prefix=/usr/local/openmpi CC=gcc CXX=g++ FC=gfortran
> F77=gfortran --enable-static --with-pic --disable-dlopen
> --enable-mpirun-prefix-by-default
>
> and configuring HDF5 (version 1.8.12) with:
>
> configure --prefix=/usr/local/hdf5/par CC=mpicc CFLAGS=-fPIC FC=mpif90
> FCFLAGS=-fPIC CXX=mpicxx CXXFLAGS=-fPIC --enable-parallel --enable-fortran
>
> This is the combination that worked for me with Snow Leopard (though with
> earlier versions of both openmpi and hdf5).
>
> If it matters, the gcc is the stock one with Mavericks' Xcode, and
> gfortran is 4.9.0.
>
> (I just noticed that the MPI Fortran wrapper is now mpifort, but I also
> see that mpif90 is still there and is just a link to mpifort.)
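>
> (A quick way to confirm that, assuming the /usr/local/openmpi prefix above,
> is something like
>
> % ls -l /usr/local/openmpi/bin/mpif90
>
> which should show it as a symlink pointing at mpifort.)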
>
> Any suggestions?
>
>
> On Fri, Jan 17, 2014 at 8:14 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> Sorry for the delayed response - just getting back from travel. I don't know
>> why you would get that behavior other than a race condition. Afraid that
>> code path is foreign to me, but perhaps one of the folks in the MPI-IO area
>> can respond.
>>
>>
>> On Jan 15, 2014, at 4:26 PM, Ronald Cohen <rhcohen_at_[hidden]> wrote:
>>
>> Update: I reconfigured with enable_io_romio=yes, and this time -- mostly
>> -- the test using Sample_mpio.c passes. Oddly, the very first time I
>> tried, I got errors:
>>
>> % mpirun -np 2 sampleio
>> Proc 1: hostname=Ron-Cohen-MBP.local
>> Testing simple C MPIO program with 2 processes accessing file
>> ./mpitest.data
>> (Filename can be specified via program argument)
>> Proc 0: hostname=Ron-Cohen-MBP.local
>> Proc 1: read data[0:1] got 0, expect 1
>> Proc 1: read data[0:2] got 0, expect 2
>> Proc 1: read data[0:3] got 0, expect 3
>> Proc 1: read data[0:4] got 0, expect 4
>> Proc 1: read data[0:5] got 0, expect 5
>> Proc 1: read data[0:6] got 0, expect 6
>> Proc 1: read data[0:7] got 0, expect 7
>> Proc 1: read data[0:8] got 0, expect 8
>> Proc 1: read data[0:9] got 0, expect 9
>> Proc 1: read data[1:0] got 0, expect 10
>> Proc 1: read data[1:1] got 0, expect 11
>> Proc 1: read data[1:2] got 0, expect 12
>> Proc 1: read data[1:3] got 0, expect 13
>> Proc 1: read data[1:4] got 0, expect 14
>> Proc 1: read data[1:5] got 0, expect 15
>> Proc 1: read data[1:6] got 0, expect 16
>> Proc 1: read data[1:7] got 0, expect 17
>> Proc 1: read data[1:8] got 0, expect 18
>> Proc 1: read data[1:9] got 0, expect 19
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
>> with errorcode 1.
>>
>> But when I reran the same mpirun command, the test was successful. And
>> after deleting the executable, recompiling, and again running the same
>> mpirun command, the test was also successful. Can someone explain that?
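>>
>> (For reference, the pattern that Sample_mpio.c exercises, where each rank
>> writes its own block of a shared file and then reads it back to verify the
>> values, can be sketched roughly as below. This is only an illustration,
>> not the actual HDF5-supplied test.)
>>
>> /* Illustrative sketch only; not the HDF5-supplied Sample_mpio.c.
>>  * Each rank writes NPER ints to its own slice of a shared file,
>>  * then reads the slice back and verifies the values.
>>  * Build/run: mpicc sampleio.c -o sampleio && mpirun -np 2 sampleio */
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> #define NPER 10
>>
>> int main(int argc, char **argv)
>> {
>>     int rank, nprocs, i, nerr = 0, rc;
>>     int wbuf[NPER], rbuf[NPER];
>>     MPI_File fh;
>>     MPI_Offset off;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>     if (rank == 0)
>>         printf("Testing simple C MPIO sketch with %d processes\n", nprocs);
>>
>>     for (i = 0; i < NPER; i++)
>>         wbuf[i] = rank * NPER + i;          /* values unique per rank */
>>     off = (MPI_Offset)rank * NPER * sizeof(int);
>>
>>     /* collective open; every rank writes its own block */
>>     rc = MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
>>                        MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
>>     if (rc != MPI_SUCCESS)
>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>     MPI_File_write_at(fh, off, wbuf, NPER, MPI_INT, MPI_STATUS_IGNORE);
>>     MPI_File_close(&fh);
>>
>>     MPI_Barrier(MPI_COMM_WORLD);
>>
>>     /* reopen read-only and verify what this rank wrote */
>>     MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
>>                   MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
>>     MPI_File_read_at(fh, off, rbuf, NPER, MPI_INT, MPI_STATUS_IGNORE);
>>     MPI_File_close(&fh);
>>
>>     for (i = 0; i < NPER; i++)
>>         if (rbuf[i] != wbuf[i]) {
>>             printf("Proc %d: read data[%d] got %d, expect %d\n",
>>                    rank, i, rbuf[i], wbuf[i]);
>>             nerr++;
>>         }
>>
>>     if (nerr)
>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>     MPI_Finalize();
>>     return 0;
>> }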
>>
>>
>>
>>
>> On Wed, Jan 15, 2014 at 1:16 PM, Ronald Cohen <rhcohen_at_[hidden]> wrote:
>>
>>> Aha. I guess I didn't know what the io-romio option does. If you
>>> look at my config.log you will see my configure line included
>>> --disable-io-romio. Guess I should change --disable to --enable.
>>>
>>> You seem to imply that the nightly build is stable enough that I should
>>> probably switch to that rather than 1.7.4rc1. Am I reading between the
>>> lines correctly?
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 10:56 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> Oh, a word of caution on those config params - you might need to check
>>>> to ensure I don't disable romio in them. I don't normally build it as I
>>>> don't use it. Since that is what you are trying to use, just change the
>>>> "no" to "yes" (or delete that line altogether) and it will build.
>>>>
>>>>
>>>>
>>>> On Wed, Jan 15, 2014 at 10:53 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>
>>>>> You can find my configure options in the OMPI distribution at
>>>>> contrib/platform/intel/bend/mac. You are welcome to use them - just
>>>>> configure --with-platform=intel/bend/mac
>>>>>
>>>>> I work on the developer's trunk, of course, but also run the head of
>>>>> the 1.7.4 branch (essentially the nightly tarball) on a fairly regular
>>>>> basis.
>>>>>
>>>>> As for the opal_bitmap test: it wouldn't surprise me if that one was
>>>>> stale. I can check on it later tonight, but I'd suspect that the test is
>>>>> bad as we use that class in the code base and haven't seen an issue.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 15, 2014 at 10:49 AM, Ronald Cohen <rhcohen_at_[hidden]> wrote:
>>>>>
>>>>>> Ralph,
>>>>>>
>>>>>> I just sent out another post with the c file attached.
>>>>>>
>>>>>> If you can get that to work (and even if you can't), can you tell me
>>>>>> what configure options you use, and which version of open-mpi? Thanks.
>>>>>>
>>>>>> Ron
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 15, 2014 at 10:36 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>>
>>>>>>> BTW: could you send me your sample test code?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 15, 2014 at 10:34 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>>>
>>>>>>>> I regularly build on Mavericks and run without problem, though I
>>>>>>>> haven't tried a parallel IO app. I'll give yours a try later, when I get
>>>>>>>> back to my Mac.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 15, 2014 at 10:04 AM, Ronald Cohen <rhcohen_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>>> I have been struggling to get a usable build of openmpi on
>>>>>>>>> Mac OSX Mavericks (10.9.1). I can get openmpi to configure and build
>>>>>>>>> without error, but have problems after that which depend on the openmpi
>>>>>>>>> version.
>>>>>>>>>
>>>>>>>>> With 1.6.5, make check fails the opal_datatype_test, ddt_test, and
>>>>>>>>> ddt_raw tests. The various atomic_* tests pass. See checklogs_1.6.5,
>>>>>>>>> attached as a .gz file.
>>>>>>>>>
>>>>>>>>> Following suggestions from openmpi discussions, I tried openmpi
>>>>>>>>> version 1.7.4rc1. In this case make check indicates all tests passed. But
>>>>>>>>> when I proceeded to try to build a parallel code (parallel HDF5) it
>>>>>>>>> failed. Following an email exchange with the HDF5 support people, they
>>>>>>>>> suggested I try to compile and run the attached bit of simple code
>>>>>>>>> Sample_mpio.c (which they supplied) which does not use any HDF5, but just
>>>>>>>>> attempts a parallel write to a file and parallel read. That test failed
>>>>>>>>> when requesting more than 1 processor -- which they say indicates a failure
>>>>>>>>> of the openmpi installation. The error message was:
>>>>>>>>>
>>>>>>>>> MPI_INIT: argc 1
>>>>>>>>> MPI_INIT: argc 1
>>>>>>>>> Testing simple C MPIO program with 2 processes accessing file
>>>>>>>>> ./mpitest.data
>>>>>>>>> (Filename can be specified via program argument)
>>>>>>>>> Proc 0: hostname=Ron-Cohen-MBP.local
>>>>>>>>> Proc 1: hostname=Ron-Cohen-MBP.local
>>>>>>>>> MPI_BARRIER[0]: comm MPI_COMM_WORLD
>>>>>>>>> MPI_BARRIER[1]: comm MPI_COMM_WORLD
>>>>>>>>> Proc 0: MPI_File_open with MPI_MODE_EXCL failed (MPI_ERR_FILE:
>>>>>>>>> invalid file)
>>>>>>>>> MPI_ABORT[0]: comm MPI_COMM_WORLD errorcode 1
>>>>>>>>> MPI_BCAST[1]: buffer 7fff5a483048 count 1 datatype MPI_INT root 0
>>>>>>>>> comm MPI_COMM_WORLD
>>>>>>>>>
>>>>>>>>> I then went back to my openmpi directories and tried running some
>>>>>>>>> of the individual tests in the test and examples directories. In
>>>>>>>>> particular, in test/class I found one test that does not seem to be run as
>>>>>>>>> part of make check and which failed, even with one processor; this is
>>>>>>>>> opal_bitmap. Not sure if this is because 1.7.4rc1 is incomplete, or there is
>>>>>>>>> something wrong with the installation, or maybe a 32- vs 64-bit thing? The error
>>>>>>>>> message is
>>>>>>>>>
>>>>>>>>> mpirun detected that one or more processes exited with non-zero
>>>>>>>>> status, thus causing the job to be terminated. The first process to do so
>>>>>>>>> was:
>>>>>>>>>
>>>>>>>>> Process name: [[48805,1],0]
>>>>>>>>> Exit code: 255
>>>>>>>>>
>>>>>>>>> Any suggestions?
>>>>>>>>>
>>>>>>>>> More generally, has anyone out there gotten an openmpi build on
>>>>>>>>> Mavericks to work with sufficient success that they can get the attached
>>>>>>>>> Sample_mpio.c (or better yet, parallel HDF5) to build?
>>>>>>>>>
>>>>>>>>> Details: Running Mac OS X 10.9.1 on a mid-2009 MacBook Pro with 4
>>>>>>>>> GB memory; tried openmpi 1.6.5 and 1.7.4rc1. Built openmpi against the
>>>>>>>>> stock gcc that comes with Xcode 5.0.2, and gfortran 4.9.0.
>>>>>>>>>
>>>>>>>>> Files attached: config.log.gz, openmpialllog.gz (output of running
>>>>>>>>> ompi_info --all), checklog2.gz (output of make check in the top openmpi
>>>>>>>>> directory).
>>>>>>>>>
>>>>>>>>> I am not attaching logs of make and install since those seem to
>>>>>>>>> have been successful, but I can generate them if that would be helpful.
>>>>>>>>>