Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Bring the latest ROMIO version from MPICH2-1.3 into the trunk
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-12-14 18:02:10


Sorry for the delay; travel got in the way.

It was quite difficult to pull from your repo because you committed about a dozen generated files that all conflicted with mine (assumedly we have slightly different versions of flex and whatnot).

I tried to push back some minor changes but I don't have permission -- can you grant me write perms?

[15:01] svbu-mpi:~/hg/new-romio-for-openmpi % hg push
pushing to ssh://hg@bitbucket.org/devezep/new-romio-for-openmpi
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 3 changesets with 1 changes to 1 files
remote: error: pretxnchangegroup.bb_perm hook failed: You're not allowed to write to this repository.
remote: transaction abort!
remote: rollback completed
remote: abort: You're not allowed to write to this repository.
[15:01] svbu-mpi:~/hg/new-romio-for-openmpi %

On Dec 6, 2010, at 8:42 AM, Pascal Deveze wrote:

> Jeff,
>
> I removed ompi/mca/io/romio/romio/acinclude.m4, put "autoreconf -ivf -I confdb" in autogen.sh, and ran "chmod +x autogen.sh"
> (my stupid mistake was that this file wasn't executable).
> Everything is now OK.
> These modifications have been pushed to Bitbucket.
>
> I tried to run the ROMIO tests and got an error in ompi/mpi/c/profile/MPI_File_set_errhandler.c:
> OBJ_RELEASE(tmp) triggers an assertion failure:
>
> pfile_set_errhandler.c:75: PMPI_File_set_errhandler: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (tmp))->obj_magic_id' failed.
> [cuzco10:10336] *** Process received signal ***
> [cuzco10:10336] Signal: Aborted (6)
> [cuzco10:10336] Signal code: (-6)
> [cuzco10:10336] [ 0] /lib64/libpthread.so.0() [0x3e8560f440]
> [cuzco10:10336] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3e852329c5]
> [cuzco10:10336] [ 2] /lib64/libc.so.6(abort+0x175) [0x3e852341a5]
> [cuzco10:10336] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x3e8522b945]
> [cuzco10:10336] [ 4] /home_nfs/devezep/ATLAS/openmpi-default/lib/libmpi.so.0(MPI_File_set_errhandler+0x1e4) [0x7fcbee89d1d4]
> [cuzco10:10336] [ 5] /home_nfs/devezep/ATLAS/openmpi-default/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0x12a) [0x7fcbe7dbc4ea]
> [cuzco10:10336] [ 6] /home_nfs/devezep/ATLAS/openmpi-default/lib/openmpi/mca_io_romio.so(+0x9764) [0x7fcbe7d8e764]
> [cuzco10:10336] [ 7] /home_nfs/devezep/ATLAS/openmpi-default/lib/libmpi.so.0(+0x50309) [0x7fcbee853309]
> [cuzco10:10336] [ 8] /home_nfs/devezep/ATLAS/openmpi-default/lib/libmpi.so.0(+0x4faa0) [0x7fcbee852aa0]
> [cuzco10:10336] [ 9] /home_nfs/devezep/ATLAS/openmpi-default/lib/libmpi.so.0(PMPI_File_close+0xa2) [0x7fcbee896832]
> [cuzco10:10336] [10] ./a.out(main+0x3a4) [0x402434]
> [cuzco10:10336] [11] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3e8521ec5d]
> [cuzco10:10336] [12] ./a.out() [0x401fc9]
> [cuzco10:10336] *** End of error message ***
>
> I am currently analysing the problem (MPI_File_close() now calls MPI_File_set_errhandler()).
>
> Pascal
>
> Jeff Squyres wrote:
>> On Dec 1, 2010, at 7:35 AM, Pascal Deveze wrote:
>>
>>> I am not on AIM or Google Talk, sorry. If you think it is necessary, I could ask for an ID.
>>
>> FWIW, many of us find it convenient for quick, informal discussions. We can keep going here in email and switch to the phone if it becomes necessary.
>>
>>>> I see that we have the whole romio/confdb directory, so it seems like we should use that tree rather than copy to acinclude.m4.
>>>
>>> I agree with you. But, as I said, I have a problem with the PAC_FUNC_NEEDS_DECL macro, and the only way to solve it is to put it in acinclude.m4.
>>
>> Per below, I think this is now moot -- the romio/autogen.sh script should fix this.
>>
>>>> - there's no .hgignore file -- making "hg status" difficult. In your SVN+HG tree, can you run ./contrib/hg/build-hgignore.pl and commit/push the resulting .hgignore? That would be most helpful.
>>>
>>> I have done it, and pushed.
>>
>> Awesome; thanks.
>>
>>>> - ompi/mca/io/romio/romio/adio/include/romioconf.h.in is in the hg repo, but should not be (it's generated).
>>>
>>> I removed it and pushed the modification.
>>>
>>>> - I don't see a romio/acinclude.m4 file in the repo, so whatever you did there doesn't show up for me.
>>>
>>> I see the file romio/romio/acinclude.m4 in bitbucket:
>>>
>>> http://bitbucket.org/devezep/new-romio-for-openmpi/src/f06f1a24c75b/ompi/mca/io/romio/romio/acinclude.m4
>>
>> Weird. Ok. But I think this is now moot.
>>
>>>> - I tried to add an ompi/mca/io/romio/romio/autogen.sh executable file that contained:
>>>>
>>>> :
>>>> autoreconf -ivf -I confdb
>>>>
>>>> and that seems to make everything work. Can you confirm/double check?
>>>
>>> Yes, I tried what you suggested (without acinclude.m4), and everything seems to work:
>>> autoreconf -ivf -I confdb
>>> autoreconf: Entering directory `.'
>>> autoreconf: configure.in: not using Gettext
>>> autoreconf: running: aclocal -I confdb --force
>>> autoreconf: configure.in: tracing
>>> autoreconf: running: libtoolize --copy --force
>>> libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `confdb'.
>>> libtoolize: copying file `confdb/ltmain.sh'
>>> libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.in and
>>> libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
>>> libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
>>> libtoolize: `AC_PROG_RANLIB' is rendered obsolete by `LT_INIT'
>>> autoreconf: running: /homes/openmpi/tools/2010-10-12/bin/autoconf --include=confdb --force
>>> autoreconf: running: /homes/openmpi/tools/2010-10-12/bin/autoheader --include=confdb --force
>>> autoreconf: running: automake --add-missing --copy --force-missing
>>> autoreconf: Leaving directory `.'
>>>
>>> If I try to generate all of Open MPI, autogen.sh works, but configure fails in the romio directory.
>>
>> I'm confused by this statement. Did you run the top-level autogen.sh first? That should automatically invoke the romio/autogen.sh in the Right place, do a few extra things, etc. Then you should be able to run configure properly (and have it invoke ROMIO's configure at the Right time, etc.).
>>
>> Is that what you tried?
>>
>> I just did a fresh checkout of your hg, removed ompi/mca/io/romio/romio/acinclude.m4 and put in an autogen.sh (and made it executable) that contained:
>>
>> :
>> autoreconf -ivf -I confdb
>>
>> I then ran the top-level autogen.sh and configure, and it all worked.
>>
>> You can see that ompi/mca/io/romio/romio/aclocal.m4 m4_include()'s all the relevant m4 macro files in the confdb directory, including aclocal_cc.m4, which defines PAC_FUNC_NEEDS_DECL.
>>
>>> If I try your autoreconf, then it works for ROMIO.
>>> ===== This does not work without acinclude.m4 ==================
>>> ./autogen.sh
>>> ./configure --prefix=$HOME/bitbucket/new-romio-for-openmpi/install --disable-ipv6 --with-openib=${OFED_BUILDROOT}/usr --enable-openib-connectx-xrc --enable-contrib-no-build=libnbc,vt --with-io-romio-flags="CFLAGS=-I$LUSTRE_PATH/usr/include/ --with-file-system=ufs+nfs+lustre"
>>>
>>> ===== This works without acinclude.m4 ==================
>>> ./autogen.sh
>>> cd ompi/mca/io/romio/romio
>>> autoreconf -ivf -I confdb
>>> cd -
>>> ./configure --prefix=$HOME/bitbucket/new-romio-for-openmpi/install --disable-ipv6 --with-openib=${OFED_BUILDROOT}/usr --enable-openib-connectx-xrc --enable-contrib-no-build=libnbc,vt --with-io-romio-flags="CFLAGS=-I$LUSTRE_PATH/usr/include/ --with-file-system=ufs+nfs+lustre"
>>>
>>> My conclusion: something needs to change in autogen.sh to handle ROMIO (i.e., call autoreconf -ivf -I confdb). In that case, acinclude.m4 is no longer useful.
>>
>> I'm not sure what you mean...
>>
>> Maybe try getting a fresh checkout that does not have any auto* cruft in it at all, remove the aclocal/acinclude, put in the autogen.sh file, and re-run the top-level autogen.sh to see what happens.
>>
>> I attached the stdout/stderr from running autogen.sh, configure, and make so that you can see what my output looks like.
>>
>> _______________________________________________
>> devel mailing list
>>
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
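For context on the assertion in Pascal's trace (pfile_set_errhandler.c:75): the failed check compares tmp's obj_magic_id against the constant ((0xdeafbeedULL << 32) + 0xdeafbeedULL), so it fires when OBJ_RELEASE is handed something that is not a live object carrying that stamp, for example one that was already fully released, or memory that was never that kind of object. The C below is a simplified, hypothetical sketch of such a magic-ID guard on a reference-counted object. demo_object_t, DEMO_MAGIC_ID, demo_construct, demo_retain, and demo_release are illustrative names, not Open MPI's opal_object_t API, and the release path returns an error instead of aborting so the behavior is easy to observe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical magic stamp; the value mirrors the constant in the assertion
   in the trace, but this is NOT Open MPI's opal_object_t implementation. */
#define DEMO_MAGIC_ID ((0xdeafbeedULL << 32) + 0xdeafbeedULL)

typedef struct {
    uint64_t magic_id;  /* equals DEMO_MAGIC_ID only while the object is live */
    int refcount;
} demo_object_t;

/* Stamp a freshly constructed object with the magic ID and one reference. */
static void demo_construct(demo_object_t *obj)
{
    obj->magic_id = DEMO_MAGIC_ID;
    obj->refcount = 1;
}

/* Add a reference; retaining a non-live object is a programming error. */
static void demo_retain(demo_object_t *obj)
{
    assert(obj->magic_id == DEMO_MAGIC_ID);
    obj->refcount++;
}

/* Drop a reference. Returns 0 on success, -1 if obj is not a live object
   (already fully released, or memory that was never a demo_object_t).
   A debug build would typically abort via assert() here instead. */
static int demo_release(demo_object_t *obj)
{
    if (obj->magic_id != DEMO_MAGIC_ID) {
        fprintf(stderr, "demo_release: bad magic 0x%016llx\n",
                (unsigned long long) obj->magic_id);
        return -1;
    }
    if (--obj->refcount == 0) {
        /* Poison the stamp so any later release is caught. The sketch does
           not free memory, so the demo never touches freed storage. */
        obj->magic_id = 0;
    }
    return 0;
}
```

For example, constructing an object, retaining it once, and then releasing it three times succeeds twice and reports the third release; passing a struct whose magic_id was never set to DEMO_MAGIC_ID is reported the same way. Both are the class of bug this kind of check is meant to surface.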

-- 
Jeff Squyres
jsquyres_at_[hidden]