
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: Bring the latest ROMIO version from MPICH2-1.3 into the trunk
From: Pascal Deveze (Pascal.Deveze_at_[hidden])
Date: 2010-12-06 08:42:26


Jeff,

I removed ompi/mca/io/romio/romio/acinclude.m4, put "autoreconf -ivf
-I confdb" in autogen.sh, and ran "chmod +x autogen.sh" (my
stupid mistake was that this file wasn't executable).
Everything now works.
These changes have been pushed to bitbucket.
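
For anyone who hits the same non-executable autogen.sh problem, here is a minimal sketch of a check that would have caught it. The helper name run_romio_autogen is hypothetical and is not part of the Open MPI build scripts; it only assumes that the sub-package directory contains an autogen.sh like the two-line one described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): run a sub-package's autogen.sh,
# failing loudly if the script is missing or lacks the executable bit.
run_romio_autogen() {
    romio_dir=$1
    script="$romio_dir/autogen.sh"

    if [ ! -f "$script" ]; then
        echo "error: $script not found" >&2
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "error: $script is not executable (try: chmod +x $script)" >&2
        return 2
    fi

    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$romio_dir" && ./autogen.sh)
}
```

From the top of the tree it would be called as "run_romio_autogen ompi/mca/io/romio/romio". The distinct return codes (1 for a missing script, 2 for a missing executable bit) are a choice made for this sketch only.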

I then tried to run the ROMIO tests and hit an error in
ompi/mpi/c/profile/MPI_File_set_errhandler.c:
OBJ_RELEASE(tmp) triggers an assertion failure:

pfile_set_errhandler.c:75: PMPI_File_set_errhandler: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (tmp))->obj_magic_id' failed.
[cuzco10:10336] *** Process received signal ***
[cuzco10:10336] Signal: Aborted (6)
[cuzco10:10336] Signal code: (-6)
[cuzco10:10336] [ 0] /lib64/libpthread.so.0() [0x3e8560f440]
[cuzco10:10336] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3e852329c5]
[cuzco10:10336] [ 2] /lib64/libc.so.6(abort+0x175) [0x3e852341a5]
[cuzco10:10336] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x3e8522b945]
[cuzco10:10336] [ 4] /home_nfs/devezep/ATLAS/openmpi-default/lib/libmpi.so.0(MPI_File_set_errhandler+0x1e4) [0x7fcbee89d1d4]
[cuzco10:10336] [ 5] /home_nfs/devezep/ATLAS/openmpi-default/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0x12a) [0x7fcbe7dbc4ea]
[cuzco10:10336] [ 6] /home_nfs/devezep/ATLAS/openmpi-default/lib/openmpi/mca_io_romio.so(+0x9764) [0x7fcbe7d8e764]
[cuzco10:10336] [ 7] /home_nfs/devezep/ATLAS/openmpi-default/lib/libmpi.so.0(+0x50309) [0x7fcbee853309]
[cuzco10:10336] [ 8] /home_nfs/devezep/ATLAS/openmpi-default/lib/libmpi.so.0(+0x4faa0) [0x7fcbee852aa0]
[cuzco10:10336] [ 9] /home_nfs/devezep/ATLAS/openmpi-default/lib/libmpi.so.0(PMPI_File_close+0xa2) [0x7fcbee896832]
[cuzco10:10336] [10] ./a.out(main+0x3a4) [0x402434]
[cuzco10:10336] [11] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3e8521ec5d]
[cuzco10:10336] [12] ./a.out() [0x401fc9]
[cuzco10:10336] *** End of error message ***

I am currently analysing the problem (MPI_File_close() now calls
MPI_File_set_errhandler()).

Pascal

Jeff Squyres wrote:
> On Dec 1, 2010, at 7:35 AM, Pascal Deveze wrote:
>
>
>> I am not on AIM or Google Talk, sorry. If you think it is necessary, I could ask for an ID.
>>
>
> FWIW, many of us find it convenient for quick, informal discussions. We can keep going here in email and switch to phone if it becomes necessary.
>
>>> I see that we have the whole romio/confdb directory, so it seems like we should use that tree rather than copy to acinclude.m4.
>>>
>>>
>> I agree with you. But, as I said, I have a problem with the macro PAC_FUNC_NEEDS_DECL and the only way to solve it is to put it in acinclude.m4.
>>
>
> Per below, I think this is now moot -- the romio/autogen.sh script should fix this.
>
>
>>> - there's no .hgignore file -- making "hg status" difficult. In your SVN+HG tree, can you run ./contrib/hg/build-hgignore.pl and commit/push the resulting .hgignore? That would be most helpful.
>>>
>>>
>> I have done it, and pushed.
>>
>
> Awesome; thanks.
>
>
>>> - ompi/mca/io/romio/romio/adio/include/romioconf.h.in is in the hg repo, but should not be (it's generated).
>>>
>>>
>> I removed it and pushed the modification.
>>
>>> - I don't see a romio/acinclude.m4 file in the repo, so whatever you did there doesn't show up for me.
>>>
>>>
>> I see the file romio/romio/acinclude.m4 in bitbucket:
>> http://bitbucket.org/devezep/new-romio-for-openmpi/src/f06f1a24c75b/ompi/mca/io/romio/romio/acinclude.m4
>>
>
> Weird. Ok. But I think this is now moot.
>
>
>>> - I tried to add an ompi/mca/io/romio/romio/autogen.sh executable file that contained:
>>>
>>> :
>>> autoreconf -ivf -I confdb
>>>
>>> and that seems to make everything work. Can you confirm/double check?
>>>
>>>
>> Yes, I tried what you suggest (without acinclude.m4), and everything seems to work:
>> autoreconf -ivf -I confdb
>> autoreconf: Entering directory `.'
>> autoreconf: configure.in: not using Gettext
>> autoreconf: running: aclocal -I confdb --force
>> autoreconf: configure.in: tracing
>> autoreconf: running: libtoolize --copy --force
>> libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `confdb'.
>> libtoolize: copying file `confdb/ltmain.sh'
>> libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.in and
>> libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
>> libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
>> libtoolize: `AC_PROG_RANLIB' is rendered obsolete by `LT_INIT'
>> autoreconf: running: /homes/openmpi/tools/2010-10-12/bin/autoconf --include=confdb --force
>> autoreconf: running: /homes/openmpi/tools/2010-10-12/bin/autoheader --include=confdb --force
>> autoreconf: running: automake --add-missing --copy --force-missing
>> autoreconf: Leaving directory `.'
>>
>> If I try to generate the whole MPI, autogen.sh works but configure fails in the romio directory.
>>
>
> I'm confused by this statement. Did you run the top-level autogen.sh first? That should automatically invoke the romio/autogen.sh in the Right place, do a few extra things, etc. Then you should be able to run configure properly (and have it invoke ROMIO's configure at the Right time, etc.).
>
> Is that what you tried?
>
> I just did a fresh checkout of your hg, removed ompi/mca/io/romio/romio/acinclude.m4 and put in an autogen.sh (and made it executable) that contained:
>
> :
> autoreconf -ivf -I confdb
>
> I then ran the top-level autogen.sh and configure, and it all worked.
>
> You can see that ompi/mca/io/romio/romio/aclocal.m4 m4_include()'s all the relevant m4 macro files in the confdb directory, including aclocal_cc.m4, which defines PAC_FUNC_NEEDS_DECL.
>
>
>> If I try your autoreconf, then it works for ROMIO.
>> ===== This does not work without acinclude.m4 ==================
>> ./autogen.sh
>> ./configure --prefix=$HOME/bitbucket/new-romio-for-openmpi/install --disable-ipv6 --with-openib=${OFED_BUILDROOT}/usr --enable-openib-connectx-xrc --enable-contrib-no-build=libnbc,vt --with-io-romio-flags="CFLAGS=-I$LUSTRE_PATH/usr/include/ --with-file-system=ufs+nfs+lustre"
>>
>>
>> ===== This works without acinclude.m4 ==================
>> ./autogen.sh
>> cd ompi/mca/io/romio/romio
>> autoreconf -ivf -I confdb
>> cd -
>> ./configure --prefix=$HOME/bitbucket/new-romio-for-openmpi/install --disable-ipv6 --with-openib=${OFED_BUILDROOT}/usr --enable-openib-connectx-xrc --enable-contrib-no-build=libnbc,vt --with-io-romio-flags="CFLAGS=-I$LUSTRE_PATH/usr/include/ --with-file-system=ufs+nfs+lustre"
>>
>> My conclusion is that something needs to change in autogen.sh to handle ROMIO (call autoreconf -ivf -I confdb). In that case, the file acinclude.m4 is no longer useful.
>>
>
> I'm not sure what you mean...
>
> Maybe try getting a fresh checkout that does not have any auto* cruft in it at all, remove the aclocal/acinclude, and then put in the autogen.sh file and re-run the top-level autogen.sh to see what happens.
>
> I attached the stdout/stderr from running autogen.sh, configure, and make so that you can see what my output looks like.
>
>