Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.7 rc4 compilation error
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-10-30 16:35:13


Okay, I tracked this silliness down. On odin, my platform file builds both shared and static. It appears that mpicc in that situation defaults to picking the static build, and so I wind up with a static executable. This behavior was unexpected - I thought we would default to dynamic, but support static if that flag was given to mpicc. Call me surprised, but at least now I know.

I found that the Lustre headers and libs are indeed on the system, and so your analysis of the problem is correct.

When I build with nothing on the configure line, we only build shared and so the executable is dynamic - and the problem goes away.
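To check which kind of executable mpicc actually produced, you can look for an ELF program interpreter. A dynamically linked binary carries a PT_INTERP program header that names the loader (e.g. /lib64/ld-linux-x86-64.so.2), and a fully static one does not. The Python sketch below assumes Linux ELF binaries. The function names are illustrative and are not part of Open MPI.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def is_dynamically_linked(elf: bytes) -> bool:
    """Return True if the ELF image has a PT_INTERP program header."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    endian = "<" if elf[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if elf[4] == 2:  # EI_CLASS 2 = ELF64
        (phoff,) = struct.unpack_from(endian + "Q", elf, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 54)
    elif elf[4] == 1:  # EI_CLASS 1 = ELF32
        (phoff,) = struct.unpack_from(endian + "I", elf, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 42)
    else:
        raise ValueError("unknown ELF class")
    # p_type is the first 4-byte field of every program header entry.
    for i in range(phnum):
        (p_type,) = struct.unpack_from(endian + "I", elf, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False


def check_executable(path: str) -> str:
    """Classify an executable on disk as 'dynamic' or 'static'."""
    with open(path, "rb") as f:
        return "dynamic" if is_dynamically_linked(f.read()) else "static"
```

For example, check_executable("./hello") returns "static" for the kind of binary described above. On most systems, file ./hello ("statically linked") or ldd ./hello ("not a dynamic executable") gives the same answer without any code.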

HTH
Ralph

On Oct 30, 2012, at 12:06 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Sure - I can do that.
>
> On Oct 30, 2012, at 11:29 AM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
>
>> glad to hear that. However, since we are also having the problem with
>> the lustre-fs module for static builds, I think it would still make
>> sense to disable fs/lustre/ for 1.7.0
>>
>> Edgar
>>
>> On 10/30/2012 12:34 PM, Ralph Castain wrote:
>>> I hate odin :-(
>>>
>>> FWIW: it all works fine today, no matter how I configure it. No earthly idea what happened.
>>>
>>> Ignore these droids....
>>>
>>>
>>> On Oct 30, 2012, at 7:28 AM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
>>>
>>>> ok, so a couple of things.
>>>>
>>>> I still think it is the same issue that I observed 1-2 days ago. Could
>>>> you try to remove the fs/lustre component from your compilation, e.g. by
>>>> adding an .ompi_ignore file into that directory, and see whether this
>>>> fixes the issue?
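The suggestion above comes down to placing a marker file in the component's source directory. The following is a minimal Python sketch. It assumes the component lives under ompi/mca/<framework>/<component> (as fs/lustre does here), that only the presence of the file matters, and that the autogen step is re-run afterwards so the marker is picked up. ignore_component is a hypothetical helper.

```python
from pathlib import Path


def ignore_component(src_root: str, framework: str, component: str) -> Path:
    """Create an empty .ompi_ignore marker in ompi/mca/<framework>/<component>.

    Hypothetical helper; assumes the marker only takes effect after the
    autogen step is re-run on the source tree.
    """
    comp_dir = Path(src_root) / "ompi" / "mca" / framework / component
    if not comp_dir.is_dir():
        raise FileNotFoundError(f"no such component directory: {comp_dir}")
    marker = comp_dir / ".ompi_ignore"
    marker.touch(exist_ok=True)  # assumed: presence, not content, is what counts
    return marker
```

For example, ignore_component("openmpi-1.7rc4", "fs", "lustre") would create openmpi-1.7rc4/ompi/mca/fs/lustre/.ompi_ignore.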
>>>>
>>>> I tried compilations on my machine (no lustre, no ib) with
>>>> --disable-mpi-io *or* --disable-io-romio; both worked correctly and
>>>> I could run things. Note that the two flags now behave differently:
>>>> the second flag is now equivalent to --enable-mca-no-build=io:romio,
>>>> while the first disables the io, fcoll, fs and sharedfp frameworks
>>>> (prior to ompio they had basically the same effect).
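The difference between the two flags described above can be modelled in a few lines. The following is a hypothetical Python sketch of the behaviour reported for the 1.7 branch. It is not Open MPI's configure logic. It shows why fs/lustre is still built under --disable-io-romio but not under --disable-mpi-io.

```python
# Hypothetical model of the two configure flags as described in this thread
# for the 1.7 branch; this is not Open MPI's actual configure logic.
MPI_IO_FRAMEWORKS = {"io", "fcoll", "fs", "sharedfp"}


def excluded_by(flag: str) -> set:
    """Frameworks ("fs") or framework/component pairs ("io/romio") a flag removes."""
    if flag == "--disable-mpi-io":
        return set(MPI_IO_FRAMEWORKS)  # the whole MPI-IO stack: ROMIO and OMPIO
    if flag == "--disable-io-romio":
        return {"io/romio"}  # only the ROMIO component of the io framework
    return set()


def is_built(component: str, flags: list) -> bool:
    """component is written as "framework/name", e.g. "fs/lustre"."""
    removed = set().union(*(excluded_by(f) for f in flags))
    framework = component.split("/", 1)[0]
    return framework not in removed and component not in removed


for flags in (["--disable-io-romio"], ["--disable-mpi-io"]):
    # prints "... built: True" for the first flag and "... built: False" for the second
    print(flags[0], "-> fs/lustre built:", is_built("fs/lustre", flags))
```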
>>>>
>>>> In your particular case this means that you disabled romio, but the
>>>> entire ompio stack is still compiled, and the error must come from that
>>>> portion. If my suspicion is correct, it is still liblustre
>>>> messing around with the malloc hooks, and that causes the stack frame to
>>>> be completely broken. I thought I had fixed that, since we did not have the
>>>> issue on trunk, but we did observe it in the 1.7 branch 1-2 days back
>>>> as well, and I was looking into that.
>>>>
>>>> That being said, there is another malloc-hooks issue that makes me a bit
>>>> nervous. The compilation of the otf stuff produced a ton of warnings on
>>>> my machine with gcc 4.6.2, also with respect to the _malloc_hooks and
>>>> _realloc_hooks. Not sure whether this contributed to the problem as
>>>> well; I just thought I'd bring it up since we seem to have a corrupted
>>>> stack frame problem.
>>>>
>>>> Thanks
>>>> Edgar
>>>>
>>>>
>>>> On 10/30/2012 8:29 AM, Edgar Gabriel wrote:
>>>>> ok, I'll look into this. I noticed a problem with static builds on
>>>>> lustre file systems recently, and I was wondering whether it's the same
>>>>> issue or not. But I'll check what's going on.
>>>>>
>>>>> Thanks
>>>>> Edgar
>>>>>
>>>>> On 10/30/2012 7:22 AM, Ralph Castain wrote:
>>>>>> No to Lustre, and I didn't build static
>>>>>>
>>>>>> I'm not sure what, if any, parallel file system might be present. In the case that works, I just built with no configure args other than prefix. ompi_info shows both romio and ompio built, but nothing more about what support they built internally.
>>>>>>
>>>>>>
>>>>>> On Oct 30, 2012, at 4:14 AM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
>>>>>>
>>>>>>> Ralph,
>>>>>>>
>>>>>>> just out of curiosity: is there a lustre file system on the machine, and is
>>>>>>> this a static build?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Edgar
>>>>>>>
>>>>>>> On 10/29/2012 9:17 PM, Ralph Castain wrote:
>>>>>>>> Hmmm...I added that directory and tried this on odin (which is an IB-based machine). Any MPI proc segfaults:
>>>>>>>>
>>>>>>>> Core was generated by `./hello'.
>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>> #0 _sysio_p_validate (pno=0x0, intnt=0x0, path=0x0) at src/inode.c:574
>>>>>>>> 574 src/inode.c: No such file or directory.
>>>>>>>> in src/inode.c
>>>>>>>> (gdb) where
>>>>>>>> #0 _sysio_p_validate (pno=0x0, intnt=0x0, path=0x0) at src/inode.c:574
>>>>>>>> #1 0x00002aaaabd3f3e9 in _sysio_path_walk (parent=0x0, nd=0x7fffffffd8e0) at src/namei.c:216
>>>>>>>> #2 0x00002aaaabd3faad in _sysio_namei (parent=0x0, path=<value optimized out>, flags=0, intnt=0x7fffffffd950, pnop=0x7fffffffd970) at src/namei.c:505
>>>>>>>> #3 0x00002aaaabd3fd98 in open (path=0x2aaaac24280f "/sys/devices/system/node", flags=<value optimized out>) at src/open.c:179
>>>>>>>> #4 0x00002aaaabd43d5b in opendir (name=0x2aaaac24280f "/sys/devices/system/node") at src/stddir.c:60
>>>>>>>> #5 0x00002aaaac241825 in numa_max_node () from /usr/lib64/libnuma.so.1
>>>>>>>> #6 0x00002aaaac241d13 in numa_init () from /usr/lib64/libnuma.so.1
>>>>>>>> #7 0x00002aaaaaab845b in call_init () from /lib64/ld-linux-x86-64.so.2
>>>>>>>> #8 0x00002aaaaaab8565 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
>>>>>>>> #9 0x00002aaaaaaabaaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
>>>>>>>> #10 0x0000000000000001 in ?? ()
>>>>>>>> #11 0x00007fffffffe03c in ?? ()
>>>>>>>> #12 0x0000000000000000 in ?? ()
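One way to read this trace: the libc-named entry points open and opendir (frames #3 and #4) resolve to src/open.c and src/stddir.c next to _sysio_* frames, not to libc. This suggests that another library is intercepting file-system calls while libnuma initializes. The thread later attributes this to liblustre. The Python code below is a rough heuristic for spotting such frames in a gdb backtrace. The regex and the short list of libc names are assumptions for illustration, not an exhaustive check.

```python
import re

# "#N [0xADDR in ]func (args) [at file:line | from lib]"
FRAME = re.compile(
    r"#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(.*?\)(?: at (\S+)| from (\S+))?"
)
# Small, illustrative set of libc entry points worth checking.
LIBC_NAMES = {"open", "opendir", "read", "write", "close", "stat", "malloc", "free"}


def suspicious_frames(backtrace: str):
    """Return (frame, function, location) for libc-named frames not served by libc."""
    hits = []
    for line in backtrace.splitlines():
        m = FRAME.search(line)
        if not m:
            continue
        num, func, src, lib = m.groups()
        where = src or lib or ""
        # A libc symbol whose location is neither libc itself is a likely interposer.
        if func in LIBC_NAMES and "libc" not in where:
            hits.append((int(num), func, where))
    return hits
```

Applied to the output of gdb's where or bt command for the core above, it reports frames #3 (open) and #4 (opendir).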
>>>>>>>>
>>>>>>>> I got the same thing whether I excluded openib or not. I then ran on my Linux cluster, which doesn't have IB at all - and it ran fine. Also runs clean on the Mac. However, in both those cases, I had left IO romio enabled.
>>>>>>>>
>>>>>>>> Now, on odin I always configure with --disable-io-romio. So I tried deliberately enabling it, and everything works. So this appears to be something that the IO work has broken.
>>>>>>>>
>>>>>>>> Edgar: can you please fix --disable-io-romio?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 29, 2012, at 11:55 AM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>>> I'm sorry to add one more thing to the list, but beyond this file, it
>>>>>>>>> looks like the entire ompi/mca/common/verbs/ directory is also
>>>>>>>>> missing in the 1.7 branch, although it is required to compile the bcol
>>>>>>>>> framework. It is there in the trunk, but missing in the 1.7 branch...
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Edgar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/26/2012 5:31 PM, Ralph Castain wrote:
>>>>>>>>>> Okay, I'll fix it for tonight's tarball.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> On Oct 26, 2012, at 3:28 PM, "Shamis, Pavel" <shamisp_at_[hidden]> wrote:
>>>>>>>>>>
>>>>>>>>>>> There is a bug in the makefile. The file exists in svn, but it is not listed in the Makefile.am. As a result, it wasn't pulled into the tarball.
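A file can exist in svn and still be missing from the tarball, because make dist only packages files that automake knows about (sources, headers, EXTRA_DIST and the like). The Python sketch below flags .c/.h files that are not mentioned anywhere in a component's Makefile.am. It is a purely textual check and assumes that file names appear literally. It will miss entries written through variables such as $(srcdir). The function names are hypothetical.

```python
import re
from pathlib import Path


def unlisted(filenames, makefile_am_text):
    """Return .c/.h names that never appear as a token in Makefile.am."""
    # Split on whitespace, line-continuation backslashes, '=' and '+'.
    tokens = set(re.split(r"[\s\\=+]+", makefile_am_text))
    return sorted(
        name for name in filenames
        if name.endswith((".c", ".h")) and name not in tokens
    )


def check_component(component_dir):
    """Run the check against a component directory containing a Makefile.am."""
    comp = Path(component_dir)
    names = [p.name for p in comp.iterdir() if p.is_file()]
    return unlisted(names, (comp / "Makefile.am").read_text())
```

Run on an svn checkout of ompi/mca/bcol/iboffload, this should list any header that is absent from Makefile.am, as bcol_iboffload_qp_info.h is here. Running make distcheck, which builds from the generated tarball, would also surface this kind of omission.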
>>>>>>>>>>>
>>>>>>>>>>> Pavel (Pasha) Shamis
>>>>>>>>>>> ---
>>>>>>>>>>> Computer Science Research Group
>>>>>>>>>>> Computer Science and Math Division
>>>>>>>>>>> Oak Ridge National Laboratory
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 26, 2012, at 2:33 PM, Edgar Gabriel wrote:
>>>>>>>>>>>
>>>>>>>>>>> we have trouble compiling the 1.7 series on a machine in Dresden.
>>>>>>>>>>> Specifically, we receive an error message when compiling the
>>>>>>>>>>> bcol/iboffload component (other infiniband components compile fine).
>>>>>>>>>>>
>>>>>>>>>>> Any ideas or suggestions about what we might be doing wrong or what to look for?
>>>>>>>>>>>
>>>>>>>>>>> make[2]: Entering directory `/home/h2/gabriel/openmpi-1.7rc4/ompi/mca/bcol/iboffload'
>>>>>>>>>>> CC bcol_iboffload_module.lo
>>>>>>>>>>> CC bcol_iboffload_mca.lo
>>>>>>>>>>> CC bcol_iboffload_endpoint.lo
>>>>>>>>>>> CC bcol_iboffload_frag.lo
>>>>>>>>>>> In file included from bcol_iboffload_frag.c:16:0:
>>>>>>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>>>>>>> compilation terminated.
>>>>>>>>>>> make[2]: *** [bcol_iboffload_frag.lo] Error 1
>>>>>>>>>>> make[2]: *** Waiting for unfinished jobs....
>>>>>>>>>>> In file included from bcol_iboffload_mca.c:18:0:
>>>>>>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>>>>>>> compilation terminated.
>>>>>>>>>>> make[2]: *** [bcol_iboffload_mca.lo] Error 1
>>>>>>>>>>> In file included from bcol_iboffload_endpoint.c:23:0:
>>>>>>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>>>>>>> compilation terminated.
>>>>>>>>>>> make[2]: *** [bcol_iboffload_endpoint.lo] Error 1
>>>>>>>>>>> In file included from bcol_iboffload_module.c:39:0:
>>>>>>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>>>>>>> compilation terminated.
>>>>>>>>>>> make[2]: *** [bcol_iboffload_module.lo] Error 1
>>>>>>>>>>> make[2]: Leaving directory `/home/h2/gabriel/openmpi-1.7rc4/ompi/mca/bcol/iboffload'
>>>>>>>>>>> make[1]: *** [all-recursive] Error 1
>>>>>>>>>>> make[1]: Leaving directory `/home/h2/gabriel/openmpi-1.7rc4/ompi'
>>>>>>>>>>> make: *** [all-recursive] Error 1
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Edgar
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Edgar Gabriel
>>>>>>>>>>> Associate Professor
>>>>>>>>>>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>>>>>>>>>>> Department of Computer Science University of Houston
>>>>>>>>>>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>>>>>>>>>>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> devel mailing list
>>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>