
Open MPI Development Mailing List Archives



This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.


Subject: Re: [OMPI devel] 1.7 rc4 compilation error
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2012-10-30 09:29:14


ok, I'll look into this. I noticed a problem with static builds on
Lustre file systems recently, and I was wondering whether it's the same
issue or not. But I'll check what's going on.

Thanks
Edgar

On 10/30/2012 7:22 AM, Ralph Castain wrote:
> No to Lustre, and I didn't build static
>
> I'm not sure what, if any, parallel file system might be present. In the case that works, I just built with no configure args other than prefix. ompi_info shows both romio and ompio built, but nothing more about what support they built internally.
>
>
> On Oct 30, 2012, at 4:14 AM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
>
>> Ralph,
>>
>> just out of curiosity: is there a Lustre file system on the machine and is
>> this a static build?
>>
>> Thanks
>> Edgar
>>
>> On 10/29/2012 9:17 PM, Ralph Castain wrote:
>>> Hmmm...I added that directory and tried this on odin (which is an IB-based machine). Any MPI proc segfaults:
>>>
>>> Core was generated by `./hello'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0 _sysio_p_validate (pno=0x0, intnt=0x0, path=0x0) at src/inode.c:574
>>> 574 src/inode.c: No such file or directory.
>>> in src/inode.c
>>> (gdb) where
>>> #0 _sysio_p_validate (pno=0x0, intnt=0x0, path=0x0) at src/inode.c:574
>>> #1 0x00002aaaabd3f3e9 in _sysio_path_walk (parent=0x0, nd=0x7fffffffd8e0) at src/namei.c:216
>>> #2 0x00002aaaabd3faad in _sysio_namei (parent=0x0, path=<value optimized out>, flags=0, intnt=0x7fffffffd950, pnop=0x7fffffffd970) at src/namei.c:505
>>> #3 0x00002aaaabd3fd98 in open (path=0x2aaaac24280f "/sys/devices/system/node", flags=<value optimized out>) at src/open.c:179
>>> #4 0x00002aaaabd43d5b in opendir (name=0x2aaaac24280f "/sys/devices/system/node") at src/stddir.c:60
>>> #5 0x00002aaaac241825 in numa_max_node () from /usr/lib64/libnuma.so.1
>>> #6 0x00002aaaac241d13 in numa_init () from /usr/lib64/libnuma.so.1
>>> #7 0x00002aaaaaab845b in call_init () from /lib64/ld-linux-x86-64.so.2
>>> #8 0x00002aaaaaab8565 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
>>> #9 0x00002aaaaaaabaaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
>>> #10 0x0000000000000001 in ?? ()
>>> #11 0x00007fffffffe03c in ?? ()
>>> #12 0x0000000000000000 in ?? ()
>>>
>>> I got the same thing whether I excluded openib or not. I then ran on my Linux cluster, which doesn't have IB at all - and it ran fine. Also runs clean on the Mac. However, in both those cases, I had left IO romio enabled.
>>>
>>> Now on odin, I always configure with --disable-io-romio. So I tried deliberately enabling romio, and everything works. So this appears to be something that the IO work has broken.
>>>
>>> Edgar: can you please fix --disable-io-romio?
>>>
>>> Thanks
>>> Ralph
>>>
>>>
>>>
>>>
>>> On Oct 29, 2012, at 11:55 AM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
>>>
>>>> I'm sorry to add one more thing to the list, but beyond this file, it
>>>> looks like the entire ompi/mca/common/verbs/ directory is also
>>>> missing in the 1.7 branch, but is required to compile the bcol
>>>> framework. It is there in the trunk, but missing in the 1.7 branch...
>>>>
>>>> Thanks
>>>> Edgar
>>>>
>>>>
>>>> On 10/26/2012 5:31 PM, Ralph Castain wrote:
>>>>> Okay, I'll fix it for tonight's tarball.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Oct 26, 2012, at 3:28 PM, "Shamis, Pavel" <shamisp_at_[hidden]> wrote:
>>>>>
>>>>>> There is a bug in the makefile. The file exists in svn, but it is not listed in Makefile.am. As a result, it wasn't pulled into the tarball.
>>>>>>
>>>>>> Pavel (Pasha) Shamis
>>>>>> ---
>>>>>> Computer Science Research Group
>>>>>> Computer Science and Math Division
>>>>>> Oak Ridge National Laboratory
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 26, 2012, at 2:33 PM, Edgar Gabriel wrote:
>>>>>>
>>>>>> we have trouble compiling the 1.7 series on a machine in Dresden.
>>>>>> Specifically, we receive an error message when compiling the
>>>>>> bcol/iboffload component (other infiniband components compile fine).
>>>>>>
>>>>>> Any ideas or suggestions as to what we might be doing wrong or what to look for?
>>>>>>
>>>>>> make[2]: Entering directory
>>>>>> `/home/h2/gabriel/openmpi-1.7rc4/ompi/mca/bcol/iboffload'
>>>>>> CC bcol_iboffload_module.lo
>>>>>> CC bcol_iboffload_mca.lo
>>>>>> CC bcol_iboffload_endpoint.lo
>>>>>> CC bcol_iboffload_frag.lo
>>>>>> In file included from bcol_iboffload_frag.c:16:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such
>>>>>> file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_frag.lo] Error 1
>>>>>> make[2]: *** Waiting for unfinished jobs....
>>>>>> In file included from bcol_iboffload_mca.c:18:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such
>>>>>> file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_mca.lo] Error 1
>>>>>> In file included from bcol_iboffload_endpoint.c:23:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such
>>>>>> file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_endpoint.lo] Error 1
>>>>>> In file included from bcol_iboffload_module.c:39:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such
>>>>>> file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_module.lo] Error 1
>>>>>> make[2]: Leaving directory
>>>>>> `/home/h2/gabriel/openmpi-1.7rc4/ompi/mca/bcol/iboffload'
>>>>>> make[1]: *** [all-recursive] Error 1
>>>>>> make[1]: Leaving directory `/home/h2/gabriel/openmpi-1.7rc4/ompi'
>>>>>> make: *** [all-recursive] Error 1
>>>>>>
>>>>>> Thanks
>>>>>> Edgar
>>>>>>
>>>>>> --
>>>>>> Edgar Gabriel
>>>>>> Associate Professor
>>>>>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>>>>>> Department of Computer Science University of Houston
>>>>>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>>>>>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
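
Note on the Makefile.am diagnosis quoted earlier in this thread: a header that is in svn but not named in Makefile.am is left out of the tarball, which is what produced the "bcol_iboffload_qp_info.h: No such file or directory" errors. The sketch below is one possible way to catch that kind of omission before a release. It is not part of the Open MPI build system. The function names and the `sources` variable in the demo Makefile.am are hypothetical, and the check simply looks for .c/.h file names anywhere in the file text rather than parsing automake syntax.

```python
import re
import tempfile
from pathlib import Path


def listed_sources(makefile_am_text):
    """Return the base names of all .c/.h files mentioned in a Makefile.am."""
    # Drop '#' comments so commented-out entries do not count as listed.
    text = re.sub(r"#.*", "", makefile_am_text)
    # Plain text search for tokens ending in .c or .h (assumption: good
    # enough for a sanity check; this is not an automake parser).
    names = re.findall(r"[\w./-]+\.[ch]\b", text)
    return {Path(name).name for name in names}


def unlisted_sources(component_dir):
    """Return .c/.h files on disk that the directory's Makefile.am never names."""
    component_dir = Path(component_dir)
    listed = listed_sources((component_dir / "Makefile.am").read_text())
    on_disk = {p.name for p in component_dir.iterdir() if p.suffix in (".c", ".h")}
    return sorted(on_disk - listed)


if __name__ == "__main__":
    # Hypothetical miniature of the situation in this thread: three files
    # exist in the working copy, but only two are listed in Makefile.am.
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "Makefile.am").write_text(
            "sources = \\\n"
            "    bcol_iboffload.h \\\n"
            "    bcol_iboffload_frag.c\n"
        )
        for name in ("bcol_iboffload.h",
                     "bcol_iboffload_frag.c",
                     "bcol_iboffload_qp_info.h"):
            (d / name).touch()
        print(unlisted_sources(d))
```

In the demo, the only file reported is the one missing from the Makefile.am listing. Running the same check over a real component directory would assume that every source file is meant to be distributed, which may not hold for generated or platform-specific files.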