Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Open MPI (not quite) on Cray XC30
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2013-01-25 18:17:53


Adding --without-lustre to my configure args allowed me to compile and link
ring_c.
I am in the queue now and will report later on run results.

-Paul

On Fri, Jan 25, 2013 at 2:13 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:

> Still having problems on the Cray XC30, but now they are when linking an
> MPI app:
>
> $ ./INSTALL/bin/mpicc -o ring_c examples/ring_c.c
>> fs_lustre_file_open.c:(.text+0x130): undefined reference to
>> `llapi_file_create'
>> fs_lustre_file_open.c:(.text+0x17e): undefined reference to
>> `llapi_file_get_stripe'
>> /usr/bin/ld: link errors found, deleting executable `ring_c'
>> collect2: error: ld returned 1 exit status
>
>
> It appears that lustre support was found at configure time using a test
> that used "-llustre -llustreapi":
>
>> configure:157666: checking if possible to link LUSTRE
>> configure:157680: cc -std=gnu99 -o conftest -O3 -DNDEBUG
>> -finline-functions -fno-strict-aliasing -fexceptions -D_REENTRANT
>> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/opal/mca/hwloc/hwloc151/hwloc/include
>> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/BUILD-edison/opal/mca/hwloc/hwloc151/hwloc/include
>> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/opal/mca/event/libevent2019/libevent
>> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/opal/mca/event/libevent2019/libevent/include
>> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/BUILD-edison/opal/mca/event/libevent2019/libevent/include
>> -I/opt/cray/pmi/default/include -I/opt/cray/pmi/default/include
>> -I/opt/cray/pmi/default/include -I/opt/cray/pmi/default/include
>> -I/usr//include/lustre/ -fexceptions -L/usr//lib64 conftest.c -lnsl
>> -lutil -lnsl -lutil -llustre -llustreapi
>
>
> However, those two libs are NOT included when linking an MPI application:
>
>> $ ./INSTALL/bin/mpicc -o ring_c examples/ring_c.c -v 2>&1 | grep collect
>> /opt/gcc/4.7.2/snos/libexec/gcc/x86_64-suse-linux/4.7.2/collect2
>> --sysroot= -m elf_x86_64 -static -o ring_c -u pthread_mutex_trylock -u
>> pthread_mutex_destroy -u pthread_create /usr/lib/../lib64/crt1.o
>> /usr/lib/../lib64/crti.o
>> /opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/crtbeginT.o
>> -L/opt/cray/pmi/default/lib64 -L/opt/cray/alps/default/lib64
>> -L/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/INSTALL/lib
>> -L/opt/cray/udreg/2.3.2-1.0500.5931.3.1.ari/lib64
>> -L/opt/cray/ugni/4.0-1.0500.5836.7.58.ari/lib64
>> -L/opt/cray/pmi/4.0.0-1.0000.9282.69.4.ari/lib64
>> -L/opt/cray/dmapp/4.0.1-1.0500.5932.6.5.ari/lib64
>> -L/opt/cray/xpmem/0.1-2.0500.36799.3.6.ari/lib64
>> -L/opt/cray/alps/5.0.1-2.0500.7663.1.1.ari/lib64
>> -L/opt/cray/rca/1.0.0-2.0500.37705.3.12.ari/lib64
>> -L/opt/cray/mpt/5.6.0/gni/mpich2-gnu/47/lib
>> -L/opt/cray/mpt/5.6.0/gni/sma/lib64
>> -L/opt/cray/libsci/12.0.00/gnu/47/sandybridge/lib
>> -L/opt/cray/alps/5.0.1-2.0500.7663.1.1.ari/lib64
>> -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2
>> -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/../../../../lib64
>> -L/lib/../lib64 -L/usr/lib/../lib64
>> -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/../../..
>> /scratch1/scratchdirs/hargrove/cceRJNtp.o -lmpi -lpmi -lalpslli -lalpsutil
>> -lnsl -lutil -lnsl -lutil -lopen-rte -lpmi -lalpslli -lalpsutil -lnsl
>> -lutil -lnsl -lutil -lopen-pal -lpmi -lalpslli -lalpsutil -lnsl -lutil
>> -lnsl -lutil -lrca -L/opt/cray/atp/1.6.0/lib/ --undefined=_ATP_Data_Globals
>> --undefined=__atpHandlerInstall -lAtpSigHCommData -lAtpSigHandler
>> --start-group -lgfortran -lscicpp_gnu -lsci_gnu_mp -lstdc++ -lgfortran
>> -lmpich_gnu_47 -lmpl -lrt -lsma -lxpmem -ldmapp -lugni -lpmi -lalpslli
>> -lalpsutil -lalps -ludreg -lpthread -lm --end-group -lgomp -lpthread
>> --start-group -lgcc -lgcc_eh -lc --end-group
>> /opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/crtend.o
>> /usr/lib/../lib64/crtn.o
>> collect2: error: ld returned 1 exit status
>
>
> Of course the obvious work-around to try is adding "-llustre -llustreapi"
> to my command line. However, that doesn't work because mpicc places my
> "-l" args BEFORE its own "-lmpi". Since "-static" is also among the
> arguments, no symbols are picked up from the lustre libs when they appear
> on the command line before "-lmpi", from which lustre symbols are
> referenced.
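
The ordering problem described above can be sketched with a small model. This is an illustrative simplification, not how ld is implemented: it assumes a single left-to-right pass in which an archive member is loaded only if it defines a symbol that is still undefined at that point, and that earlier archives are never revisited (no --start-group). The archive names and contents are hypothetical stand-ins; only the two llapi_* symbol names come from the error output.

```python
from typing import NamedTuple


class Member(NamedTuple):
    """One object file inside an archive (hypothetical model type)."""
    defines: set
    needs: set = frozenset()


# Symbol names taken from the link errors quoted in this thread.
LUSTRE_SYMS = {"llapi_file_create", "llapi_file_get_stripe"}

# Assumed, heavily simplified contents: one member per archive.
ARCHIVES = {
    "ring_c.o": [Member({"main"}, {"MPI_Init"})],
    "libmpi.a": [Member({"MPI_Init"}, set(LUSTRE_SYMS))],
    "liblustreapi.a": [Member(set(LUSTRE_SYMS))],
}


def unresolved_after_link(command_line, archives):
    """Return the symbols still undefined after one left-to-right pass.

    The first entry of command_line is treated as the application object
    and is always loaded. Every later entry is treated as a static archive.
    """
    defined, undefined = set(), set()
    for position, name in enumerate(command_line):
        members = archives[name]
        is_object = position == 0
        loaded = set()
        progress = True
        # Rescan the current archive until no further member is pulled in.
        while progress:
            progress = False
            for index, member in enumerate(members):
                if index in loaded:
                    continue
                if is_object or (member.defines & undefined):
                    loaded.add(index)
                    defined |= member.defines
                    undefined |= member.needs
                    undefined -= defined
                    progress = True
    return undefined


# Lustre archive placed before libmpi.a: nothing needs it yet, so it is skipped.
print(sorted(unresolved_after_link(
    ["ring_c.o", "liblustreapi.a", "libmpi.a"], ARCHIVES)))
# -> ['llapi_file_create', 'llapi_file_get_stripe']

# Lustre archive placed after libmpi.a: the references are resolved.
print(sorted(unresolved_after_link(
    ["ring_c.o", "libmpi.a", "liblustreapi.a"], ARCHIVES)))
# -> []
```

In this model the first ordering leaves both llapi_* symbols undefined, which matches the failure seen when the extra "-l" arguments land before "-lmpi".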
>
> Best guess(es):
> EITHER config/ompi_check_lustre.m4 is failing to add "-llustre
> -llustreapi" to some variable
> OR the variable set by config/ompi_check_lustre.m4 isn't making its way
> into the application link command for some reason
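
Either guess can be checked against the final link command by listing its "-l" flags in order and asking whether the Lustre libraries appear after "-lmpi". The helper below is a hypothetical sketch (libs_in_order and missing_after are not part of Open MPI), and the sample commands are abbreviated, illustrative versions of the collect2 line quoted earlier in this message.

```python
import shlex


def libs_in_order(link_command):
    """Return library names from -l<name> flags, in command-line order."""
    return [
        token[2:]
        for token in shlex.split(link_command)
        if token.startswith("-l") and len(token) > 2
    ]


def missing_after(link_command, anchor, required):
    """Return the required libraries that do not appear after `anchor`.

    Heuristic: only the first occurrence of `anchor` is considered, since
    that is where a static link first records its undefined references.
    """
    libs = libs_in_order(link_command)
    if anchor not in libs:
        return list(required)
    tail = libs[libs.index(anchor) + 1:]
    return [lib for lib in required if lib not in tail]


REQUIRED = ["lustre", "lustreapi"]

# Abbreviated form of the wrapper's link line: no Lustre libs at all.
as_built = ("collect2 -static -o ring_c ring_c.o -lmpi -lpmi -lalpslli "
            "-lalpsutil -lnsl -lutil -lopen-rte -lopen-pal -lrca")
print(missing_after(as_built, "mpi", REQUIRED))
# -> ['lustre', 'lustreapi']

# User-supplied libs placed before -lmpi: present, but too early to help.
too_early = ("collect2 -static -o ring_c ring_c.o -llustre -llustreapi "
             "-lmpi -lopen-rte -lopen-pal")
print(missing_after(too_early, "mpi", REQUIRED))
# -> ['lustre', 'lustreapi']

# Libs after -lmpi: the ordering a static link needs.
fixed = ("collect2 -static -o ring_c ring_c.o -lmpi -lopen-rte -lopen-pal "
         "-llustre -llustreapi")
print(missing_after(fixed, "mpi", REQUIRED))
# -> []
```

A non-empty result for the wrapper's own link line is consistent with both guesses; it shows only that the libraries never reach the application link command in a usable position, not which step dropped them.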
>
> Note that this is a --disable-shared/--enable-static build which may
> differ from other systems where LUSTRE support gets used/tested.
>
> -Paul
>
>
> On Fri, Jan 25, 2013 at 12:01 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> Thanks Paul
>>
>> I'm currently tracking down a problem on the Cray XE6 - it appears that a
>> recent OS release changed the way alps stores allocation info :-(
>>
>> Will hopefully have it running soon.
>>
>> On Jan 25, 2013, at 10:50 AM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>>
>> I was able to compile with openmpi-1.9a1r27905.tar.bz
>>
>> I'll report again when I've had an opportunity to run something like
>> ring_c.
>>
>> Thanks,
>> -Paul
>>
>>
>> On Tue, Jan 22, 2013 at 6:08 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> I went ahead and removed the duplicate code, so this should work now.
>>> The problem is that we re-factored the ompi_info/orte-info code, but didn't
>>> complete the job - specifically, the orte-info tool didn't get updated.
>>> It's about to get revamped yet again when the ompi-rte branch gets
>>> committed to the trunk, so I'd rather not do any more with it now.
>>>
>>> Hopefully, this will be the minimum required.
>>>
>>>
>>> On Jan 22, 2013, at 4:20 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>>>
>>> I am using the openmpi-1.9a1r27886 tarball and I still see an error for
>>> one of the two duplicate symbols:
>>>
>>> CCLD orte-info
>>> ../../../orte/.libs/libopen-rte.a(orte_info_support.o): In function
>>> `orte_info_show_orte_version':
>>> ../../orte/runtime/orte_info_support.c:(.text+0xe10): multiple
>>> definition of `orte_info_show_orte_version'
>>> version.o:../../../../orte/tools/orte-info/version.c:(.text+0x2370):
>>> first defined here
>>>
>>> -Paul
>>>
>>>
>>> On Fri, Jan 18, 2013 at 3:52 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>>>
>>>> Luckily for us, all the definitions contain the same constant (orte).
>>>> r27864 should fix this.
>>>>
>>>> George.
>>>>
>>>>
>>>> On Jan 18, 2013, at 06:21 , Paul Hargrove <PHHargrove_at_[hidden]> wrote:
>>>>
>>>> My employer has a nice new Cray XC30 (aka Cascade), and I thought I'd
>>>> give Open MPI a quick test.
>>>>
>>>> Given that it is INTENDED to be API-compatible with the XE series, I
>>>> began configuring with
>>>> CC=cc CXX=CC FC=ftn
>>>> --with-platform=lanl/cray_xe6/optimized-nopanasas
>>>> However, since this is Intel h/w, I commented-out the following 2 lines
>>>> in the platform file:
>>>> with_wrapper_cflags="-march=amdfam10"
>>>> CFLAGS=-march=amdfam10
>>>>
>>>> I am using PrgEnv-gnu/5.0.15, though PrgEnv-intel is the default on our
>>>> system
>>>>
>>>> As far as I know, use of 1.6.x is out - no ugni at all, right?
>>>> So, I didn't even try.
>>>>
>>>> I gave openmpi-1.7rc6 a try, but the ALPS headers and libs have moved
>>>> (as mentioned in ompi-trunk/config/orte_check_alps.m4).
>>>> Perhaps one should CMR the updated-for-CLE-5 configure logic to the 1.7
>>>> branch?
>>>>
>>>> Next, I tried a trunk nightly tarball: openmpi-1.9a1r27862.tar.bz2
>>>> As I mentioned above, the trunk has the right logic for locating ALPS.
>>>> However, it looks like there is some untested code, protected by "#if
>>>> WANT_CRAY_PMI2_EXT", that needs work:
>>>>
>>>> make[2]: Entering directory
>>>> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
>>>> CC db_pmi_component.lo
>>>> CC db_pmi.lo
>>>> ../../../../../orte/mca/db/pmi/db_pmi.c: In function 'store':
>>>> ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: 'ptr' undeclared
>>>> (first use in this function)
>>>> ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: (Each undeclared
>>>> identifier is reported only once
>>>> ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: for each function
>>>> it appears in.)
>>>> make[2]: *** [db_pmi.lo] Error 1
>>>> make[2]: Leaving directory
>>>> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
>>>> make[1]: *** [all-recursive] Error 1
>>>> make[1]: Leaving directory
>>>> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte'
>>>> make: *** [all-recursive] Error 1
>>>>
>>>> I added the missing "char *ptr" declaration a few lines before its
>>>> first use, and resumed the build.
>>>> This time the build terminated at
>>>>
>>>> make[2]: Entering directory
>>>> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/opal/tools/wrappers'
>>>> CC opal_wrapper.o
>>>> CCLD opal_wrapper
>>>> /usr/bin/ld: attempted static link of dynamic object
>>>> `../../../opal/.libs/libopen-pal.so'
>>>> collect2: error: ld returned 1 exit status
>>>>
>>>> So I went back to the platform file and changed
>>>> enable_shared=yes
>>>> to
>>>> enable_shared=no
>>>> No big deal there - I had to make the same change for our XE6.
>>>>
>>>> And so I started back at configure (after a "make distclean", to be
>>>> safe), and here is the next error:
>>>>
>>>> Making all in tools/orte-info
>>>> make[2]: Entering directory
>>>> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/tools/orte-info'
>>>> CCLD orte-info
>>>> ../../../orte/.libs/libopen-rte.a(orte_info_support.o): In function
>>>> `orte_info_show_orte_version':
>>>> orte_info_support.c:(.text+0xd70): multiple definition of
>>>> `orte_info_show_orte_version'
>>>> version.o:version.c:(.text+0x4b0): first defined here
>>>> ../../../orte/.libs/libopen-rte.a(orte_info_support.o):(.data+0x0):
>>>> multiple definition of `orte_info_type_orte'
>>>> orte-info.o:(.data+0x10): first defined here
>>>> /usr/bin/ld: link errors found, deleting executable `orte-info'
>>>> collect2: error: ld returned 1 exit status
>>>> make[2]: *** [orte-info] Error 1
>>>>
>>>> I am not sure how to fix this, but I would guess this is probably a
>>>> simple fix for somebody who knows OMPI's build infrastructure better than I.
>>>>
>>>> -Paul
>>>>
>>>> --
>>>> Paul H. Hargrove PHHargrove_at_[hidden]
>>>> Future Technologies Group
>>>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900