Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Open MPI (not quite) on Cray XC30
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2013-01-25 18:10:37


Nathan,

Cray's "cc" wrapper is adding xpmem, ugni, pmi, alps and others already:

$ cc -v hello.c 2>&1 | grep collect
> /opt/gcc/4.7.2/snos/libexec/gcc/x86_64-suse-linux/4.7.2/collect2
> --sysroot= -m elf_x86_64 -static -u pthread_mutex_trylock -u
> pthread_mutex_destroy -u pthread_create /usr/lib/../lib64/crt1.o
> /usr/lib/../lib64/crti.o
> /opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/crtbeginT.o
> -L/opt/cray/udreg/2.3.2-1.0500.5931.3.1.ari/lib64
> -L/opt/cray/ugni/4.0-1.0500.5836.7.58.ari/lib64
> -L/opt/cray/pmi/4.0.0-1.0000.9282.69.4.ari/lib64
> -L/opt/cray/dmapp/4.0.1-1.0500.5932.6.5.ari/lib64
> -L/opt/cray/xpmem/0.1-2.0500.36799.3.6.ari/lib64
> -L/opt/cray/alps/5.0.1-2.0500.7663.1.1.ari/lib64
> -L/opt/cray/rca/1.0.0-2.0500.37705.3.12.ari/lib64
> -L/opt/cray/mpt/5.6.0/gni/mpich2-gnu/47/lib
> -L/opt/cray/mpt/5.6.0/gni/sma/lib64
> -L/opt/cray/libsci/12.0.00/gnu/47/sandybridge/lib
> -L/opt/cray/alps/5.0.1-2.0500.7663.1.1.ari/lib64
> -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2
> -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/../../../../lib64
> -L/lib/../lib64 -L/usr/lib/../lib64
> -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/../../..
> /scratch1/scratchdirs/hargrove/ccQ1f0sx.o -lrca -L/opt/cray/atp/1.6.0/lib/
> --undefined=_ATP_Data_Globals --undefined=__atpHandlerInstall
> -lAtpSigHCommData -lAtpSigHandler --start-group -lgfortran -lscicpp_gnu
> -lsci_gnu_mp -lstdc++ -lgfortran -lmpich_gnu_47 -lmpl -lrt -lsma -lxpmem
> -ldmapp -lugni -lpmi -lalpslli -lalpsutil -lalps -ludreg -lpthread -lm
> --end-group -lgomp -lpthread --start-group -lgcc -lgcc_eh -lc --end-group
> /opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/crtend.o
> /usr/lib/../lib64/crtn.o
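
A quick way to confirm which libraries the wrapper injects is to pull the "-l" arguments out of the collect2 line. What follows is only a sketch in Python: the helper name libs_in_link_line is made up for illustration, and SAMPLE is an excerpt of the output above rather than the full line.

```python
import shlex


def libs_in_link_line(link_line):
    """Return the library names passed as -l<name>, in command-line order."""
    return [tok[2:] for tok in shlex.split(link_line)
            if tok.startswith("-l") and len(tok) > 2]


# Excerpt of the collect2 arguments shown above (not the full line).
SAMPLE = (
    "/scratch1/scratchdirs/hargrove/ccQ1f0sx.o -lrca -L/opt/cray/atp/1.6.0/lib/ "
    "--undefined=_ATP_Data_Globals --undefined=__atpHandlerInstall "
    "-lAtpSigHCommData -lAtpSigHandler --start-group -lgfortran -lscicpp_gnu "
    "-lsci_gnu_mp -lstdc++ -lgfortran -lmpich_gnu_47 -lmpl -lrt -lsma -lxpmem "
    "-ldmapp -lugni -lpmi -lalpslli -lalpsutil -lalps -ludreg -lpthread -lm "
    "--end-group"
)

found = set(libs_in_link_line(SAMPLE))
# Libraries from the list below that the cc wrapper did NOT add.
print(sorted({"xpmem", "ugni", "pmi", "alps"} - found))
# Whether the wrapper adds the Lustre API library on its own.
print("lustreapi" in found)
```

Running the same check against the mpicc link line quoted below would show whether a given library reaches the linker at all, regardless of what libs_static says.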

-Paul

On Fri, Jan 25, 2013 at 2:46 PM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:

> Something is wrong with the wrappers. A number of libraries (-lxpmem,
> -lugni, etc.) are missing from libs_static. Might be a similar issue with the
> missing -llustreapi. Going to create a critical bug to track this issue.
>
> Works in 1.7 :-/ ... If you add -lnuma to libs_static in
> mpicc-wrapper-data.txt.
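
For anyone who wants to try that work-around without hand-editing, here is a rough Python sketch. It assumes the wrapper data file is plain "key=value" text with a single libs_static line; the function name add_static_libs and the sample contents are hypothetical, so check them against the real mpicc-wrapper-data.txt in your install before relying on this.

```python
def add_static_libs(wrapper_data, extra):
    """Append any flags in `extra` that are missing from the libs_static line.

    wrapper_data: full text of a wrapper data file (assumed key=value lines).
    extra: flags such as ["-lnuma"]; flags already present are not duplicated.
    """
    out = []
    for line in wrapper_data.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "libs_static":
            present = value.split()
            missing = [flag for flag in extra if flag not in present]
            line = "libs_static=" + " ".join(present + missing)
        out.append(line)
    return "\n".join(out)


# Hypothetical two-line excerpt, not copied from a real install.
sample = "libs=-lmpi\nlibs_static=-lmpi -lopen-rte -lopen-pal"
print(add_static_libs(sample, ["-lnuma"]))
```

The function works on a string rather than a path so it can be tried on a copy of the file contents first.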
>
> -Nathan
> HPC-3, LANL
>
> On Fri, Jan 25, 2013 at 02:13:41PM -0800, Paul Hargrove wrote:
> > Still having problems on the Cray XC30, but now they are when linking an
> > MPI app:
> >
> > $ ./INSTALL/bin/mpicc -o ring_c examples/ring_c.c
> > > fs_lustre_file_open.c:(.text+0x130): undefined reference to
> > > `llapi_file_create'
> > > fs_lustre_file_open.c:(.text+0x17e): undefined reference to
> > > `llapi_file_get_stripe'
> > > /usr/bin/ld: link errors found, deleting executable `ring_c'
> > > collect2: error: ld returned 1 exit status
> >
> >
> > It appears that lustre support was found at configure time using a test
> > that used "-llustre -llustreapi":
> >
> > > configure:157666: checking if possible to link LUSTRE
> > > configure:157680: cc -std=gnu99 -o conftest -O3 -DNDEBUG
> > > -finline-functions -fno-strict-aliasing -fexceptions -D_REENTRANT
> > > -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/opal/mca/hwloc/hwloc151/hwloc/include
> > > -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/BUILD-edison/opal/mca/hwloc/hwloc151/hwloc/include
> > > -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/opal/mca/event/libevent2019/libevent
> > > -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/opal/mca/event/libevent2019/libevent/include
> > > -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/BUILD-edison/opal/mca/event/libevent2019/libevent/include
> > > -I/opt/cray/pmi/default/include -I/opt/cray/pmi/default/include
> > > -I/opt/cray/pmi/default/include -I/opt/cray/pmi/default/include
> > > -I/usr//include/lustre/ -fexceptions -L/usr//lib64 conftest.c -lnsl
> > > -lutil -lnsl -lutil -llustre -llustreapi
> >
> >
> > However, those two libs are NOT included when linking an MPI application:
> >
> > > $ ./INSTALL/bin/mpicc -o ring_c examples/ring_c.c -v 2>&1 | grep collect
> > > /opt/gcc/4.7.2/snos/libexec/gcc/x86_64-suse-linux/4.7.2/collect2
> > > --sysroot= -m elf_x86_64 -static -o ring_c -u pthread_mutex_trylock -u
> > > pthread_mutex_destroy -u pthread_create /usr/lib/../lib64/crt1.o
> > > /usr/lib/../lib64/crti.o
> > > /opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/crtbeginT.o
> > > -L/opt/cray/pmi/default/lib64 -L/opt/cray/alps/default/lib64
> > > -L/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.9a1r27905/INSTALL/lib
> > > -L/opt/cray/udreg/2.3.2-1.0500.5931.3.1.ari/lib64
> > > -L/opt/cray/ugni/4.0-1.0500.5836.7.58.ari/lib64
> > > -L/opt/cray/pmi/4.0.0-1.0000.9282.69.4.ari/lib64
> > > -L/opt/cray/dmapp/4.0.1-1.0500.5932.6.5.ari/lib64
> > > -L/opt/cray/xpmem/0.1-2.0500.36799.3.6.ari/lib64
> > > -L/opt/cray/alps/5.0.1-2.0500.7663.1.1.ari/lib64
> > > -L/opt/cray/rca/1.0.0-2.0500.37705.3.12.ari/lib64
> > > -L/opt/cray/mpt/5.6.0/gni/mpich2-gnu/47/lib
> > > -L/opt/cray/mpt/5.6.0/gni/sma/lib64
> > > -L/opt/cray/libsci/12.0.00/gnu/47/sandybridge/lib
> > > -L/opt/cray/alps/5.0.1-2.0500.7663.1.1.ari/lib64
> > > -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2
> > > -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/../../../../lib64
> > > -L/lib/../lib64 -L/usr/lib/../lib64
> > > -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/../../..
> > > /scratch1/scratchdirs/hargrove/cceRJNtp.o -lmpi -lpmi -lalpslli -lalpsutil
> > > -lnsl -lutil -lnsl -lutil -lopen-rte -lpmi -lalpslli -lalpsutil -lnsl
> > > -lutil -lnsl -lutil -lopen-pal -lpmi -lalpslli -lalpsutil -lnsl -lutil
> > > -lnsl -lutil -lrca -L/opt/cray/atp/1.6.0/lib/ --undefined=_ATP_Data_Globals
> > > --undefined=__atpHandlerInstall -lAtpSigHCommData -lAtpSigHandler
> > > --start-group -lgfortran -lscicpp_gnu -lsci_gnu_mp -lstdc++ -lgfortran
> > > -lmpich_gnu_47 -lmpl -lrt -lsma -lxpmem -ldmapp -lugni -lpmi -lalpslli
> > > -lalpsutil -lalps -ludreg -lpthread -lm --end-group -lgomp -lpthread
> > > --start-group -lgcc -lgcc_eh -lc --end-group
> > > /opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/crtend.o
> > > /usr/lib/../lib64/crtn.o
> > > collect2: error: ld returned 1 exit status
> >
> >
> > Of course the obvious work-around to try is adding "-llustre -llustreapi"
> > to my command line. However, that doesn't work because mpicc places my
> > "-l" args BEFORE its own "-lmpi". Since "-static" is also among the
> > arguments, no symbols are picked up from the lustre libs when they appear
> > on the command line before "-lmpi", from which lustre symbols are
> > referenced.
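
The ordering effect described here can be illustrated with a toy model of a single-pass static link, written in Python. This is a deliberate simplification, not how ld is implemented: each archive is treated as one member, there is no --start-group handling, and the Archive type, the link function, and the use of MPI_Init as the application's only undefined symbol are all assumptions made for illustration. Only llapi_file_create comes from the error output above.

```python
from typing import NamedTuple


class Archive(NamedTuple):
    name: str            # flag as it would appear on the command line
    defines: frozenset   # symbols the archive provides
    needs: frozenset     # symbols the archive itself references


def link(initial_undefined, archives):
    """Scan archives left to right; return the symbols still undefined."""
    undefined = set(initial_undefined)
    defined = set()
    for lib in archives:
        # An archive is pulled in only if it satisfies something pending NOW.
        if undefined & lib.defines:
            defined |= lib.defines
            undefined |= lib.needs
            undefined -= defined
    return undefined


LIBMPI = Archive("-lmpi", frozenset({"MPI_Init"}),
                 frozenset({"llapi_file_create"}))
LUSTRE = Archive("-llustreapi", frozenset({"llapi_file_create"}), frozenset())

app_needs = {"MPI_Init"}  # assumed: what the application object references
print(link(app_needs, [LUSTRE, LIBMPI]))  # user's -l placed before -lmpi
print(link(app_needs, [LIBMPI, LUSTRE]))  # lustre library after -lmpi
```

In the first ordering the lustre archive is scanned before anything references it, so it is skipped and llapi_file_create stays unresolved; in the second ordering nothing is left undefined.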
> >
> > Best guess(es):
> > EITHER config/ompi_check_lustre.m4 is failing to add "-llustre -llustreapi"
> > to some variable
> > OR the variable set by config/ompi_check_lustre.m4 isn't making its way
> > into the application link command for some reason
> >
> > Note that this is a --disable-shared/--enable-static build which may differ
> > from other systems where LUSTRE support gets used/tested.
> >
> > -Paul
> >
> >
> > On Fri, Jan 25, 2013 at 12:01 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> >
> > > Thanks Paul
> > >
> > > I'm currently tracking down a problem on the Cray XE6 - it appears that
> > > a recent OS release changed the way alps stores allocation info :-(
> > >
> > > Will hopefully have it running soon.
> > >
> > > On Jan 25, 2013, at 10:50 AM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
> > >
> > > I was able to compile with openmpi-1.9a1r27905.tar.bz
> > >
> > > I'll report again when I've had an opportunity to run something like
> > > ring_c.
> > >
> > > Thanks,
> > > -Paul
> > >
> > >
> > > On Tue, Jan 22, 2013 at 6:08 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> > >
> > >> I went ahead and removed the duplicate code, so this should work now. The
> > >> problem is that we re-factored the ompi_info/orte-info code, but didn't
> > >> complete the job - specifically, the orte-info tool didn't get updated.
> > >> It's about to get revamped yet again when the ompi-rte branch gets
> > >> committed to the trunk, so I'd rather not do any more with it now.
> > >>
> > >> Hopefully, this will be the minimum required.
> > >>
> > >>
> > >> On Jan 22, 2013, at 4:20 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
> > >>
> > >> I am using the openmpi-1.9a1r27886 tarball and I still see an error for
> > >> one of the two duplicate symbols:
> > >>
> > >> CCLD orte-info
> > >> ../../../orte/.libs/libopen-rte.a(orte_info_support.o): In function
> > >> `orte_info_show_orte_version':
> > >> ../../orte/runtime/orte_info_support.c:(.text+0xe10): multiple definition
> > >> of `orte_info_show_orte_version'
> > >> version.o:../../../../orte/tools/orte-info/version.c:(.text+0x2370):
> > >> first defined here
> > >>
> > >> -Paul
> > >>
> > >>
> > >> On Fri, Jan 18, 2013 at 3:52 AM, George Bosilca <bosilca_at_[hidden]> wrote:
> > >>
> > >>> Luckily for us all the definitions contain the same constant (orte).
> > >>> r27864 should fix this.
> > >>>
> > >>> George.
> > >>>
> > >>>
> > >>> On Jan 18, 2013, at 06:21 , Paul Hargrove <PHHargrove_at_[hidden]> wrote:
> > >>>
> > >>> My employer has a nice new Cray XC30 (aka Cascade), and I thought I'd
> > >>> give Open MPI a quick test.
> > >>>
> > >>> Given that it is INTENDED to be API-compatible with the XE series, I
> > >>> began configuring with
> > >>> CC=cc CXX=CC FC=ftn --with-platform=lanl/cray_xe6/optimized-nopanasas
> > >>> However, since this is Intel h/w, I commented-out the following 2 lines
> > >>> in the platform file:
> > >>> with_wrapper_cflags="-march=amdfam10"
> > >>> CFLAGS=-march=amdfam10
> > >>>
> > >>> I am using PrgEnv-gnu/5.0.15, though PrgEnv-intel is the default on our
> > >>> system.
> > >>>
> > >>> As far as I know, use of 1.6.x is out - no ugni at all, right?
> > >>> So, I didn't even try.
> > >>>
> > >>> I gave openmpi-1.7rc6 a try, but the ALPS headers and libs have moved
> > >>> (as mentioned in ompi-trunk/config/orte_check_alps.m4).
> > >>> Perhaps one should CMR the updated-for-CLE-5 configure logic to the 1.7
> > >>> branch?
> > >>>
> > >>> Next, I tried a trunk nightly tarball: openmpi-1.9a1r27862.tar.bz2
> > >>> As I mentioned above, the trunk has the right logic for locating ALPS.
> > >>> However, it looks like there is some untested code, protected by "#if
> > >>> WANT_CRAY_PMI2_EXT", that needs work:
> > >>>
> > >>> make[2]: Entering directory
> > >>> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
> > >>> CC db_pmi_component.lo
> > >>> CC db_pmi.lo
> > >>> ../../../../../orte/mca/db/pmi/db_pmi.c: In function 'store':
> > >>> ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: 'ptr' undeclared
> > >>> (first use in this function)
> > >>> ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: (Each undeclared
> > >>> identifier is reported only once
> > >>> ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: for each function it
> > >>> appears in.)
> > >>> make[2]: *** [db_pmi.lo] Error 1
> > >>> make[2]: Leaving directory
> > >>> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
> > >>> make[1]: *** [all-recursive] Error 1
> > >>> make[1]: Leaving directory
> > >>> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte'
> > >>> make: *** [all-recursive] Error 1
> > >>>
> > >>> I added the missing "char *ptr" declaration a few lines before its
> > >>> first use, and resumed the build.
> > >>> This time the build terminated at
> > >>>
> > >>> make[2]: Entering directory
> > >>> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/opal/tools/wrappers'
> > >>> CC opal_wrapper.o
> > >>> CCLD opal_wrapper
> > >>> /usr/bin/ld: attempted static link of dynamic object
> > >>> `../../../opal/.libs/libopen-pal.so'
> > >>> collect2: error: ld returned 1 exit status
> > >>>
> > >>> So I went back to the platform file and changed
> > >>> enable_shared=yes
> > >>> to
> > >>> enable_shared=no
> > >>> No big deal there - I had to make the same change for our XE6.
> > >>>
> > >>> And so I started back at configure (after a "make distclean", to be
> > >>> safe), and here is the next error:
> > >>>
> > >>> Making all in tools/orte-info
> > >>> make[2]: Entering directory
> > >>> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/tools/orte-info'
> > >>> CCLD orte-info
> > >>> ../../../orte/.libs/libopen-rte.a(orte_info_support.o): In function
> > >>> `orte_info_show_orte_version':
> > >>> orte_info_support.c:(.text+0xd70): multiple definition of
> > >>> `orte_info_show_orte_version'
> > >>> version.o:version.c:(.text+0x4b0): first defined here
> > >>> ../../../orte/.libs/libopen-rte.a(orte_info_support.o):(.data+0x0):
> > >>> multiple definition of `orte_info_type_orte'
> > >>> orte-info.o:(.data+0x10): first defined here
> > >>> /usr/bin/ld: link errors found, deleting executable `orte-info'
> > >>> collect2: error: ld returned 1 exit status
> > >>> make[2]: *** [orte-info] Error 1
> > >>>
> > >>> I am not sure how to fix this, but I would guess this is probably a
> > >>> simple fix for somebody who knows OMPI's build infrastructure better
> > >>> than I.
> > >>>
> > >>> -Paul
> > >>>
> > >>> --
> > >>> Paul H. Hargrove PHHargrove_at_[hidden]
> > >>> Future Technologies Group
> > >>> Computer and Data Sciences Department Tel: +1-510-495-2352
> > >>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> > >>> _______________________________________________
> > >>> devel mailing list
> > >>> devel_at_[hidden]
> > >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900