Open MPI Development Mailing List Archives

Subject: [OMPI devel] Open MPI (not quite) on Cray XC30
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2013-01-18 00:21:58


My employer has a nice new Cray XC30 (aka Cascade), and I thought I'd give
Open MPI a quick test.

Given that it is INTENDED to be API-compatible with the XE series, I began
configuring with
    CC=cc CXX=CC FC=ftn --with-platform=lanl/cray_xe6/optimized-nopanasas
However, since this is Intel h/w, I commented out the following two lines in
the platform file:
    with_wrapper_cflags="-march=amdfam10"
    CFLAGS=-march=amdfam10

I am using PrgEnv-gnu/5.0.15, though PrgEnv-intel is the default on our
system.

As far as I know, 1.6.x is not an option since it has no ugni support at
all, right? So I didn't even try.

I gave openmpi-1.7rc6 a try, but the ALPS headers and libs have moved (as
mentioned in ompi-trunk/config/orte_check_alps.m4).
Perhaps one should CMR the updated-for-CLE-5 configure logic to the 1.7
branch?

Next, I tried a trunk nightly tarball: openmpi-1.9a1r27862.tar.bz2
As I mentioned above, the trunk has the right logic for locating ALPS.
However, it looks like there is some untested code, protected by "#if
WANT_CRAY_PMI2_EXT", that needs work:

make[2]: Entering directory
`/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
  CC db_pmi_component.lo
  CC db_pmi.lo
../../../../../orte/mca/db/pmi/db_pmi.c: In function 'store':
../../../../../orte/mca/db/pmi/db_pmi.c:202: error: 'ptr' undeclared (first
use in this function)
../../../../../orte/mca/db/pmi/db_pmi.c:202: error: (Each undeclared
identifier is reported only once
../../../../../orte/mca/db/pmi/db_pmi.c:202: error: for each function it
appears in.)
make[2]: *** [db_pmi.lo] Error 1
make[2]: Leaving directory
`/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory
`/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte'
make: *** [all-recursive] Error 1

I added the missing "char *ptr" declaration a few lines before its first
use, and resumed the build.
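Code inside an "#if" block that no regular build enables is never seen by the compiler, so an undeclared identifier can sit there unnoticed. The following is a minimal, hypothetical C sketch, not the real db_pmi.c: "store_sketch" and its key=value packing are invented for illustration, and it assumes WANT_CRAY_PMI2_EXT is an ordinary 0/1 preprocessor symbol. It only shows the shape of the fix, with "ptr" declared in scope before its first use inside the guarded path.

```c
/*
 * Hypothetical sketch, NOT the real orte/mca/db/pmi/db_pmi.c.
 * Code under an #if that default builds never enable is never compiled,
 * so a missing declaration (like 'ptr' in r27862) goes unnoticed.
 */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifndef WANT_CRAY_PMI2_EXT
#define WANT_CRAY_PMI2_EXT 1  /* assumption: force the guarded path on */
#endif

/* Hypothetical stand-in for store(): writes "key=value" into buf and
 * returns the number of characters written, or -1 if it would not fit. */
int store_sketch(const char *key, const char *value, char *buf, size_t len)
{
    assert(key != NULL && value != NULL && buf != NULL);
#if WANT_CRAY_PMI2_EXT
    char *ptr;  /* the kind of declaration that was missing */
    int n = snprintf(buf, len, "%s=", key);
    if (n < 0 || (size_t)n >= len)
        return -1;
    ptr = buf + strlen(buf);  /* ptr marks where the value is written */
    int m = snprintf(ptr, len - (size_t)n, "%s", value);
    if (m < 0 || (size_t)m >= len - (size_t)n)
        return -1;
    return n + m;
#else
    int n = snprintf(buf, len, "%s=%s", key, value);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
#endif
}
```

Compiling a file like this once with -DWANT_CRAY_PMI2_EXT=0 and once with -DWANT_CRAY_PMI2_EXT=1 exercises both branches, which is the sort of check that would likely catch this class of error before a tarball goes out.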
This time the build terminated at

make[2]: Entering directory
`/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/opal/tools/wrappers'
  CC opal_wrapper.o
  CCLD opal_wrapper
/usr/bin/ld: attempted static link of dynamic object
`../../../opal/.libs/libopen-pal.so'
collect2: error: ld returned 1 exit status

So I went back to the platform file and changed
   enable_shared=yes
to
   enable_shared=no
No big deal there - I had to make the same change for our XE6.

And so I started back at configure (after a "make distclean", to be safe),
and here is the next error:

Making all in tools/orte-info
make[2]: Entering directory
`/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/tools/orte-info'
  CCLD orte-info
../../../orte/.libs/libopen-rte.a(orte_info_support.o): In function
`orte_info_show_orte_version':
orte_info_support.c:(.text+0xd70): multiple definition of
`orte_info_show_orte_version'
version.o:version.c:(.text+0x4b0): first defined here
../../../orte/.libs/libopen-rte.a(orte_info_support.o):(.data+0x0):
multiple definition of `orte_info_type_orte'
orte-info.o:(.data+0x10): first defined here
/usr/bin/ld: link errors found, deleting executable `orte-info'
collect2: error: ld returned 1 exit status
make[2]: *** [orte-info] Error 1

I am not sure how to fix this, but I suspect it is a simple fix for
somebody who knows OMPI's build infrastructure better than I do.
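Both link errors above have the same shape: a symbol (orte_info_show_orte_version, orte_info_type_orte) is defined in one of orte-info's own objects (version.o, orte-info.o) and again in orte_info_support.o inside libopen-rte.a. A traditional static link pulls an archive member in whole once any of its symbols is needed, so every definition that member carries comes along and collides. The usual remedy is to keep exactly one definition and let everyone else see only an extern declaration. Below is a minimal single-file C sketch of that pattern; the names info_type_sketch and show_version_sketch are hypothetical, and the sketch does not say which of the two OMPI copies is meant to survive.

```c
/*
 * Sketch of the one-definition pattern (hypothetical names, not OMPI's).
 * In a real build the three parts below would live in separate files.
 */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* --- shared header: declarations only, no storage is allocated --- */
extern const char *info_type_sketch;
int show_version_sketch(char *out, size_t len);

/* --- exactly one .c file (say, in the library) holds the definitions --- */
const char *info_type_sketch = "orte";

int show_version_sketch(char *out, size_t len)
{
    assert(out != NULL);
    return snprintf(out, len, "%s version (sketch)", info_type_sketch);
}

/* --- the tool's own sources include the header and call these ---
 * Putting a second 'const char *info_type_sketch = ...;' in another
 * object that lands in the same static link reproduces the
 * "multiple definition" error from the log. */
```

In an ELF shared-library build the executable's definition simply preempts the library's copy, so duplicates like these are likely to stay hidden until a static-only build such as enable_shared=no.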

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900