Subject: [OMPI devel] Open MPI (not quite) on Cray XC30
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2013-01-18 00:21:58


My employer has a nice new Cray XC30 (aka Cascade), and I thought I'd give
Open MPI a quick test.

Given that it is INTENDED to be API-compatible with the XE series, I began
configuring with
    CC=cc CXX=CC FC=ftn --with-platform=lanl/cray_xe6/optimized-nopanasas
However, since this is Intel h/w, I commented out the following 2 lines in
the platform file:
    with_wrapper_cflags="-march=amdfam10"
    CFLAGS=-march=amdfam10
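
One way to script that edit is sketched below. It is only an illustration: the function name strip_amd_march and the throwaway demo file are hypothetical, and the real input would be whichever platform file --with-platform points at.

```shell
#!/bin/sh
# Sketch: comment out every line that mentions -march=amdfam10 (an
# AMD-specific flag) and write the result to a new file.
# Usage: strip_amd_march INPUT OUTPUT   (hypothetical helper name)
strip_amd_march() {
    sed -e '/-march=amdfam10/ s/^/#/' "$1" > "$2"
}

# Demo on a stand-in file that only mimics the relevant platform-file lines.
demo_in=$(mktemp)
demo_out=$(mktemp)
cat > "$demo_in" <<'EOF'
enable_shared=yes
with_wrapper_cflags="-march=amdfam10"
CFLAGS=-march=amdfam10
EOF
strip_amd_march "$demo_in" "$demo_out"
cat "$demo_out"
```

Writing to a separate output file leaves the original platform file untouched, so the XE6 settings stay available.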

I am using PrgEnv-gnu/5.0.15, though PrgEnv-intel is the default on our
system.

As far as I know, 1.6.x is not an option because it has no ugni support at
all, right? So I didn't even try it.

I gave openmpi-1.7rc6 a try, but the ALPS headers and libs have moved (as
mentioned in ompi-trunk/config/orte_check_alps.m4).
Perhaps one should CMR the updated-for-CLE-5 configure logic to the 1.7
branch?

Next, I tried a trunk nightly tarball: openmpi-1.9a1r27862.tar.bz2
As I mentioned above, the trunk has the right logic for locating ALPS.
However, it looks like there is some untested code, protected by "#if
WANT_CRAY_PMI2_EXT", that needs work:

make[2]: Entering directory
`/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
  CC db_pmi_component.lo
  CC db_pmi.lo
../../../../../orte/mca/db/pmi/db_pmi.c: In function 'store':
../../../../../orte/mca/db/pmi/db_pmi.c:202: error: 'ptr' undeclared (first
use in this function)
../../../../../orte/mca/db/pmi/db_pmi.c:202: error: (Each undeclared
identifier is reported only once
../../../../../orte/mca/db/pmi/db_pmi.c:202: error: for each function it
appears in.)
make[2]: *** [db_pmi.lo] Error 1
make[2]: Leaving directory
`/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory
`/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte'
make: *** [all-recursive] Error 1

I added the missing "char *ptr" declaration a few lines before its first
use, and resumed the build.
This time the build terminated at:

make[2]: Entering directory
`/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/opal/tools/wrappers'
  CC opal_wrapper.o
  CCLD opal_wrapper
/usr/bin/ld: attempted static link of dynamic object
`../../../opal/.libs/libopen-pal.so'
collect2: error: ld returned 1 exit status

So I went back to the platform file and changed
   enable_shared=yes
to
   enable_shared=no
No big deal there - I had to make the same change for our XE6.
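
The same change could be scripted roughly as follows. This is a sketch under assumptions: disable_shared is a hypothetical helper name, and the demo runs on a stand-in file rather than the real platform file.

```shell
#!/bin/sh
# Sketch: rewrite "enable_shared=yes" to "enable_shared=no", tolerating
# leading or trailing whitespace, and write the result to a new file.
# Usage: disable_shared INPUT OUTPUT   (hypothetical helper name)
disable_shared() {
    sed -e 's/^\([[:space:]]*enable_shared=\)yes[[:space:]]*$/\1no/' "$1" > "$2"
}

# Demo on a stand-in file holding only the setting of interest.
shared_in=$(mktemp)
shared_out=$(mktemp)
printf '%s\n' 'enable_shared=yes' > "$shared_in"
disable_shared "$shared_in" "$shared_out"
cat "$shared_out"
```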

And so I started back at configure (after a "make distclean", to be safe),
and here is the next error:

Making all in tools/orte-info
make[2]: Entering directory
`/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/tools/orte-info'
  CCLD orte-info
../../../orte/.libs/libopen-rte.a(orte_info_support.o): In function
`orte_info_show_orte_version':
orte_info_support.c:(.text+0xd70): multiple definition of
`orte_info_show_orte_version'
version.o:version.c:(.text+0x4b0): first defined here
../../../orte/.libs/libopen-rte.a(orte_info_support.o):(.data+0x0):
multiple definition of `orte_info_type_orte'
orte-info.o:(.data+0x10): first defined here
/usr/bin/ld: link errors found, deleting executable `orte-info'
collect2: error: ld returned 1 exit status
make[2]: *** [orte-info] Error 1

I am not sure how to fix this, but I would guess this is probably a simple
fix for somebody who knows OMPI's build infrastructure better than I do.
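
For anyone wanting to see which objects collide, a possible diagnostic is to filter "nm -A" output for global symbols defined more than once. The sketch below is an assumption-laden illustration, not a fix: list_duplicate_defs is a hypothetical helper, and the sample lines are made up to resemble the linker messages above rather than captured from the real build.

```shell
#!/bin/sh
# Sketch: read `nm -A` style lines on stdin and print each global symbol
# that is defined (type T, D, B or R) in more than one object, followed by
# the objects that define it. Undefined (U) references are ignored.
list_duplicate_defs() {
    awk '$2 ~ /^[TDBR]$/ {
             obj = $1
             sub(/:[0-9a-fA-F]+$/, "", obj)   # drop the trailing address
             count[$3]++
             where[$3] = where[$3] " " obj
         }
         END { for (s in count) if (count[s] > 1) print s ":" where[s] }'
}

# Demo with illustrative input (addresses are placeholders).
list_duplicate_defs <<'EOF'
version.o:00000000000004b0 T orte_info_show_orte_version
orte-info.o:0000000000000010 D orte_info_type_orte
orte-info.o:                 U orte_info_show_orte_version
libopen-rte.a:orte_info_support.o:0000000000000d70 T orte_info_show_orte_version
libopen-rte.a:orte_info_support.o:0000000000000000 D orte_info_type_orte
EOF
```

In the orte-info build directory the input would come from something like "nm -A version.o orte-info.o ../../../orte/.libs/libopen-rte.a | list_duplicate_defs".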

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900