Following up as I promised...

My results on NERSC's small Cray XE6 (the test/dev rack "Grace", rather than the full-sized "Hopper") match those I get on the Cray XC30 (Edison), and do not match those Ralph reports for LANL's XE6.

An attempt to build/link hello_c.c results in unresolved symbols from libnuma, libxpmem and libugni.
A complete list is available if it matters.
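
A rough sketch of how such a list can be produced from a saved link log, assuming GNU ld's "undefined reference to" message format. The helper name undefined_symbols is hypothetical, and the sample lines are shortened from the hwloc link failure quoted later in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): read linker output on stdin
# and print each unresolved symbol once, sorted.
undefined_symbols() {
    # Capture the name between the quotes that follow "undefined reference to".
    sed -n "s/.*undefined reference to [\`']\([^']*\)'.*/\1/p" | sort -u
}

# Sample input: lines shortened from the hwloc failure quoted in this thread.
undefined_symbols <<'EOF'
topology-linux.c:1116: undefined reference to `mbind'
topology-linux.c:1337: undefined reference to `get_mempolicy'
topology-linux.c:1239: undefined reference to `get_mempolicy'
topology-linux.c:1194: undefined reference to `migrate_pages'
EOF
```

In real use the input would be the captured output of the failing link, for example "make hello_c 2>&1 | undefined_symbols".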

This is still with last night's openmpi-1.9a1r27905 tarball, and the following 1-line mod to the platform file:
- enable_shared=yes
+ enable_shared=no
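
As a sketch of that one-line change applied mechanically, the snippet below rewrites a copy of a platform file so that enable_shared=yes becomes enable_shared=no. The function name and the temporary stand-in file are placeholders; the thread does not say which platform file was used.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of platform file $1 to $2 with shared
# libraries disabled. Neither name comes from the Open MPI tree.
make_static_platform() {
    sed 's/^enable_shared=yes$/enable_shared=no/' "$1" > "$2"
}

# Demo on a stand-in file; a real platform file has many more settings.
orig=$(mktemp)
mod=$(mktemp)
printf 'enable_shared=yes\n' > "$orig"
make_static_platform "$orig" "$mod"
cat "$mod"            # the copy now carries enable_shared=no
rm -f "$orig" "$mod"
```

The modified copy would then be handed to configure through its --with-platform option.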

If it would help determine what is going on, I can probably get NERSC accounts for any of the DOE Lab folks fairly easily.
Those accounts would only provide access to the full-sized XE6 (Hopper) for now.

In case any of these are helpful clues to the difference(s):
$ module list
Currently Loaded Modulefiles:
  1) modules/3.2.6.6                         18) dvs/1.8.6_0.9.0-1.0401.1401.1.120
  2) torque/4.1.4-snap.201211160904          19) csa/3.0.0-1_2.0401.37452.4.50.gem
  3) moab/6.0.4                              20) job/1.5.5-0.1_2.0401.35380.1.10.gem
  4) xtpe-network-gemini                     21) xpmem/0.1-2.0401.36790.4.3.gem
  5) cray-mpich2/5.6.0                       22) gni-headers/2.1-1.0401.5675.4.4.gem
  6) atp/1.6.0                               23) dmapp/3.2.1-1.0401.5983.4.5.gem
  7) xe-sysroot/4.1.40                       24) pmi/4.0.0-1.0000.9282.69.4.gem
  8) switch/1.0-1.0401.36779.2.72.gem        25) ugni/4.0-1.0401.5928.9.5.gem
  9) shared-root/1.0-1.0401.37253.3.50.gem   26) udreg/2.3.2-1.0401.5929.3.3.gem
 10) pdsh/2.26-1.0401.37449.1.1.gem          27) xt-libsci/12.0.00
 11) nodehealth/5.0-1.0401.38460.12.18.gem   28) gcc/4.7.2
 12) lbcd/2.1-1.0401.35360.1.2.gem           29) xt-asyncpe/5.16
 13) hosts/1.0-1.0401.35364.1.115.gem        30) eswrap/1.0.10
 14) configuration/1.0-1.0401.35391.1.2.gem  31) xtpe-mc12
 15) ccm/2.2.0-1.0401.37254.2.142            32) cray-shmem/5.6.0
 16) audit/1.0.0-1.0401.37969.2.32.gem       33) PrgEnv-gnu/4.1.40
 17) rca/1.0.0-2.0401.38656.2.2.gem
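
To compare module sets between two systems, the two-column listing above can be flattened to one module per line and then diffed. A minimal sketch: flatten_modules is a made-up name, and it assumes the "N) name/version" layout shown here.

```shell
#!/bin/sh
# Hypothetical helper: turn two-column `module list` output (read on stdin)
# into one sorted module name per line.
flatten_modules() {
    grep -v '^Currently Loaded' |         # drop the header line
        tr -s ' \t' '\n\n' |              # one token per line
        grep -v -e '^[0-9]*)$' -e '^$' |  # drop the "N)" indices and blanks
        sort
}

# Sample input: two rows copied from the listing in this message.
flatten_modules <<'EOF'
Currently Loaded Modulefiles:
  1) modules/3.2.6.6                         18) dvs/1.8.6_0.9.0-1.0401.1401.1.120
 17) rca/1.0.0-2.0401.38656.2.2.gem
EOF
```

On a live system the input would come from "module list 2>&1", since Environment Modules 3.x typically writes the listing to stderr. Saving the flattened output on each machine gives two files that can be compared with diff.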


-Paul


On Fri, Jan 25, 2013 at 5:50 PM, Paul Hargrove <phhargrove@lbl.gov> wrote:
Ralph,

Again our results differ.
I did NOT need the additional #include to link a simple test program.
I am going to try on our XE6 shortly.

I suspect you are right about something in the configury being different.
I am willing to try a few more nightly tarballs if somebody thinks they have the proper fix.

-Paul


On Fri, Jan 25, 2013 at 5:45 PM, Ralph Castain <rhc@open-mpi.org> wrote:

On Jan 25, 2013, at 5:12 PM, Paul Hargrove <phhargrove@lbl.gov> wrote:

Ralph,

Those are the result of the missing -lnuma, which Nathan already identified as absent from BOTH 1.7 and trunk.
I see MORE missing symbols, which include ones from libxpmem and libugni.

Alright, let me try to be clearer. We are missing -lnuma as well as the required include file; both are necessary to resolve the issue.

I find both the xpmem and ugni libraries *are* correctly included in both 1.7 and trunk. The difference could lie in whether the configury finds them, but on the XE6 we are finding them *and* correctly including them.
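
One way to check this on a given build is to look at the link line the wrapper compiler would use. A sketch follows, assuming the Open MPI wrapper's --showme:link option is available. missing_libs is a hypothetical helper, the library list is just the three named in this thread, and the sample link line is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: given a link line as $1, print each of the libraries
# discussed in this thread whose -l flag does not appear in it.
missing_libs() {
    for lib in numa xpmem ugni; do
        case " $1 " in
            *" -l$lib "*) ;;            # flag present, nothing to report
            *) echo "-l$lib" ;;
        esac
    done
}

# Made-up link line for illustration; on a real install one would pass
# "$(mpicc --showme:link)" instead.
missing_libs "-L/opt/example/lib -lmpi -lxpmem -lugni -lpthread"
# prints: -lnuma
```

An empty result means all three flags are on the link line, which is the situation described above for the XE6.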

HTH
Ralph


-Paul


On Fri, Jan 25, 2013 at 4:59 PM, Ralph Castain <rhc@open-mpi.org> wrote:

On Jan 25, 2013, at 4:53 PM, Ralph Castain <rhc@open-mpi.org> wrote:
> The repeated libs are something we obviously should fix, but all the libs are there - including lustre. I guess those were dropped due to the shared lib setting, so we probably should fix that in the platform file.
>
> Perhaps that is the cause of Nathan's issue? shrug...regardless, apps build and run just fine using mpicc for me.

Correction - turns out I misspoke. I find apps *don't* build correctly with this setup:

mpicc -g    hello_c.c   -o hello_c
/usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_set_area_membind':
/lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1116: undefined reference to `mbind'
/lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1135: undefined reference to `mbind'
/usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_get_area_membind':
/lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1337: undefined reference to `get_mempolicy'
/usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_find_kernel_max_numnodes':
/lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1239: undefined reference to `get_mempolicy'
/usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_set_thisthread_membind':
/lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1183: undefined reference to `set_mempolicy'
/lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1194: undefined reference to `migrate_pages'
/lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1206: undefined reference to `set_mempolicy'
/usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_get_thisthread_membind':
/lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1284: undefined reference to `get_mempolicy'
/usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_find_kernel_max_numnodes':
/lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1239: undefined reference to `get_mempolicy'
collect2: ld returned 1 exit status
make: *** [hello_c] Error 1

So it looks like hwloc is borked when built static.

Sigh
Ralph


_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Paul H. Hargrove                          PHHargrove@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900