Hardware Locality Users' Mailing List Archives

Subject: [hwloc-users] Blue Gene/Q support
From: Jeff Hammond (jhammond_at_[hidden])
Date: 2012-05-03 10:41:35


Hi,

I'm interested in seeing hwloc supported on Blue Gene/Q.

Blue Gene/Q is not listed on http://www.open-mpi.org/projects/hwloc/
and does not use a standard operating system (although CNK is POSIX
and very Linux-like in general), so I did not expect hwloc to work
without some development. I ran the tests anyway and confirmed that
some of them fail.

Fortunately, almost all of the tests passed, except for glibc-sched
and hwloc_bind. I suspect this is due to the various
incompatibilities between glibc in CNK vs. Linux. The failure of both
tests occurs with XLC and GCC, although I report the details below for
XLC only.
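
To help separate a problem in the C library's affinity calls from a problem in hwloc itself, a small reproducer that bypasses hwloc may be useful. The sketch below is not part of hwloc or its test suite; it assumes a glibc that provides sched_getaffinity() and sched_setaffinity(), and the function name and the REPRO_STANDALONE macro are made up for illustration. Whether hwloc's binding path on CNK ends up in these exact calls is an assumption. On Linux, errno 3 (the value reported by hwloc_bind below) is ESRCH.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for cpu_set_t and sched_*affinity */
#endif
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Bind the calling thread to the first CPU in its current affinity mask.
 * Returns 0 on success, or the errno value of the call that failed. */
int bind_self_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, target;
    int cpu;

    /* pid 0 means "the calling thread" for both affinity calls. */
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return errno;

    /* Find the lowest-numbered CPU we are currently allowed to run on. */
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &allowed))
            break;
    if (cpu == CPU_SETSIZE)
        return EINVAL;          /* empty mask: nothing to bind to */

    /* Request a mask that contains only that CPU. */
    CPU_ZERO(&target);
    CPU_SET(cpu, &target);
    if (sched_setaffinity(0, sizeof(target), &target) != 0)
        return errno;

    return 0;
}

#ifdef REPRO_STANDALONE
/* Optional driver: reports failures in an "(errno, message)" form. */
int main(void)
{
    int err = bind_self_to_first_allowed_cpu();

    if (err)
        printf("bind FAILED (%d, %s)\n", err, strerror(err));
    else
        printf("bind OK\n");
    return err ? 1 : 0;
}
#endif
```

Building with -DREPRO_STANDALONE gives a standalone program that can be submitted the same way as the hwloc tests. If it fails on a compute node with the same errno, that would point at the affinity calls in the C library or kernel rather than at hwloc.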

Should I report this as a bug? Would it be sufficient for a Blue
Gene/Q port if I provided the kernel API calls related to thread
location, etc., or would any hwloc developers be interested in Blue
Gene/Q access for the purposes of development and testing?

Thanks,

Jeff

[jhammond@cetuslac1 tests]$ ../configure \
    --prefix=/home/jhammond/HWLOC/hwloc-1.4.2rc1/install --enable-static \
    --disable-shared CC=bgxlc_r --host=powerpc64-bgq-linux

==> 16927.output <==

==> 16927.cobaltlog <==
/usr/bin/qsub.py -t 15 -n 1 --mode=c1 ./glibc-sched

submitted with cwd set to: /veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests

Command: '/bgsys/drivers/ppcfloor/bin/runjob' '--np' '1'
'--ranks-per-node' '1' '--cwd'
'/veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests' '--block'
'EAS-20040-31371-128' '--corner' 'R00-M1-N04-J09' '--shape'
'1x1x1x1x1' '--envs' 'COBALT_JOBID=16927' '--envs'
'BG_SHAREDMEMSIZE=32' '--verbose' '4' ':'
'/veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests/./glibc-sched'

Info: stdin received from /dev/null
Info: stdout sent to
/veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests/16927.output
Info: stderr sent to
/veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests/16927.error

Job 16927/jhammond/19558: Block EAS-20040-31371-128 for location
EAS-31151-31151-1 already booted. Starting task for job 16927. (APG)
Info: task completed normally with an exit code of 134; initiating job
cleanup and removal

==> 16927.error <==
2012-05-03 14:06:58.888 (INFO ) [0xfffac848a40]
EAS-20040-31371-128:1498:ibm.runjob.client.options.Parser: set local
socket to runjob_mux from properties file
2012-05-03 14:07:00.892 (INFO ) [0xfffac848a40]
EAS-20040-31371-128:34615:ibm.runjob.client.Job: job 34615 started
glibc-sched: ../../tests/glibc-sched.c:43: main: Assertion `!err' failed.
2012-05-03 14:07:02.772 (WARN ) [0xfffac848a40]
EAS-20040-31371-128:34615:ibm.runjob.client.Job: terminated by signal
6
2012-05-03 14:07:02.772 (WARN ) [0xfffac848a40]
EAS-20040-31371-128:34615:ibm.runjob.client.Job: abnormal termination
by signal 6 from rank 0

==> 16912.output <==
system set is 0xffffffff,0xffffffff
Bind this singlethreaded process : FAILED (3, No such process)

==> 16912.cobaltlog <==
/usr/bin/qsub.py -t 15 -n 1 --mode=c1 ./hwloc_bind

submitted with cwd set to: /veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests

Command: '/bgsys/drivers/ppcfloor/bin/runjob' '--np' '1'
'--ranks-per-node' '1' '--cwd'
'/veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests' '--block'
'EAS-20040-31371-128' '--corner' 'R00-M1-N04-J27' '--shape'
'1x1x1x1x1' '--envs' 'COBALT_JOBID=16912' '--envs'
'BG_SHAREDMEMSIZE=32' '--verbose' '4' ':'
'/veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests/./hwloc_bind'

Info: stdin received from /dev/null
Info: stdout sent to
/veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests/16912.output
Info: stderr sent to
/veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests/16912.error

Job 16912/jhammond/19536: Block EAS-20040-31371-128 for location
EAS-20141-20141-1 already booted. Starting task for job 16912. (APG)
Info: task completed normally with an exit code of 134; initiating job
cleanup and removal

==> 16912.error <==
2012-05-03 14:02:09.516 (INFO ) [0xfffb35c8a40]
EAS-20040-31371-128:31883:ibm.runjob.client.options.Parser: set local
socket to runjob_mux from properties file
2012-05-03 14:02:11.526 (INFO ) [0xfffb35c8a40]
EAS-20040-31371-128:34593:ibm.runjob.client.Job: job 34593 started
*** glibc detected ***
/veas_home/jhammond/HWLOC/hwloc-1.4.2rc1/build/tests/./hwloc_bind:
free(): invalid next size (fast): 0x00000019c5016b00 ***
======= Backtrace: =========
[0x103d398]
[0x1042ad8]
[0x1049800]
[0x101e178]
[0x101e220]
[0x1024520]
[0x1024974]
[0x1024ccc]
[0x1024eb0]
[0x1016478]
[0x10014a4]
[0x1001958]
[0x10260d8]
[0x10263d4]
======= Memory map: ========
10000000-10060000 r-xp 00000000 00:0f 1504118
  /sbin/sysiod
10060000-10070000 rw-p 00050000 00:0f 1504118
  /sbin/sysiod
10070000-10100000 rw-p 00000000 00:00 0 [heap]
fff90000000-fff90030000 rw-p 00000000 00:00 0
fff90030000-fff94000000 ---p 00000000 00:00 0
fff94c00000-fff94d00000 rw-p 00000000 00:00 0
fff95a00000-fff95a30000 rw-p 00000000 00:00 0
fff95a30000-fff95a40000 -w-s 3fcc48f0000 00:0e 11206
  /dev/infiniband/uverbs0
fff95a40000-fff95a50000 -w-s 3fcc08f0000 00:0e 11206
  /dev/infiniband/uverbs0
fff95a50000-fff95a60000 r-xp 00000000 00:0f 3369690
  /usr/lib64/libbgvrnic-rdmav2.so
fff95a60000-fff95a70000 rw-p 00000000 00:0f 3369690
  /usr/lib64/libbgvrnic-rdmav2.so
fff95a70000-fff95a80000 ---p 00000000 00:00 0
fff95a80000-fff96480000 rw-p 00000000 00:00 0
fff96480000-fff964b0000 r-xp 00000000 00:0f 3350116
  /lib64/libselinux.so.1
fff964b0000-fff964c0000 r--p 00020000 00:0f 3350116
  /lib64/libselinux.so.1
fff964c0000-fff964d0000 rw-p 00030000 00:0f 3350116
  /lib64/libselinux.so.1
fff964d0000-fff96540000 r-xp 00000000 00:0f 2638150
  /lib64/libfreebl3.so
fff96540000-fff96550000 r--p 00060000 00:0f 2638150
  /lib64/libfreebl3.so
fff96550000-fff96560000 rw-p 00070000 00:0f 2638150
  /lib64/libfreebl3.so
fff96560000-fff96570000 r-xp 00000000 00:0f 2939812
  /lib64/libkeyutils.so.1.3
fff96570000-fff96580000 r--p 00000000 00:0f 2939812
  /lib64/libkeyutils.so.1.3
fff96580000-fff96590000 rw-p 00010000 00:0f 2939812
  /lib64/libkeyutils.so.1.3
fff96590000-fff965a0000 r-xp 00000000 00:0f 2924424
  /lib64/libkrb5support.so.0.1
fff965a0000-fff965b0000 r--p 00000000 00:0f 2924424
  /lib64/libkrb5support.so.0.1
fff965b0000-fff965c0000 rw-p 00010000 00:0f 2924424
  /lib64/libkrb5support.so.0.1
fff965c0000-fff97510000 r-xp 00000000 00:0f 914026
  /usr/lib64/libicudata.so.42.1
fff97510000-fff97520000 rw-p 00f40000 00:0f 914026
  /usr/lib64/libicudata.so.42.1
fff97520000-fff97550000 r-xp 00000000 00:0f 1277798
  /usr/lib64/libsasl2.so.2.0.23
fff97550000-fff97560000 r--p 00020000 00:0f 1277798
  /usr/lib64/libsasl2.so.2.0.23
fff97560000-fff97570000 rw-p 00030000 00:0f 1277798
  /usr/lib64/libsasl2.so.2.0.23
fff97570000-fff975d0000 r-xp 00000000 00:0f 1327117
  /lib64/libnspr4.so
fff975d0000-fff975e0000 r--p 00050000 00:0f 1327117
  /lib64/libnspr4.so
fff975e0000-fff975f0000 rw-p 00060000 00:0f 1327117
  /lib64/libnspr4.so
fff975f0000-fff97600000 r-xp 00000000 00:0f 1250566
  /lib64/libplc4.so
fff97600000-fff97610000 r--p 00000000 00:0f 1250566
  /lib64/libplc4.so
fff97610000-fff97620000 rw-p 00010000 00:0f 1250566
  /lib64/libplc4.so
fff97620000-fff97630000 r-xp 00000000 00:0f 2447916
  /lib64/libplds4.so
fff97630000-fff97640000 r--p 00000000 00:0f 2447916
  /lib64/libplds4.so
fff97640000-fff97650000 rw-p 00010000 00:0f 2447916
  /lib64/libplds4.so
fff97650000-fff97680000 r-xp 00000000 00:0f 2121934
  /usr/lib64/libnssutil3.so
fff97680000-fff97690000 r--p 00020000 00:0f 2121934
  /usr/lib64/libnssutil3.so
fff97690000-fff976a0000 rw-p 00030000 00:0f 2121934
  /usr/lib64/libnssutil3.so
fff976a0000-fff976b0000 rw-p 00000000 00:00 0
fff976b0000-fff97840000 r-xp 00000000 00:0f 811615
  /usr/lib64/libnss3.so
fff97840000-fff97850000 r--p 00180000 00:0f 811615
  /usr/lib64/libnss3.so
fff97850000-fff97860000 rw-p 00190000 00:0f 811615
  /usr/lib64/libnss3.so
fff97860000-fff97870000 rw-p 00000000 00:00 0
fff97870000-fff978b0000 r-xp 00000000 00:0f 1001793
  /usr/lib64/libsmime3.so
fff978b0000-fff978c0000 r--p 00030000 00:0f 1001793
  /usr/lib64/libsmime3.so
fff978c0000-fff978d0000 rw-p 00040000 00:0f 1001793
  /usr/lib64/libsmime3.so
fff978d0000-fff97920000 r-xp 00000000 00:0f 2769018
  /usr/lib64/libssl3.so
fff97920000-fff97930000 r--p 00040000 00:0f 2769018
  /usr/lib64/libssl3.so
fff97930000-fff97940000 rw-p 00050000 00:0f 2769018
  /usr/lib64/libssl3.so
fff97940000-fff97960000 r-xp 00000000 00:0f 3219112
  /lib64/libresolv-2.12.so
fff97960000-fff97970000 r--p 00010000 00:0f 3219112
  /lib64/libresolv-2.12.so
fff97970000-fff97980000 rw-p 00020000 00:0f 3219112
  /lib64/libresolv-2.12.so
fff97980000-fff97990000 r-xp 00000000 00:0f 822832
  /lib64/libcrypt-2.12.so
fff97990000-fff979a0000 r--p 00000000 00:0f 822832
  /lib64/libcrypt-2.12.so
fff979a0000-fff979b0000 rw-p 00010000 00:0f 822832
  /lib64/libcrypt-2.12.so
fff979b0000-fff979d0000 rw-p 00000000 00:00 0
fff979d0000-fff979e0000 r-xp 00000000 00:0f 1327124
  /lib64/libuuid.so.1.3.0
fff979e0000-fff979f0000 rw-p 00000000 00:0f 1327124
  /lib64/libuuid.so.1.3.0
fff979f0000-fff97a10000 r-xp 00000000 00:0f 2029697
  /lib64/libz.so.1.2.3
fff97a10000-fff97a20000 r--p 00010000 00:0f 2029697
  /lib64/libz.so.1.2.3
fff97a20000-fff97a30000 rw-p 00020000 00:0f 2029697
  /lib64/libz.so.1.2.3
fff97a30000-fff97c50000 r-xp 00000000 00:0f 2098005
  /usr/lib64/libcrypto.so.1.0.0
fff97c50000-fff97c70000 r--p 00210000 00:0f 2098005
  /usr/lib64/libcrypto.so.1.0.0
fff97c70000-fff97c90000 rw-p 00230000 00:0f 2098005
  /usr/lib64/libcrypto.so.1.0.0
fff97c90000-fff97ca0000 rw-p 00000000 00:00 0
fff97ca0000-fff97ce0000 r-xp 00000000 00:0f 1250555
  /lib64/libk5crypto.so.3.1
fff97ce0000-fff97cf0000 r--p 00030000 00:0f 1250555
  /lib64/libk5crypto.so.3.1
fff97cf0000-fff97d00000 rw-p 00040000 00:0f 1250555
  /lib64/libk5crypto.so.3.1
fff97d00000-fff97d10000 r-xp 00000000 00:0f 1327131
  /lib64/libcom_err.so.2.1
fff97d10000-fff97d20000 r--p 00000000 00:0f 1327131
  /lib64/libcom_err.so.2.1
fff97d20000-fff97d30000 rw-p 00010000 00:0f 1327131
  /lib64/libcom_err.so.2.1
fff97d30000-fff97e40000 r-xp 00000000 00:0f 3350124
  /lib64/libkrb5.so.3.3
fff97e40000-fff97e50000 r--p 00100000 00:0f 3350124
  /lib64/libkrb5.so.3.3
fff97e50000-fff97e60000 rw-p 00110000 00:0f 3350124
  /lib64/libkrb5.so.3.3
fff97e60000-fff97eb0000 r-xp 00000000 00:0f 1327132
  /lib64/libgssapi_krb5.so.2.2
fff97eb0000-fff97ec0000 r--p 00040000 00:0f 1327132
  /lib64/libgssapi_krb5.so.2.2
fff97ec0000-fff97ed0000 rw-p 00050000 00:0f 1327132
  /lib64/libgssapi_krb5.so.2.2
fff97ed0000-fff980f0000 r-xp 00000000 00:0f 159290
  /usr/lib64/libicui18n.so.42.1
fff980f0000-fff98120000 rw-p 00210000 00:0f 159290
  /usr/lib64/libicui18n.so.42.1
fff98120000-fff982c0000 r-xp 00000000 00:0f 2142969
  /usr/lib64/libicuuc.so.42.1
fff982c0000-fff982e0000 rw-p 001a0000 00:0f 2142969
  /usr/lib64/libicuuc.so.42.1
fff982e0000-fff982f0000 r-xp 00000000 00:0f 1250562
  /lib64/libdl-2.12.so
fff982f0000-fff98300000 r--p 00000000 00:0f 1250562
  /lib64/libdl-2.12.so
fff98300000-fff98310000 rw-p 00010000 00:0f 1250562
  /lib64/libdl-2.12.so
fff98310000-fff98350000 r-xp 00000000 00:0f 78455
  /usr/lib64/libapr-1.so.0.3.9
fff98350000-fff98360000 rw-p 00040000 00:0f 78455
  /usr/lib64/libapr-1.so.0.3.9
fff98360000-fff98530000 r-xp 00000000 00:0f 3221811
  /lib64/libdb-4.7.so
fff98530000-fff98550000 rw-p 001d0000 00:0f 3221811
  /lib64/libdb-4.7.so
fff98550000-fff98590000 r-xp 00000000 00:0f 3219122
  /lib64/libexpat.so.1.5.2
fff98590000-fff985a0000 rw-p 00030000 00:0f 3219122
  /lib64/libexpat.so.1.5.2
fff985a0000-fff985c0000 r-xp 00000000 00:0f 2029706
  /lib64/liblber-2.4.so.2.5.6
fff985c0000-fff985d0000 r--p 00010000 00:0f 2029706
  /lib64/liblber-2.4.so.2.5.6
fff985d0000-fff985e0000 rw-p 00020000 00:0f 2029706
  /lib64/liblber-2.4.so.2.5.6
fff985e0000-fff98650000 r-xp 00000000 00:0f 1327127
  /lib64/libldap-2.4.so.2.5.6
fff98650000-fff98660000 r--p 00060000 00:0f 1327127
  /lib64/libldap-2.4.so.2.5.6
fff98660000-fff98670000 rw-p 00070000 00:0f 1327127
  /lib64/libldap-2.4.so.2.5.6
fff98670000-fff986b0000 r-xp 00000000 00:0f 571721
  /usr/lib64/libaprutil-1.so.0.3.9
fff986b0000-fff986c0000 rw-p 00030000 00:0f 571721
  /usr/lib64/libaprutil-1.so.0.3.9
fff986c0000-fff986d0000 r-xp 00000000 00:0f 2632456
  /lib64/librt-2.12.so
fff986d0000-fff986e0000 r--p 00000000 00:0f 2632456
  /lib64/librt-2.12.so
fff986e0000-fff986f0000 rw-p 00010000 00:0f 2632456
  /lib64/librt-2.12.so
fff986f0000-fff98760000 r-xp 00000000 00:0f 2078098
  /usr/lib64/libssl.so.1.0.0
fff98760000-fff98770000 r--p 00060000 00:0f 2078098
  /usr/lib64/libssl.so.1.0.0
fff98770000-fff98780000 rw-p 00070000 00:0f 2078098
  /usr/lib64/libssl.so.1.0.0
fff98780000-fff988b0000 r-xp 00000000 00:0f 3397951
  /usr/lib64/libboost_regex-mt.so.5
fff988b0000-fff988c0000 rw-p 00130000 00:0f 3397951
  /usr/lib64/libboost_regex-mt.so.5
fff988c0000-fff988e0000 r-xp 00000000 00:0f 3135862
  /usr/lib64/libboost_thread-mt.so.5
fff988e0000-fff988f0000 rw-p 00010000 00:0f 3135862
  /usr/lib64/libboost_thread-mt.so.5
fff988f0000-fff98980000 r-xp 00000000 00:0f 784193
  /usr/lib64/libboost_serialization-mt.so.5
fff98980000-fff98990000 rw-p 00080000 00:0f 784193
  /usr/lib64/libboost_serialization-mt.so.5
fff98990000-fff989a0000 r-xp 00000000 00:0f 2912814
  /usr/lib64/libmlx4-rdmav2.so
fff989a0000-fff989b0000 rw-p 00000000 00:0f 2912814
  /usr/lib64/libmlx4-rdmav2.so
fff989b0000-fff989c0000 r-xp 00000000 00:0f 2213992
  /usr/lib64/librdmacm.so.1.0.0
fff989c0000-fff989d0000 rw-p 00000000 00:0f 2213992
  /usr/lib64/librdmacm.so.1.0.0
fff989d0000-fff989f0000 r-xp 00000000 00:0f 948700
  /usr/lib64/libibverbs.so.1.0.0
fff989f0000-fff98a00000 rw-p 00010000 00:0f 948700
  /usr/lib64/libibverbs.so.1.0.0
fff98a00000-fff98d60000 r-xp 00000000 00:0f 2989947
  /lib/liblog4cxx.so.10.0.0
fff98d60000-fff98db0000 rw-p 00360000 00:0f 2989947
  /lib/liblog4cxx.so.10.0.0
fff98db0000-fff98dc0000 rw-p 00000000 00:00 0
fff98dc0000-fff98f80000 r-xp 00000000 00:0f 76138
  /lib64/libc-2.12.so
fff98f80000-fff98f90000 r--p 001b0000 00:0f 76138
  /lib64/libc-2.12.so
fff98f90000-fff98fa0000 rw-p 001c0000 00:0f 76138
  /lib64/libc-2.12.so
fff98fa0000-fff98fb0000 rw-p 00000000 00:00 0
fff98fb0000-fff98fd0000 r-xp 00000000 00:0f 3350121
  /lib64/libpthread-2.12.so
fff98fd0000-fff98fe0000 r--p 00010000 00:0f 3350121
  /lib64/libpthread-2.12.so
fff98fe0000-fff98ff0000 rw-p 00020000 00:0f 3350121
  /lib64/libpthread-2.12.so
fff98ff0000-fff99010000 r-xp 00000000 00:0f 822834
  /lib64/libgcc_s-4.4.6-20110824.so.1
fff99010000-fff99020000 rw-p 00010000 00:0f 822834
  /lib64/libgcc_s-4.4.6-20110824.so.1
fff99020000-fff99100000 r-xp 00000000 00:0f 3219113
  /lib64/libm-2.12.so
fff99100000-fff99110000 r--p 000d0000 00:0f 3219113
  /lib64/libm-2.12.so
fff99110000-fff99120000 rw-p 000e0000 00:0f 3219113
  /lib64/libm-2.12.so
fff99120000-fff99280000 r-xp 00000000 00:0f 2869896
  /usr/lib64/libstdc++.so.6.0.13
fff99280000-fff99290000 r--p 00150000 00:0f 2869896
  /usr/lib64/libstdc++.so.6.0.13
fff99290000-fff992a0000 rw-p 00160000 00:0f 2869896
  /usr/lib64/libstdc++.so.6.0.13
fff992a0000-fff992c0000 rw-p 00000000 00:00 0
fff992c0000-fff99320000 r-xp 00000000 00:0f 1455532
  /lib64/libbgcios.so.1.0.0
fff99320000-fff99330000 rw-p 00060000 00:0f 1455532
  /lib64/libbgcios.so.1.0.0
fff99330000-fff99390000 r-xp 00000000 00:0f 2565847
  /usr/lib64/libboost_program_options-mt.so.5
fff99390000-fff993a0000 rw-p 00060000 00:0f 2565847
  /usr/lib64/libboost_program_options-mt.so.5
fff993a0000-fff993b0000 r-xp 00000000 00:0f 3263171
  /usr/lib64/libboost_system-mt.so.5
fff993b0000-fff993c0000 rw-p 00000000 00:0f 3263171
  /usr/lib64/libboost_system-mt.so.5
fff993c0000-fff99860000 r-xp 00000000 00:0f 2989944
  /lib/libbgutility.so.1.0.0
fff99860000-fff99890000 rw-p 00490000 00:0f 2989944
  /lib/libbgutility.so.1.0.0
fff99890000-fff998b0000 rw-p 00000000 00:00 0
fff998b0000-fff998d0000 r-xp 00000000 00:00 0 [vdso]
fff998d0000-fff99900000 r-xp 00000000 00:0f 2939827
  /lib64/ld-2.12.so
fff99900000-fff99910000 r--p 00020000 00:0f 2939827
  /lib64/ld-2.12.so
fff99910000-fff99920000 rw-p 00030000 00:0f 2939827
  /lib64/ld-2.12.so
fffca310000-fffca460000 rw-p 00000000 00:00 0 [stack]
2012-05-03 14:02:13.391 (WARN ) [0xfffb35c8a40]
EAS-20040-31371-128:34593:ibm.runjob.client.Job: terminated by signal
6
2012-05-03 14:02:13.392 (WARN ) [0xfffb35c8a40]
EAS-20040-31371-128:34593:ibm.runjob.client.Job: abnormal termination
by signal 6 from rank 0

-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond_at_[hidden] / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond (in-progress)
https://wiki.alcf.anl.gov/old/index.php/User:Jhammond (deprecated)
https://wiki-old.alcf.anl.gov/index.php/User:Jhammond (deprecated)