
Open MPI User's Mailing List Archives


From: Eric Thibodeau (kyron_at_[hidden])
Date: 2006-07-15 16:58:26


Hello all,

        I've been trying to set up a small test cluster with a dual Opteron head and Athlon nodes. My environment in both cases is Gentoo and the nodes boot off PXE using an image built and stored on the master node. I chroot into the node's environment using:

linux32 chroot ${ROOT} /bin/bash

This gets me across the 64/32-bit barrier. My user's home directory is loop-mounted into that environment and NFS-exported to the nodes. I build OpenMPI in the following way:

In the build folder of OpenMPI-1.1:
./configure --cache-file=config_`uname -m`.cache --enable-pretty-print-stacktrace --prefix=$HOME/openmpi_`uname -m`
make -j4 && make install

I run these exact same commands in both the Opteron environment and the chrooted environment for the Athlon machines. This gives me the following folders in my $HOME:
/home/kyron/openmpi_i686
/home/kyron/openmpi_x86_64
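
Since $HOME is shared over NFS between the head and the nodes, each machine has to pick the prefix that matches its own architecture. A minimal sketch of how that selection could be done in a shell startup file, assuming the prefix naming used by the configure line above; the function name ompi_prefix and the variable OMPI_PREFIX are hypothetical and are not part of OpenMPI:

```shell
# Hypothetical helper: build the Open MPI prefix path for a given
# home directory ($1) and machine name ($2, as printed by `uname -m`).
ompi_prefix() {
    printf '%s/openmpi_%s\n' "$1" "$2"
}

# Pick the install that matches the machine (or linux32 chroot) we are in.
OMPI_PREFIX=$(ompi_prefix "$HOME" "$(uname -m)")

# Put the matching binaries and libraries first on the search paths.
export PATH="$OMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

With this in place, `which mpirun` should point into openmpi_i686 on the Athlon nodes and into openmpi_x86_64 on the Opteron head.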

But for some reason, OpenMPI for the Athlon nodes (built in their image on the server, I should say) still doesn't seem to be built correctly, since it crashes as follows:

kyron@node0 ~ $ mpirun -np 1 uptime
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/home/kyron/openmpi_i686/lib/libopal.so.0 [0xb7f6258f]
[1] func:[0xffffe440]
[2] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init_stage1+0x1d7) [0xb7fa0227]
[3] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_system_init+0x23) [0xb7fa3683]
[4] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init+0x5f) [0xb7f9ff7f]
[5] func:mpirun(orterun+0x255) [0x804a015]
[6] func:mpirun(main+0x22) [0x8049db6]
[7] func:/lib/tls/libc.so.6(__libc_start_main+0xdb) [0xb7de8f0b]
[8] func:mpirun [0x8049d11]
*** End of error message ***
Segmentation fault
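
The trace resolves libopal and liborte under openmpi_i686, so one sanity check is that those libraries really are 32-bit objects. A rough sketch, assuming the files are ELF objects (the magic bytes are not verified here) and that the x86_64 prefix contains a libopal.so.0 as well; the function name elf_class is hypothetical. It reads byte 4 of the ELF header (EI_CLASS), which is 1 for 32-bit and 2 for 64-bit objects:

```shell
# Hypothetical helper: report the ELF class of file $1 by reading
# the EI_CLASS byte at offset 4 of its header.
elf_class() {
    case "$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')" in
        1) echo 32-bit ;;
        2) echo 64-bit ;;
        *) echo unknown ;;
    esac
}

# Check the library from each prefix, skipping any that is not present.
for lib in "$HOME/openmpi_i686/lib/libopal.so.0" \
           "$HOME/openmpi_x86_64/lib/libopal.so.0"; do
    if [ -e "$lib" ]; then
        echo "$lib: $(elf_class "$lib")"
    fi
done
```

The `file` utility reports the same information in more detail where it is installed.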

The crash happens both in the chrooted env and on the nodes. I configured both systems to have Linux and POSIX threads, though I see OpenMPI is calling the POSIX version (a message on the mailing list had hinted at keeping the Linux threads around... I have to anyway, since some apps like Matlab extensions still depend on them). The following is the output for the libc info.

kyron@headless ~ $ /lib/tls/libc.so.6
GNU C Library stable release version 2.3.6, by Roland McGrath et al.
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.1.1 (Gentoo 4.1.1).
Compiled on a Linux 2.6.11 system on 2006-07-14.
Available extensions:
        GNU libio by Per Bothner
        crypt add-on version 2.1 by Michael Glad and others
        Native POSIX Threads Library by Ulrich Drepper et al
        The C stubs add-on version 2.1.2.
        GNU Libidn by Simon Josefsson
        BIND-8.2.3-T5B
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Thread-local storage support included.
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
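
The banner above lists the Native POSIX Threads Library among the extensions of the libc under /lib/tls. A shorter way to see which thread implementation the default libc selects is glibc's `getconf GNU_LIBPTHREAD_VERSION`. A minimal sketch, assuming glibc's getconf is installed and prints a string beginning with either "NPTL" or "linuxthreads"; the function name thread_impl is hypothetical:

```shell
# Hypothetical helper: classify the string printed by
# `getconf GNU_LIBPTHREAD_VERSION` ($1).
thread_impl() {
    case "$1" in
        NPTL*)         echo NPTL ;;
        linuxthreads*) echo LinuxThreads ;;
        *)             echo unknown ;;
    esac
}

# Prints "unknown" if getconf does not know this variable.
thread_impl "$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)"
```

Running this on both the head and inside the linux32 chroot would show whether the two environments default to the same thread library.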

I am attaching the config.log and ompi_info output for both platforms. Before sending this e-mail I tried compiling OpenMPI on one of the nodes (booted off the image) and I am getting the exact same problem, so chroot vs. local build doesn't seem to be a factor. The attached file contains:

config.log.x86_64 <--config log for the Opteron build (works locally)
config.log_node0 <--config log for the Athlon build (on the node)
ompi_info.i686 <--ompi_info on the Athlon node
ompi_info.x86_64 <--ompi_info on the Opteron Master

Thanks,

-- 
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517