
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-06-28 08:56:36


Bummer! :-(
 
Just to be sure -- you had a clean config.cache file before you ran configure, right? (i.e., the file didn't exist, so it couldn't have picked up potentially erroneous values from a previous run of configure.) Also, FWIW, it's not necessary to specify --enable-ltdl-convenience; that should be automatic.
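
If you want to rule out a stale cache completely, a quick sketch (re-using the flags from the config.status output further down in your mail) would be:

    rm -f config.cache
    ./configure CFLAGS=-mcpu=v9 CXXFLAGS=-mcpu=v9 FFLAGS=-mcpu=v9 \
        --prefix=/export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u
    make all install

Dropping the --cache-file=config.cache option also works; without it, configure does not reuse any cached results at all.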
 
If you had a clean configure, we *suspect* that this might be due to alignment issues on Solaris 64-bit platforms, but we thought we had a pretty good handle on that in 1.1. Obviously we didn't solve everything. Bonk.
 
Did you get a corefile, perchance? If you could send a stack trace, that would be most helpful.
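
If no corefile showed up, core dumps may just be disabled in your shell. A rough sketch of how to enable them and then pull a stack trace out of the core (assuming a bash/sh-style shell, and that the Solaris proc tools and Forte dbx are available -- the paths in your mail suggest they are):

    ulimit -c unlimited        # enable core dumps for this shell and its children
    ~/openmpi_sun4u/bin/mpirun -np 5 mandelbrot-mpi 100 400 400
    pstack core                # quick stack trace via the Solaris proc tools
    dbx mandelbrot-mpi core    # or load the core in dbx ...
    (dbx) where                # ... and print the stack there

Note that the core file normally lands in the working directory of the process that crashed, which per your -d output below is your home directory.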

________________________________

        From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Eric Thibodeau
        Sent: Tuesday, June 20, 2006 8:36 PM
        To: users_at_[hidden]
        Subject: Re: [OMPI users] Installing OpenMPI on a solaris
        
        

        Hello Brian (and all),

        

        Well, the joy was short-lived. On a 12-CPU Enterprise machine and on a 4-CPU one, I seem to be able to start up to 4 processes. Above 4, I inevitably get BUS_ADRALN (address alignment errors?). Below are some traces of the failing runs, a detailed (mpirun -d) log of one of these situations, and the ompi_info output. Obviously, don't hesitate to ask if more information is required.

        

        Build version: openmpi-1.1b5r10421

        Config parameters:

        Open MPI config.status 1.1b5

        configured by ./configure, generated by GNU Autoconf 2.59,

        with options \"'--cache-file=config.cache' 'CFLAGS=-mcpu=v9' 'CXXFLAGS=-mcpu=v9' 'FFLAGS=-mcpu=v9' '--prefix=/export/lca/home/lca0/etudiants/ac38820/openmp

        i_sun4u' --enable-ltdl-convenience\"

        

        The traces:

        sshd_at_enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 10 mandelbrot-mpi 100 400 400

        Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)

        Failing at addr:2f4f04

        *** End of error message ***

        sshd_at_enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 8 mandelbrot-mpi 100 400 400

        Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)

        Failing at addr:2b354c

        *** End of error message ***

        sshd_at_enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 6 mandelbrot-mpi 100 400 400

        Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)

        Failing at addr:2b1ecc

        *** End of error message ***

        sshd_at_enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 5 mandelbrot-mpi 100 400 400

        Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)

        Failing at addr:2b12cc

        *** End of error message ***

        sshd_at_enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 4 mandelbrot-mpi 100 400 400

        maxiter = 100, width = 400, height = 400

        execution time in seconds = 1.48

        Type q to quit the program; otherwise it refreshes

        q

        sshd_at_enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 5 mandelbrot-mpi 100 400 400

        Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)

        Failing at addr:2b12cc

        *** End of error message ***

        

        I also got the same behaviour on a different machine (with the exact same code base; $HOME is an NFS mount) and the same hardware, but limited to 4 CPUs. The following is a debug run of one such failing execution:

        

        sshd_at_enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -d -v -np 5 mandelbrot-mpi 100 400 400

        [enterprise:24786] [0,0,0] setting up session dir with

        [enterprise:24786] universe default-universe

        [enterprise:24786] user sshd

        [enterprise:24786] host enterprise

        [enterprise:24786] jobid 0

        [enterprise:24786] procid 0

        [enterprise:24786] procdir: /tmp/openmpi-sessions-sshd_at_enterprise_0/default-universe/0/0

        [enterprise:24786] jobdir: /tmp/openmpi-sessions-sshd_at_enterprise_0/default-universe/0

        [enterprise:24786] unidir: /tmp/openmpi-sessions-sshd_at_enterprise_0/default-universe

        [enterprise:24786] top: openmpi-sessions-sshd_at_enterprise_0

        [enterprise:24786] tmp: /tmp

        [enterprise:24786] [0,0,0] contact_file /tmp/openmpi-sessions-sshd_at_enterprise_0/default-universe/universe-setup.txt

        [enterprise:24786] [0,0,0] wrote setup file

        [enterprise:24786] pls:rsh: local csh: 0, local bash: 0

        [enterprise:24786] pls:rsh: assuming same remote shell as local shell

        [enterprise:24786] pls:rsh: remote csh: 0, remote bash: 0

        [enterprise:24786] pls:rsh: final template argv:

        [enterprise:24786] pls:rsh: /usr/local/bin/ssh <template> ( ! [ -e ./.profile ] || . ./.profile; orted --debug --bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename <template> --universe sshd_at_enterprise:default-universe --nsreplica "0.0.0;tcp://10.45.117.37:40236" --gprreplica "0.0.0;tcp://10.45.117.37:40236" --mpi-call-yield 0 )

        [enterprise:24786] pls:rsh: launching on node localhost

        [enterprise:24786] pls:rsh: oversubscribed -- setting mpi_yield_when_idle to 1 (1 5)

        [enterprise:24786] pls:rsh: localhost is a LOCAL node

        [enterprise:24786] pls:rsh: reset PATH: /export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u/bin:/bin:/usr/local/bin:/usr/bin:/usr/sbin:/usr/ccs/bin:/usr/dt/bin:/usr/local/lam-mpi/7.1.1/bin:/export/lca/appl/Forte/SUNWspro/WS6U2/bin:/opt/sfw/bin:/usr/bin:/usr/ucb:/etc:/usr/local/bin:.

        [enterprise:24786] pls:rsh: reset LD_LIBRARY_PATH: /export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u/lib:/export/lca/appl/Forte/SUNWspro/WS6U2/lib:/usr/local/lib:/usr/local/lam-mpi/7.1.1/lib:/opt/sfw/lib

        [enterprise:24786] pls:rsh: changing to directory /export/lca/home/lca0/etudiants/ac38820

        [enterprise:24786] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename localhost --universe sshd_at_enterprise:default-universe --nsreplica "0.0.0;tcp://10.45.117.37:40236" --gprreplica "0.0.0;tcp://10.45.117.37:40236" --mpi-call-yield 1

        [enterprise:24787] [0,0,1] setting up session dir with

        [enterprise:24787] universe default-universe

        [enterprise:24787] user sshd

        [enterprise:24787] host localhost

        [enterprise:24787] jobid 0

        [enterprise:24787] procid 1

        [enterprise:24787] procdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/0/1

        [enterprise:24787] jobdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/0

        [enterprise:24787] unidir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe

        [enterprise:24787] top: openmpi-sessions-sshd_at_localhost_0

        [enterprise:24787] tmp: /tmp

        [enterprise:24789] [0,1,0] setting up session dir with

        [enterprise:24789] universe default-universe

        [enterprise:24789] user sshd

        [enterprise:24789] host localhost

        [enterprise:24789] jobid 1

        [enterprise:24789] procid 0

        [enterprise:24789] procdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/1/0

        [enterprise:24789] jobdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/1

        [enterprise:24789] unidir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe

        [enterprise:24789] top: openmpi-sessions-sshd_at_localhost_0

        [enterprise:24789] tmp: /tmp

        [enterprise:24791] [0,1,1] setting up session dir with

        [enterprise:24791] universe default-universe

        [enterprise:24791] user sshd

        [enterprise:24791] host localhost

        [enterprise:24791] jobid 1

        [enterprise:24791] procid 1

        [enterprise:24791] procdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/1/1

        [enterprise:24791] jobdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/1

        [enterprise:24791] unidir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe

        [enterprise:24791] top: openmpi-sessions-sshd_at_localhost_0

        [enterprise:24791] tmp: /tmp

        [enterprise:24793] [0,1,2] setting up session dir with

        [enterprise:24793] universe default-universe

        [enterprise:24793] user sshd

        [enterprise:24793] host localhost

        [enterprise:24793] jobid 1

        [enterprise:24793] procid 2

        [enterprise:24793] procdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/1/2

        [enterprise:24793] jobdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/1

        [enterprise:24793] unidir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe

        [enterprise:24793] top: openmpi-sessions-sshd_at_localhost_0

        [enterprise:24793] tmp: /tmp

        [enterprise:24795] [0,1,3] setting up session dir with

        [enterprise:24795] universe default-universe

        [enterprise:24795] user sshd

        [enterprise:24795] host localhost

        [enterprise:24795] jobid 1

        [enterprise:24795] procid 3

        [enterprise:24795] procdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/1/3

        [enterprise:24795] jobdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/1

        [enterprise:24795] unidir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe

        [enterprise:24795] top: openmpi-sessions-sshd_at_localhost_0

        [enterprise:24795] tmp: /tmp

        [enterprise:24797] [0,1,4] setting up session dir with

        [enterprise:24797] universe default-universe

        [enterprise:24797] user sshd

        [enterprise:24797] host localhost

        [enterprise:24797] jobid 1

        [enterprise:24797] procid 4

        [enterprise:24797] procdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/1/4

        [enterprise:24797] jobdir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe/1

        [enterprise:24797] unidir: /tmp/openmpi-sessions-sshd_at_localhost_0/default-universe

        [enterprise:24797] top: openmpi-sessions-sshd_at_localhost_0

        [enterprise:24797] tmp: /tmp

        [enterprise:24786] spawn: in job_state_callback(jobid = 1, state = 0x4)

        [enterprise:24786] Info: Setting up debugger process table for applications

        MPIR_being_debugged = 0

        MPIR_debug_gate = 0

        MPIR_debug_state = 1

        MPIR_acquired_pre_main = 0

        MPIR_i_am_starter = 0

        MPIR_proctable_size = 5

        MPIR_proctable:

        (i, host, exe, pid) = (0, localhost, mandelbrot-mpi, 24789)

        (i, host, exe, pid) = (1, localhost, mandelbrot-mpi, 24791)

        (i, host, exe, pid) = (2, localhost, mandelbrot-mpi, 24793)

        (i, host, exe, pid) = (3, localhost, mandelbrot-mpi, 24795)

        (i, host, exe, pid) = (4, localhost, mandelbrot-mpi, 24797)

        [enterprise:24789] [0,1,0] ompi_mpi_init completed

        [enterprise:24791] [0,1,1] ompi_mpi_init completed

        [enterprise:24793] [0,1,2] ompi_mpi_init completed

        [enterprise:24795] [0,1,3] ompi_mpi_init completed

        [enterprise:24797] [0,1,4] ompi_mpi_init completed

        Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)

        Failing at addr:2b12cc

        *** End of error message ***

        [enterprise:24787] sess_dir_finalize: found proc session dir empty - deleting

        [enterprise:24787] sess_dir_finalize: job session dir not empty - leaving

        [enterprise:24787] orted: job_state_callback(jobid = 1, state = ORTE_PROC_STATE_ABORTED)

        [enterprise:24787] sess_dir_finalize: found job session dir empty - deleting

        [enterprise:24787] sess_dir_finalize: univ session dir not empty - leaving

        --------------------------------------------------------------------------

        WARNING: A process refused to die!

        

        Host: enterprise

        PID: 24789

        

        This process may still be running and/or consuming resources.

        --------------------------------------------------------------------------

        --------------------------------------------------------------------------

        WARNING: A process refused to die!

        

        Host: enterprise

        PID: 24791

        

        This process may still be running and/or consuming resources.

        --------------------------------------------------------------------------

        --------------------------------------------------------------------------

        WARNING: A process refused to die!

        

        Host: enterprise

        PID: 24793

        

        This process may still be running and/or consuming resources.

        --------------------------------------------------------------------------

        --------------------------------------------------------------------------

        WARNING: A process refused to die!

        

        Host: enterprise

        PID: 24795

        

        This process may still be running and/or consuming resources.

        --------------------------------------------------------------------------

        --------------------------------------------------------------------------

        WARNING: A process refused to die!

        

        Host: enterprise

        PID: 24797

        

        This process may still be running and/or consuming resources.

        --------------------------------------------------------------------------

        --------------------------------------------------------------------------

        WARNING: A process refused to die!

        

        Host: enterprise

        PID: 24789

        

        This process may still be running and/or consuming resources.

        --------------------------------------------------------------------------

        --------------------------------------------------------------------------

        WARNING: A process refused to die!

        

        Host: enterprise

        PID: 24791

        

        This process may still be running and/or consuming resources.

        --------------------------------------------------------------------------

        --------------------------------------------------------------------------

        WARNING: A process refused to die!

        

        Host: enterprise

        PID: 24793

        

        This process may still be running and/or consuming resources.

        --------------------------------------------------------------------------

        --------------------------------------------------------------------------

        WARNING: A process refused to die!

        

        Host: enterprise

        PID: 24795

        

        This process may still be running and/or consuming resources.

        --------------------------------------------------------------------------

        --------------------------------------------------------------------------

        WARNING: A process refused to die!

        

        Host: enterprise

        PID: 24797

        

        This process may still be running and/or consuming resources.

        --------------------------------------------------------------------------

        [enterprise:24787] sess_dir_finalize: proc session dir not empty - leaving

        [enterprise:24787] sess_dir_finalize: proc session dir not empty - leaving

        [enterprise:24787] sess_dir_finalize: proc session dir not empty - leaving

        [enterprise:24787] sess_dir_finalize: proc session dir not empty - leaving

        [enterprise:24787] orted: job_state_callback(jobid = 1, state = ORTE_PROC_STATE_TERMINATED)

        [enterprise:24787] sess_dir_finalize: found proc session dir empty - deleting

        [enterprise:24787] sess_dir_finalize: found job session dir empty - deleting

        [enterprise:24787] sess_dir_finalize: found univ session dir empty - deleting

        [enterprise:24787] sess_dir_finalize: found top session dir empty - deleting

        

        ompi_info output:

        sshd_at_enterprise ~ $ ~/openmpi_sun4u/bin/ompi_info

        Open MPI: 1.1b5r10421

        Open MPI SVN revision: r10421

        Open RTE: 1.1b5r10421

        Open RTE SVN revision: r10421

        OPAL: 1.1b5r10421

        OPAL SVN revision: r10421

        Prefix: /export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u

        Configured architecture: sparc-sun-solaris2.8

        Configured by: sshd

        Configured on: Tue Jun 20 15:25:44 EDT 2006

        Configure host: averoes

        Built by: ac38820

        Built on: Tue Jun 20 15:59:47 EDT 2006

        Built host: averoes

        C bindings: yes

        C++ bindings: yes

        Fortran77 bindings: yes (all)

        Fortran90 bindings: no

        Fortran90 bindings size: na

        C compiler: gcc

        C compiler absolute: /usr/local/bin/gcc

        C++ compiler: g++

        C++ compiler absolute: /usr/local/bin/g++

        Fortran77 compiler: g77

        Fortran77 compiler abs: /usr/local/bin/g77

        Fortran90 compiler: f90

        Fortran90 compiler abs: /export/lca/appl/Forte/SUNWspro/WS6U2/bin/f90

        C profiling: yes

        C++ profiling: yes

        Fortran77 profiling: yes

        Fortran90 profiling: no

        C++ exceptions: no

        Thread support: solaris (mpi: no, progress: no)

        Internal debug support: no

        MPI parameter check: runtime

        Memory profiling support: no

        Memory debugging support: no

        libltdl support: yes

        MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.1)

        MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)

        MCA timer: solaris (MCA v1.0, API v1.0, Component v1.1)

        MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)

        MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)

        MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)

        MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)

        MCA coll: self (MCA v1.0, API v1.0, Component v1.1)

        MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)

        MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)

        MCA io: romio (MCA v1.0, API v1.0, Component v1.1)

        MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)

        MCA pml: dr (MCA v1.0, API v1.0, Component v1.1)

        MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)

        MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)

        MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)

        MCA btl: self (MCA v1.0, API v1.0, Component v1.1)

        MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)

        MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)

        MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)

        MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)

        MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)

        MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)

        MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)

        MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)

        MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)

        MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)

        MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)

        MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)

        MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)

        MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)

        MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)

        MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)

        MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)

        MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)

        MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)

        MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)

        MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)

        MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)

        MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)

        MCA sds: env (MCA v1.0, API v1.0, Component v1.1)

        MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)

        MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)

        MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)

        

        On Tuesday, June 20, 2006, at 17:06, Eric Thibodeau wrote:

> Thanks for the pointer, it WORKS!! (yay)

>

> On Tuesday, June 20, 2006, at 12:21, Brian Barrett wrote:

> > On Jun 19, 2006, at 12:15 PM, Eric Thibodeau wrote:

> >

> > > I checked the thread with the same title as this e-mail and tried

> > > compiling openmpi-1.1b4r10418 with:

> > >

> > > ./configure CFLAGS="-mv8plus" CXXFLAGS="-mv8plus" FFLAGS="-mv8plus"

> > > FCFLAGS="-mv8plus" --prefix=$HOME/openmpi-SUN-`uname -r` --enable-

> > > pretty-print-stacktrace

> > I put the incorrect flags in the error message - can you try again with:

> >

> >

> > ./configure CFLAGS=-mcpu=v9 CXXFLAGS=-mcpu=v9 FFLAGS=-mcpu=v9

> > FCFLAGS=-mcpu=v9 --prefix=$HOME/openmpi-SUN-`uname -r` --enable-

> > pretty-print-stacktrace

> >

> >

> > and see if that helps? By the way, I'm not sure if Solaris has the

> > required support for the pretty-print stack trace feature. It likely

> > will print what signal caused the error, but will not actually print

> > the stack trace. It's enabled by default on Solaris, with this

> > limited functionality (the option exists for platforms that have

> > broken half-support for GNU libc's stack trace feature, and for users

> > that don't like us registering a signal handler to do the work).

> >

> > Brian

> >

> >

>

        

        --

        Eric Thibodeau

        Neural Bucket Solutions Inc.

        T. (514) 736-1436

        C. (514) 710-0517