Open MPI User's Mailing List Archives

From: openmpi-user (openmpi-user_at_[hidden])
Date: 2006-06-29 15:39:41


I hope this is of some help (debugged with TotalView):

Enclosed you will find a graph from TotalView as well as this:
/Created process 2 (7633), named "mpirun"
Thread 2.1 has appeared
Thread 2.2 has appeared
Thread 2.1 received a signal (Segmentation Violation)/

and the stack trace:
/ _mca_pls_xgrid_set_node_name, FP=bffff090
  -[PlsXGridClient launchJob:], FP=bffff100
  _orte_pls_xgrid_launch, FP=bffff240
  _orte_rmgr_urm_spawn, FP=bffff290
  orterun, FP=bffff310
  main, FP=bffff3b0
  _start, FP=bffff400/

and this (the instruction marked in bold is where it crashed):
/ 0x00257680: 0x805e0044 lwz rtoc,68(r30)
  0x00257684: 0x38000001 li r0,1
  *0x00257688: 0x90020010 stw r0,16(rtoc)*
  0x0025768c: 0x805e0044 lwz rtoc,68(r30)
  0x00257690: 0x38008000 li r0,-32768/

from function /_mca_pls_xgrid_set_node_name/ in /

Unfortunately I'm not yet familiar with TotalView, so let me know if you
would like more output. (Sorry, I haven't found dbx for Mac OS X, which
is why TotalView was used.)


users-request_at_[hidden] wrote:


Message: 2
Date: Wed, 28 Jun 2006 10:35:03 -0400
From: "Terry D. Dontje" <Terry.Dontje_at_[hidden]>
Subject: [OMPI users] Re : OpenMPI 1.1: Signal:10,
        info.si_errno:0(Unknown, error: 0), si_code:1(BUS_ADRALN)
To: users_at_[hidden]
Message-ID: <44A29397.2000904_at_[hidden]>
Content-Type: text/plain; format=flowed; charset=ISO-8859-1


Can you set your coredumpsize limit to non-zero, rerun the program,
and then get the stack via dbx?
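
For readers who want to do this from a script rather than from the shell, here is a minimal sketch in Python. It assumes a POSIX system, where the csh `limit coredumpsize` setting corresponds to the `RLIMIT_CORE` resource limit. The function names `raise_core_limit` and `run_with_core_dumps` are made up for this illustration and are not part of Open MPI or any debugger.

```python
import resource
import subprocess
import sys


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit.

    Returns the resulting (soft, hard) pair. If the hard limit itself
    is zero, core files stay disabled and only an administrator can
    change that.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(argv):
    """Run argv with the raised limit applied in the child process only."""
    # preexec_fn is called in the child just before exec, so the
    # calling process keeps its own limits unchanged.
    completed = subprocess.run(argv, preexec_fn=raise_core_limit)
    return completed.returncode
```

A call such as `run_with_core_dumps(["mpirun", "-np", "2", "./a.out"])` (the program name and process count are placeholders) would rerun the job with core files enabled. Where the core file is written depends on the platform, and it still has to be opened in a debugger to obtain the stack.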

I have a similar case of BUS_ADRALN on SPARC systems with an older
version (June 21st) of the trunk. I've since run using the latest trunk
and the bus error went away. I am now going to try this out with v1.1 to
see if I get similar results. Your stack would help me determine whether
this is an OpenMPI issue or possibly some type of platform problem.

There is another thread with Eric Thibodeau; I am unsure whether it is
the same issue as either of our situations.


> >
> >Message: 3
> >Date: Wed, 28 Jun 2006 14:30:12 +0200
> >From: openmpi-user <openmpi-user_at_[hidden]>
> >Subject: Re: [OMPI users] OpenMPI 1.1: Signal:10
> > info.si_errno:0(Unknown, error: 0), si_code:1(BUS_ADRALN) (Terry D.
> > Dontje)
> >To: users_at_[hidden]
> >Message-ID: <44A27654.9060002_at_[hidden]>
> >Content-Type: text/plain; charset="iso-8859-1"
> >
> >Hi Terry,
> >
> >unfortunately I haven't got a stack trace.
> >
> >OS: Mac OS X 10.4.7 Server on the Xgrid-server and Mac OS X 10.4.7
> >Client on every node (G4 and G5). For testing purposes I've installed
> >OpenMPI 1.1 on a Dual-G4-node and on a Dual-G5-node with my Xgrid
> >consisting of only either the Dual-G4- or the Dual-G5-node. No matter
> >which configuration, I ran into the bus error.
> >
> >Yours,
> >Frank
> >