I hope this is of some help (debugged with TotalView):
Enclosed you will find a graph from TotalView as well as this:
Created process 2 (7633), named "mpirun"
Thread 2.1 has appeared
Thread 2.2 has appeared
Thread 2.1 received a signal (Segmentation Violation)
and the stack trace:
-[PlsXGridClient launchJob:], FP=bffff100
and this (bold crashed):
0x00257680: 0x805e0044 lwz rtoc,68(r30)
0x00257684: 0x38000001 li r0,1
0x00257688: 0x90020010 stw r0,16(rtoc)
0x0025768c: 0x805e0044 lwz rtoc,68(r30)
0x00257690: 0x38008000 li r0,-32768
from function _mca_pls_xgrid_set_node_name in mca_pls_xgrid.so
Unfortunately I'm not yet familiar with TotalView, so let me know if
you would like more output (sorry: I haven't found dbx for Mac OS X,
which is why TotalView was used).
Date: Wed, 28 Jun 2006 10:35:03 -0400
From: "Terry D. Dontje" <Terry.Dontje@Sun.COM>
Subject: [OMPI users] Re : OpenMPI 1.1: Signal:10,
info.si_errno:0(Unknown, error: 0), si_code:1(BUS_ADRALN)
Can you set your coredumpsize limit to non-zero, rerun the program,
and then get the stack via dbx?
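For reference, in csh/tcsh the limit is usually raised with `limit coredumpsize unlimited`, and in bash with `ulimit -c unlimited`, typed in the shell that launches the job. The same thing can be sketched in Python with the standard `resource` module. This is only a minimal sketch: the helper name `raise_core_limit` is made up for illustration, and whether processes started remotely (e.g. through Xgrid) inherit the limit depends on the launcher, which is not verified here.

```python
import resource


def raise_core_limit():
    """Raise the soft core-file size limit to the current hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit only up to the
    # hard limit, so the hard limit is reused rather than forced higher.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = raise_core_limit()
    # resource.RLIM_INFINITY means "no limit"; 0 means core files are disabled.
    print("core limit (soft, hard):", soft, hard)
```

Child processes started afterwards from the same process inherit the new limit, so the failing program would have to be launched from there (or from a shell where the limit was raised) for a core file to be written.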
So, I have a similar case of BUS_ADRALN on SPARC systems with an
older version (June 21st) of the trunk. I've since run using the latest
trunk and the bus error went away. I am now going to try this out with
v1.1 to see if I get
results. Your stack would help me try and determine if this is an
or possibly some type of platform problem.
There is another thread with Eric Thibodeau, and I am unsure whether it
is the same issue as either of our situations.
Subject: Re: [OMPI users] OpenMPI 1.1: Signal:10
info.si_errno:0(Unknown, error: 0), si_code:1(BUS_ADRALN) (Terry D.
Unfortunately I haven't got a stack trace.
OS: Mac OS X 10.4.7 Server on the Xgrid server and Mac OS X 10.4.7
Client on every node (G4 and G5). For testing purposes I've installed
OpenMPI 1.1 on a dual-G4 node and on a dual-G5 node, with my Xgrid
consisting of either only the dual-G4 node or only the dual-G5 node.
With either configuration, I ran into the bus error.
Date: Wed, 28 Jun 2006 14:30:12 +0200