Open MPI User's Mailing List Archives

From: Adams, Samuel D Contr AFRL/HEDR (Samuel.Adams_at_[hidden])
Date: 2007-08-14 11:44:14


So I ran valgrind on my code and it came up with a few thousand memory
errors, but none of them had anything to do with the code I wrote. It
gave a few errors for the LDAP authentication stuff at the beginning,
but most of the errors came from orte*. The only part that referred to
my code was line 13 of the main file, where I include mpi.h. It seems
suspect to me to have so many "errors" in well-used and well-tested
code. Also, the stack traces that I previously posted showed errors in
places in my code that have been stable and unchanged for about a
year.

Maybe this is some kind of problem with the system configuration or
something like that. It just seems too odd for these memory faults to
appear out of nowhere.
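
One reading of the single-process trace quoted below ("Address not mapped", failing at address 0x18) is a read of a struct member through a NULL pointer: the faulting address then equals the member's offset. This is only a guess from the trace, not something the thread confirms. The sketch below shows the arithmetic; `struct tissue`, its fields, and the helper names are hypothetical and are not taken from the fdtd source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; NOT the real fdtd structure. */
struct tissue {
    double permittivity;   /* offset 0x00 */
    double conductivity;   /* offset 0x08 */
    double density;        /* offset 0x10 */
    double *mass;          /* offset 0x18 on a typical x86-64 Linux build */
};

/* Address that a read of base->mass would touch, computed without
 * dereferencing base. With base == NULL this is just the field offset. */
uintptr_t mass_field_address(const struct tissue *base)
{
    return (uintptr_t)base + offsetof(struct tissue, mass);
}

/* Defensive check a caller could run before touching t->mass. */
int tissue_mass_ok(const struct tissue *t)
{
    return t != NULL && t->mass != NULL;
}

/* Read the first mass entry, refusing to proceed on a NULL pointer. */
double first_mass(const struct tissue *t)
{
    assert(tissue_mass_ok(t));
    return t->mass[0];
}
```

If the real structure has a pointer or array field at offset 0x18, checking where that structure is allocated (and whether the allocation or the input-file parse failed) would be a reasonable first step.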

Sam Adams
General Dynamics Information Technology
Phone: 210.536.5945

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of Jeff Squyres
Sent: Monday, August 13, 2007 4:13 PM
To: Open MPI Users
Subject: Re: [OMPI users] segmentation faults

It *looks* like a run-of-the-mill memory-badness kind of error, but
it's impossible to say without more information.

Are you able to run this through valgrind or some other memory-
checking debugger? It looks like the single process case may be the
simplest to check...?
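
As a rough illustration of why a memory checker helps here: a write past the end of a heap block usually does not crash at the faulty line. It damages whatever lies next to the block, and the failure shows up later, often inside the allocator itself (the two-process trace quoted below fails in `_int_malloc`, called from `calloc`). The sketch below imitates the idea with guard bytes placed after a buffer. It is a simplified stand-in, not how valgrind is implemented, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8      /* number of guard bytes after the usable area */
#define GUARD_BYTE 0xA5   /* recognizable fill pattern */

/* Allocate n usable (zeroed) bytes followed by GUARD_LEN guard bytes. */
unsigned char *guarded_alloc(size_t n)
{
    unsigned char *p = malloc(n + GUARD_LEN);
    if (p == NULL)
        return NULL;
    memset(p, 0, n);
    memset(p + n, GUARD_BYTE, GUARD_LEN);
    return p;
}

/* Return 1 if the guard bytes after the n usable bytes are untouched. */
int guard_intact(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (p[n + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Write count bytes starting at p. Passing count > n models an
 * off-by-one loop; the caller must stay within n + GUARD_LEN bytes. */
void fill(unsigned char *p, size_t count, unsigned char value)
{
    assert(p != NULL);
    for (size_t i = 0; i < count; i++)
        p[i] = value;
}
```

Filling exactly `n` bytes leaves the guard intact; filling `n + 1` bytes silently clobbers the first guard byte, and nothing fails until something later inspects it. That delay is why code that has been "stable" for a year can still be the source of a crash that appears elsewhere.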

On Aug 13, 2007, at 5:03 PM, Adams, Samuel D Contr AFRL/HEDR wrote:

> This morning I tried to run a code that I have been running for a
> while now, but for some reason it is causing segmentation faults. I
> can't really think of anything that I have done recently that would
> be causing these errors. Does anyone have any idea?
>
> I get this when running it on more than one processor:
> [sam_at_prodnode1 all]$ mpirun -np 2 --prefix
> /usr/local/profiles/gcc-openmpi/ /home/sam/code/fdtd/fdtd_0.3/fdtd -t
> /home/sam/code/fdtd/fdtd_0.3/test_files/tissue.txt -r
> /home/sam/code/fdtd/fdtd_0.3/test_files/tester_x002y002z004.raw -v -f
> 3000 --pw 90,0,1,0 -l test_log.out -a 1
> [prodnode1:04400] *** Process received signal ***
> [prodnode1:04400] Signal: Segmentation fault (11)
> [prodnode1:04400] Signal code: Invalid permissions (2)
> [prodnode1:04400] Failing at address: 0x2aaaab000048
> [prodnode1:04399] *** Process received signal ***
> [prodnode1:04399] Signal: Segmentation fault (11)
> [prodnode1:04399] Signal code: Invalid permissions (2)
> [prodnode1:04399] Failing at address: 0x2aaaab0a0a48
> [prodnode1:04400] [ 0] /lib64/libpthread.so.0 [0x3aa840dd40]
> [prodnode1:04400] [ 1] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(_int_malloc+0x2a5) [0x2aaaaafda345]
> [prodnode1:04400] [ 2] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(calloc+0xaa) [0x2aaaaafdbd8a]
> [prodnode1:04400] [ 3] /home/sam/code/fdtd/fdtd_0.3/fdtd(parseTissues+0x23) [0x40c9d3]
> [prodnode1:04400] [ 4] /home/sam/code/fdtd/fdtd_0.3/fdtd(parseArgs+0x489) [0x404b09]
> [prodnode1:04400] [ 5] /home/sam/code/fdtd/fdtd_0.3/fdtd(main+0x41) [0x404eb1]
> [prodnode1:04400] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3aa781d8a4]
> [prodnode1:04400] [ 7] /home/sam/code/fdtd/fdtd_0.3/fdtd [0x4034b9]
> [prodnode1:04400] *** End of error message ***
> [prodnode1:04399] [ 0] /lib64/libpthread.so.0 [0x3aa840dd40]
> [prodnode1:04399] [ 1] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(_int_malloc+0x2a5) [0x2aaaaafda345]
> [prodnode1:04399] [ 2] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(calloc+0xaa) [0x2aaaaafdbd8a]
> [prodnode1:04399] [ 3] /home/sam/code/fdtd/fdtd_0.3/fdtd(parseTissues+0x23) [0x40c9d3]
> [prodnode1:04399] [ 4] /home/sam/code/fdtd/fdtd_0.3/fdtd(parseArgs+0x489) [0x404b09]
> [prodnode1:04399] [ 5] /home/sam/code/fdtd/fdtd_0.3/fdtd(main+0x41) [0x404eb1]
> [prodnode1:04399] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3aa781d8a4]
> [prodnode1:04399] [ 7] /home/sam/code/fdtd/fdtd_0.3/fdtd [0x4034b9]
> [prodnode1:04399] *** End of error message ***
> mpirun noticed that job rank 0 with PID 4399 on node
> prodnode1.brooks.af.mil exited on signal 11 (Segmentation fault).
> 1 additional process aborted (not shown)
>
> --Or I get this if I run it on just one processor.
> [sam_at_prodnode1 all]$ ./script2.sh
> [prodnode1:04405] *** Process received signal ***
> [prodnode1:04405] Signal: Segmentation fault (11)
> [prodnode1:04405] Signal code: Address not mapped (1)
> [prodnode1:04405] Failing at address: 0x18
> [prodnode1:04405] [ 0] /lib64/libpthread.so.0 [0x3aa840dd40]
> [prodnode1:04405] [ 1] /home/sam/code/fdtd/fdtd_0.3/fdtd(calcMass+0xac) [0x40443c]
> [prodnode1:04405] [ 2] /home/sam/code/fdtd/fdtd_0.3/fdtd(parseArgs+0x5a1) [0x404c21]
> [prodnode1:04405] [ 3] /home/sam/code/fdtd/fdtd_0.3/fdtd(main+0x41) [0x404eb1]
> [prodnode1:04405] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3aa781d8a4]
> [prodnode1:04405] [ 5] /home/sam/code/fdtd/fdtd_0.3/fdtd [0x4034b9]
> [prodnode1:04405] *** End of error message ***
> mpirun noticed that job rank 0 with PID 4405 on node
> prodnode1.brooks.af.mil exited on signal 11 (Segmentation fault).
> [sam_at_prodnode1 all]$
>
>
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems