Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpi error?
From: Matthew MacManes (macmanes_at_[hidden])
Date: 2010-03-11 11:50:28


Perfect, that is exactly what I wanted to know: that it was an issue with
the program rather than an issue with Open MPI.

Thanks, Jeff.
Matt
_________________________________
Matthew MacManes
PhD Candidate
University of California- Berkeley
Museum of Vertebrate Zoology
Phone: 510-495-5833
Lab Website: http://ib.berkeley.edu/labs/lacey
Personal Website: http://macmanes.com/

On Thu, Mar 11, 2010 at 08:47, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> Debugging this is probably not going to be within the scope of Open MPI --
> it looks like your app is seg faulting inside some routine called DoCharset.
> If you're getting corefiles, you might try loading them up in the debugger
> and see what is going wrong, etc. I.e., standard debugging rules apply here
> -- it's not necessarily special just because it's an MPI application.
>
> Sorry!
>
>
> On Mar 11, 2010, at 8:36 AM, Matthew MacManes wrote:
>
> > I "unlimited" my stack space and got a different error, which may be a
> > clue. I'm not sure how to vary the rank, like you suggested, so if you have
> > a tip that would be great.
> >
> > Here is the new error:
> > [macmanes:05298] *** Process received signal ***
> > [macmanes:05298] Signal: Segmentation fault (11)
> > [macmanes:05298] Signal code: Address not mapped (1)
> > [macmanes:05298] Failing at address: 0x2ba2e9d9c00c
> > [macmanes:05298] [ 0] /lib/libpthread.so.0 [0x2ba2b27ce190]
> > [macmanes:05298] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05298] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05298] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05298] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05298] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05298] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05298] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2ba2b29f9abd]
> > [macmanes:05298] [ 8] mb [0x410949]
> > [macmanes:05298] *** End of error message ***
> > [macmanes:05299] *** Process received signal ***
> > [macmanes:05299] Signal: Segmentation fault (11)
> > [macmanes:05299] Signal code: Address not mapped (1)
> > [macmanes:05299] Failing at address: 0x2b089e31600c
> > [macmanes:05299] [ 0] /lib/libpthread.so.0 [0x2b0866d48190]
> > [macmanes:05299] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05299] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05299] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05299] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05299] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05299] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05299] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b0866f73abd]
> > [macmanes:05299] [ 8] mb [0x410949]
> > [macmanes:05299] *** End of error message ***
> > [macmanes:05300] *** Process received signal ***
> > [macmanes:05300] Signal: Segmentation fault (11)
> > [macmanes:05300] Signal code: Address not mapped (1)
> > [macmanes:05300] Failing at address: 0x2b1fa264200c
> > [macmanes:05300] [ 0] /lib/libpthread.so.0 [0x2b1f6b074190]
> > [macmanes:05300] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05300] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05300] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05300] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05300] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05300] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05300] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b1f6b29fabd]
> > [macmanes:05300] [ 8] mb [0x410949]
> > [macmanes:05300] *** End of error message ***
> > [macmanes:05301] *** Process received signal ***
> > [macmanes:05301] Signal: Segmentation fault (11)
> > [macmanes:05301] Signal code: Address not mapped (1)
> > [macmanes:05301] Failing at address: 0x2b69f7c3300c
> > [macmanes:05301] [ 0] /lib/libpthread.so.0 [0x2b69c0665190]
> > [macmanes:05301] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05301] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05301] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05301] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05301] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05301] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05301] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b69c0890abd]
> > [macmanes:05301] [ 8] mb [0x410949]
> > [macmanes:05301] *** End of error message ***
> > [macmanes:05302] *** Process received signal ***
> > [macmanes:05302] Signal: Segmentation fault (11)
> > [macmanes:05302] Signal code: Address not mapped (1)
> > [macmanes:05302] Failing at address: 0x2b923066b00c
> > [macmanes:05302] [ 0] /lib/libpthread.so.0 [0x2b91f909d190]
> > [macmanes:05302] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05302] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05302] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05302] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05302] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05302] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05302] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b91f92c8abd]
> > [macmanes:05302] [ 8] mb [0x410949]
> > [macmanes:05302] *** End of error message ***
> > [macmanes:05303] *** Process received signal ***
> > [macmanes:05303] Signal: Segmentation fault (11)
> > [macmanes:05303] Signal code: Address not mapped (1)
> > [macmanes:05303] Failing at address: 0x2b36bc08c00c
> > [macmanes:05303] [ 0] /lib/libpthread.so.0 [0x2b3684abe190]
> > [macmanes:05303] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05303] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05303] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05303] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05303] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05303] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05303] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b3684ce9abd]
> > [macmanes:05303] [ 8] mb [0x410949]
> > [macmanes:05303] *** End of error message ***
> > [macmanes:05304] *** Process received signal ***
> > [macmanes:05304] Signal: Segmentation fault (11)
> > [macmanes:05304] Signal code: Address not mapped (1)
> > [macmanes:05304] Failing at address: 0x2ac048ece00c
> > [macmanes:05304] [ 0] /lib/libpthread.so.0 [0x2ac011900190]
> > [macmanes:05304] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05304] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05304] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05304] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05304] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05304] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05304] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2ac011b2babd]
> > [macmanes:05304] [ 8] mb [0x410949]
> > [macmanes:05304] *** End of error message ***
> > [macmanes:05305] *** Process received signal ***
> > [macmanes:05305] Signal: Segmentation fault (11)
> > [macmanes:05305] Signal code: Address not mapped (1)
> > [macmanes:05305] Failing at address: 0x2ad1bd22900c
> > [macmanes:05305] [ 0] /lib/libpthread.so.0 [0x2ad185c5b190]
> > [macmanes:05305] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05305] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05305] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05305] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05305] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05305] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05305] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2ad185e86abd]
> > [macmanes:05305] [ 8] mb [0x410949]
> > [macmanes:05305] *** End of error message ***
> > [macmanes:05306] *** Process received signal ***
> > [macmanes:05306] Signal: Segmentation fault (11)
> > [macmanes:05306] Signal code: Address not mapped (1)
> > [macmanes:05306] Failing at address: 0x2aff7d85000c
> > [macmanes:05306] [ 0] /lib/libpthread.so.0 [0x2aff46282190]
> > [macmanes:05306] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05306] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05306] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05306] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05306] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05306] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05306] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2aff464adabd]
> > [macmanes:05306] [ 8] mb [0x410949]
> > [macmanes:05306] *** End of error message ***
> > [macmanes:05307] *** Process received signal ***
> > [macmanes:05307] Signal: Segmentation fault (11)
> > [macmanes:05307] Signal code: Address not mapped (1)
> > [macmanes:05307] Failing at address: 0x2b8b4104000c
> > [macmanes:05307] [ 0] /lib/libpthread.so.0 [0x2b8b09a72190]
> > [macmanes:05307] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05307] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05307] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05307] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05307] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05307] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05307] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b8b09c9dabd]
> > [macmanes:05307] [ 8] mb [0x410949]
> > [macmanes:05307] *** End of error message ***
> > [macmanes:05308] *** Process received signal ***
> > [macmanes:05308] Signal: Segmentation fault (11)
> > [macmanes:05308] Signal code: Address not mapped (1)
> > [macmanes:05308] Failing at address: 0x2ad33273400c
> > [macmanes:05308] [ 0] /lib/libpthread.so.0 [0x2ad2fb166190]
> > [macmanes:05308] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05308] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05308] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05308] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05308] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05308] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05308] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2ad2fb391abd]
> > [macmanes:05308] [ 8] mb [0x410949]
> > [macmanes:05308] *** End of error message ***
> > [macmanes:05309] *** Process received signal ***
> > [macmanes:05309] Signal: Segmentation fault (11)
> > [macmanes:05309] Signal code: Address not mapped (1)
> > [macmanes:05309] Failing at address: 0x2b5e4da9100c
> > [macmanes:05309] [ 0] /lib/libpthread.so.0 [0x2b5e164c3190]
> > [macmanes:05309] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05309] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05309] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05309] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05309] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05309] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05309] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b5e166eeabd]
> > [macmanes:05309] [ 8] mb [0x410949]
> > [macmanes:05309] *** End of error message ***
> > [macmanes:05310] *** Process received signal ***
> > [macmanes:05310] Signal: Segmentation fault (11)
> > [macmanes:05310] Signal code: Address not mapped (1)
> > [macmanes:05310] Failing at address: 0x2b7b2a94300c
> > [macmanes:05311] *** Process received signal ***
> > [macmanes:05311] Signal: Segmentation fault (11)
> > [macmanes:05311] Signal code: Address not mapped (1)
> > [macmanes:05311] Failing at address: 0x2b9e2bf4b00c
> > [macmanes:05311] [ 0] /lib/libpthread.so.0 [0x2b9df497d190]
> > [macmanes:05311] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05311] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05311] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05311] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05311] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05311] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05311] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b9df4ba8abd]
> > [macmanes:05311] [ 8] mb [0x410949]
> > [macmanes:05311] *** End of error message ***
> > [macmanes:05312] *** Process received signal ***
> > [macmanes:05312] Signal: Segmentation fault (11)
> > [macmanes:05312] Signal code: Address not mapped (1)
> > [macmanes:05312] Failing at address: 0x2b756bf1b00c
> > [macmanes:05312] [ 0] /lib/libpthread.so.0 [0x2b753494d190]
> > [macmanes:05312] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05312] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05312] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05312] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05312] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05312] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05312] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b7534b78abd]
> > [macmanes:05312] [ 8] mb [0x410949]
> > [macmanes:05312] *** End of error message ***
> > Defining charset called gene1000
> > [macmanes:05310] [ 0] /lib/libpthread.so.0 [0x2b7af3375190]
> > [macmanes:05310] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05310] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05310] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> > [macmanes:05310] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> > [macmanes:05310] [ 5] mb(CommandLine+0x17e) [0x4137de]
> > [macmanes:05310] [ 6] mb(main+0x82) [0x413ad2]
> > [macmanes:05310] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b7af35a0abd]
> > [macmanes:05310] [ 8] mb [0x410949]
> > [macmanes:05310] *** End of error message ***
> >
> > --------------------------------------------------------------------------
> > mpirun noticed that process rank 9 with PID 5307 on node macmanes exited
> > on signal 11 (Segmentation fault).
> > --------------------------------------------------------------------------
> > 2 total processes killed (some possibly by mpirun during cleanup)
> > macmanes_at_macmanes:~/mrbayes$
> > _________________________________
> > Matthew MacManes
> > PhD Candidate
> > University of California- Berkeley
> > Museum of Vertebrate Zoology
> > Phone: 510-495-5833
> > Lab Website: http://ib.berkeley.edu/labs/lacey
> > Personal Website: http://macmanes.com/
> >
> >
> > On Thu, Mar 11, 2010 at 07:42, Peter Kjellstrom <cap_at_[hidden]> wrote:
> > On Thursday 11 March 2010, Matthew MacManes wrote:
> > > Can anybody tell me if this is an error associated with openmpi, versus an
> > > issue with the program I am running (MRBAYES,
> > > https://sourceforge.net/projects/mrbayes/)
> > >
> > > We are trying to run a large simulated dataset using 1,000,000 bases
> > > divided up into 1000 genes, 5 taxa.. An error is occurring, but we are not
> > > sure why. We are using the MPI version of MRBAYES v3.2-cvs on a linux
> > > 16core 24GB RAM machine. It does not appear as if the program runs out of
> > > memory (max memory usage is 13gb). Maybe this is an OpenMPI problem and
> > > not related to MrBayes...
> > >
> > > See snippet of error message below. Can anybody give me any hints about the
> > > source of the problem?
> > >
> > > I am using OPENMPI version 1.4.1.
> > >
> > > ...
> > > Defining charset called gene997
> > > Defining charset called gene998
> > > Defining charset called gene999
> > > Defining charset called gene1000
> > > Defining partition called Genes
> > > [macmanes:02546] *** Process received signal ***
> > > [macmanes:02546] Signal: Segmentation fault (11)
> > > [macmanes:02546] Signal code: Address not mapped (1)
> > > [macmanes:02546] Failing at address: (nil)
> > > [macmanes:02546] [ 0] /lib/libpthread.so.0 [0x7ffd0f322190]
> > > [macmanes:02546] *** End of error message ***
> > >
> > > --------------------------------------------------------------------------
> > > mpirun noticed that process rank 13 with PID 2546 on node macmanes exited
> > > on signal 11 (Segmentation fault).
> >
> > One of the ranks got a "Segmentation fault". This would typically indicate a
> > problem with the app, not the MPI. Maybe you ran out of stack space?
> > (ulimit -s).
> >
> > Have you tried a different/lower number of ranks?
> >
> > /Peter
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
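
One way to check the conclusion reached in this thread (the crash is in the application, not in Open MPI) is to group the backtraces by the frame where each process faulted. The sketch below is only an illustration: the regular expression and the function names are assumptions written for the log format quoted above, not part of Open MPI or MrBayes. It treats frame [ 1] as the faulting frame, on the assumption that frame [ 0] is the signal-handling entry in libpthread, as it appears to be in these traces.

```python
import re
from collections import Counter

# Matches backtrace lines such as:
#   > > [macmanes:05298] [ 1] mb(DoCharset+0x187) [0x41d9a7]
# Leading "> " quote markers are ignored because search() is used.
FRAME_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+\[\s*(?P<depth>\d+)\]\s+(?P<frame>\S+)"
)


def first_app_frames(log_text, depth=1):
    """Return {pid: frame} for the frame at the given depth of each backtrace."""
    frames = {}
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and int(match.group("depth")) == depth:
            frames[match.group("pid")] = match.group("frame")
    return frames


def summarize(log_text):
    """Count how many processes faulted in each frame."""
    return Counter(first_app_frames(log_text).values())


if __name__ == "__main__":
    # A few lines copied from the traces in this thread.
    sample = """\
> > [macmanes:05298] [ 0] /lib/libpthread.so.0 [0x2ba2b27ce190]
> > [macmanes:05298] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> > [macmanes:05299] [ 0] /lib/libpthread.so.0 [0x2b0866d48190]
> > [macmanes:05299] [ 1] mb(DoCharset+0x187) [0x41d9a7]
"""
    for frame, count in summarize(sample).items():
        print(f"{count} process(es) faulted in {frame}")
```

If every process reports the same frame inside the application binary (here mb, in DoCharset), that points at the application's own code path rather than at the MPI library, which is the reading Jeff gives above.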