
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] mpi error?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-03-11 11:47:12


Debugging this is probably not going to be within the scope of Open MPI -- it looks like your app is segfaulting inside a routine called DoCharset. If you're getting core files, try loading them into a debugger to see what is going wrong. I.e., standard debugging rules apply here -- it's not necessarily special just because it's an MPI application.

Sorry!
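
To illustrate how the trace quoted below points at DoCharset, here is a minimal Python sketch. It is not part of Open MPI or MrBayes; the helper name `top_app_frames` and the regular expression are assumptions written only for this example, and the sample lines are copied from the quoted output. It picks, for each process, the shallowest stack frame that belongs to the `mb` binary and counts how often each function shows up.

```python
import re
from collections import Counter

# Matches backtrace frames of the form seen in the quoted log:
#   [host:pid] [ n] binary(Function+0xoffset) [0xaddress]
# Frames without a "(Function+0x...)" part (e.g. frame 0) do not match.
FRAME = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+\[\s*(?P<depth>\d+)\]\s+"
    r"(?P<module>\S+?)\((?P<func>\w+)\+0x[0-9a-fA-F]+\)"
)

def top_app_frames(log_text, binary="mb"):
    """Return {pid: function} for the shallowest frame inside `binary`."""
    best = {}
    for line in log_text.splitlines():
        m = FRAME.search(line)
        if not m or m.group("module") != binary:
            continue  # skip library frames and non-trace lines
        pid, depth = m.group("pid"), int(m.group("depth"))
        if pid not in best or depth < best[pid][0]:
            best[pid] = (depth, m.group("func"))
    return {pid: func for pid, (_, func) in best.items()}

# A few lines taken from the quoted error output.
sample = """\
> [macmanes:05298] [ 0] /lib/libpthread.so.0 [0x2ba2b27ce190]
> [macmanes:05298] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05298] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05299] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05299] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b0866f73abd]
"""

frames = top_app_frames(sample)
summary = Counter(frames.values())
print(summary)
```

If every process reports the same application function at the same offset, as it does here, that suggests a bug in the application's own code path rather than anything in the MPI layer.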

On Mar 11, 2010, at 8:36 AM, Matthew MacManes wrote:

> I "unlimited" my stack space and got a different error, which may be a clue. I'm not sure how to vary the rank, like you suggested, so if you have a tip that would be great.
>
> Here is the new error:
> [macmanes:05298] *** Process received signal ***
> [macmanes:05298] Signal: Segmentation fault (11)
> [macmanes:05298] Signal code: Address not mapped (1)
> [macmanes:05298] Failing at address: 0x2ba2e9d9c00c
> [macmanes:05298] [ 0] /lib/libpthread.so.0 [0x2ba2b27ce190]
> [macmanes:05298] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05298] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05298] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05298] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05298] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05298] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05298] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2ba2b29f9abd]
> [macmanes:05298] [ 8] mb [0x410949]
> [macmanes:05298] *** End of error message ***
> [macmanes:05299] *** Process received signal ***
> [macmanes:05299] Signal: Segmentation fault (11)
> [macmanes:05299] Signal code: Address not mapped (1)
> [macmanes:05299] Failing at address: 0x2b089e31600c
> [macmanes:05299] [ 0] /lib/libpthread.so.0 [0x2b0866d48190]
> [macmanes:05299] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05299] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05299] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05299] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05299] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05299] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05299] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b0866f73abd]
> [macmanes:05299] [ 8] mb [0x410949]
> [macmanes:05299] *** End of error message ***
> [macmanes:05300] *** Process received signal ***
> [macmanes:05300] Signal: Segmentation fault (11)
> [macmanes:05300] Signal code: Address not mapped (1)
> [macmanes:05300] Failing at address: 0x2b1fa264200c
> [macmanes:05300] [ 0] /lib/libpthread.so.0 [0x2b1f6b074190]
> [macmanes:05300] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05300] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05300] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05300] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05300] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05300] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05300] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b1f6b29fabd]
> [macmanes:05300] [ 8] mb [0x410949]
> [macmanes:05300] *** End of error message ***
> [macmanes:05301] *** Process received signal ***
> [macmanes:05301] Signal: Segmentation fault (11)
> [macmanes:05301] Signal code: Address not mapped (1)
> [macmanes:05301] Failing at address: 0x2b69f7c3300c
> [macmanes:05301] [ 0] /lib/libpthread.so.0 [0x2b69c0665190]
> [macmanes:05301] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05301] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05301] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05301] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05301] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05301] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05301] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b69c0890abd]
> [macmanes:05301] [ 8] mb [0x410949]
> [macmanes:05301] *** End of error message ***
> [macmanes:05302] *** Process received signal ***
> [macmanes:05302] Signal: Segmentation fault (11)
> [macmanes:05302] Signal code: Address not mapped (1)
> [macmanes:05302] Failing at address: 0x2b923066b00c
> [macmanes:05302] [ 0] /lib/libpthread.so.0 [0x2b91f909d190]
> [macmanes:05302] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05302] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05302] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05302] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05302] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05302] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05302] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b91f92c8abd]
> [macmanes:05302] [ 8] mb [0x410949]
> [macmanes:05302] *** End of error message ***
> [macmanes:05303] *** Process received signal ***
> [macmanes:05303] Signal: Segmentation fault (11)
> [macmanes:05303] Signal code: Address not mapped (1)
> [macmanes:05303] Failing at address: 0x2b36bc08c00c
> [macmanes:05303] [ 0] /lib/libpthread.so.0 [0x2b3684abe190]
> [macmanes:05303] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05303] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05303] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05303] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05303] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05303] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05303] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b3684ce9abd]
> [macmanes:05303] [ 8] mb [0x410949]
> [macmanes:05303] *** End of error message ***
> [macmanes:05304] *** Process received signal ***
> [macmanes:05304] Signal: Segmentation fault (11)
> [macmanes:05304] Signal code: Address not mapped (1)
> [macmanes:05304] Failing at address: 0x2ac048ece00c
> [macmanes:05304] [ 0] /lib/libpthread.so.0 [0x2ac011900190]
> [macmanes:05304] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05304] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05304] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05304] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05304] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05304] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05304] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2ac011b2babd]
> [macmanes:05304] [ 8] mb [0x410949]
> [macmanes:05304] *** End of error message ***
> [macmanes:05305] *** Process received signal ***
> [macmanes:05305] Signal: Segmentation fault (11)
> [macmanes:05305] Signal code: Address not mapped (1)
> [macmanes:05305] Failing at address: 0x2ad1bd22900c
> [macmanes:05305] [ 0] /lib/libpthread.so.0 [0x2ad185c5b190]
> [macmanes:05305] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05305] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05305] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05305] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05305] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05305] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05305] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2ad185e86abd]
> [macmanes:05305] [ 8] mb [0x410949]
> [macmanes:05305] *** End of error message ***
> [macmanes:05306] *** Process received signal ***
> [macmanes:05306] Signal: Segmentation fault (11)
> [macmanes:05306] Signal code: Address not mapped (1)
> [macmanes:05306] Failing at address: 0x2aff7d85000c
> [macmanes:05306] [ 0] /lib/libpthread.so.0 [0x2aff46282190]
> [macmanes:05306] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05306] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05306] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05306] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05306] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05306] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05306] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2aff464adabd]
> [macmanes:05306] [ 8] mb [0x410949]
> [macmanes:05306] *** End of error message ***
> [macmanes:05307] *** Process received signal ***
> [macmanes:05307] Signal: Segmentation fault (11)
> [macmanes:05307] Signal code: Address not mapped (1)
> [macmanes:05307] Failing at address: 0x2b8b4104000c
> [macmanes:05307] [ 0] /lib/libpthread.so.0 [0x2b8b09a72190]
> [macmanes:05307] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05307] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05307] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05307] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05307] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05307] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05307] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b8b09c9dabd]
> [macmanes:05307] [ 8] mb [0x410949]
> [macmanes:05307] *** End of error message ***
> [macmanes:05308] *** Process received signal ***
> [macmanes:05308] Signal: Segmentation fault (11)
> [macmanes:05308] Signal code: Address not mapped (1)
> [macmanes:05308] Failing at address: 0x2ad33273400c
> [macmanes:05308] [ 0] /lib/libpthread.so.0 [0x2ad2fb166190]
> [macmanes:05308] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05308] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05308] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05308] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05308] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05308] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05308] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2ad2fb391abd]
> [macmanes:05308] [ 8] mb [0x410949]
> [macmanes:05308] *** End of error message ***
> [macmanes:05309] *** Process received signal ***
> [macmanes:05309] Signal: Segmentation fault (11)
> [macmanes:05309] Signal code: Address not mapped (1)
> [macmanes:05309] Failing at address: 0x2b5e4da9100c
> [macmanes:05309] [ 0] /lib/libpthread.so.0 [0x2b5e164c3190]
> [macmanes:05309] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05309] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05309] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05309] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05309] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05309] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05309] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b5e166eeabd]
> [macmanes:05309] [ 8] mb [0x410949]
> [macmanes:05309] *** End of error message ***
> [macmanes:05310] *** Process received signal ***
> [macmanes:05310] Signal: Segmentation fault (11)
> [macmanes:05310] Signal code: Address not mapped (1)
> [macmanes:05310] Failing at address: 0x2b7b2a94300c
> [macmanes:05311] *** Process received signal ***
> [macmanes:05311] Signal: Segmentation fault (11)
> [macmanes:05311] Signal code: Address not mapped (1)
> [macmanes:05311] Failing at address: 0x2b9e2bf4b00c
> [macmanes:05311] [ 0] /lib/libpthread.so.0 [0x2b9df497d190]
> [macmanes:05311] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05311] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05311] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05311] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05311] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05311] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05311] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b9df4ba8abd]
> [macmanes:05311] [ 8] mb [0x410949]
> [macmanes:05311] *** End of error message ***
> [macmanes:05312] *** Process received signal ***
> [macmanes:05312] Signal: Segmentation fault (11)
> [macmanes:05312] Signal code: Address not mapped (1)
> [macmanes:05312] Failing at address: 0x2b756bf1b00c
> [macmanes:05312] [ 0] /lib/libpthread.so.0 [0x2b753494d190]
> [macmanes:05312] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05312] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05312] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05312] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05312] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05312] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05312] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b7534b78abd]
> [macmanes:05312] [ 8] mb [0x410949]
> [macmanes:05312] *** End of error message ***
> Defining charset called gene1000
> [macmanes:05310] [ 0] /lib/libpthread.so.0 [0x2b7af3375190]
> [macmanes:05310] [ 1] mb(DoCharset+0x187) [0x41d9a7]
> [macmanes:05310] [ 2] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05310] [ 3] mb(DoExecute+0x67f) [0x42f81f]
> [macmanes:05310] [ 4] mb(ParseCommand+0x2b2) [0x42edc2]
> [macmanes:05310] [ 5] mb(CommandLine+0x17e) [0x4137de]
> [macmanes:05310] [ 6] mb(main+0x82) [0x413ad2]
> [macmanes:05310] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b7af35a0abd]
> [macmanes:05310] [ 8] mb [0x410949]
> [macmanes:05310] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 9 with PID 5307 on node macmanes exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 2 total processes killed (some possibly by mpirun during cleanup)
> macmanes_at_macmanes:~/mrbayes$
> _________________________________
> Matthew MacManes
> PhD Candidate
> University of California- Berkeley
> Museum of Vertebrate Zoology
> Phone: 510-495-5833
> Lab Website: http://ib.berkeley.edu/labs/lacey
> Personal Website: http://macmanes.com/
>
>
> On Thu, Mar 11, 2010 at 07:42, Peter Kjellstrom <cap_at_[hidden]> wrote:
> On Thursday 11 March 2010, Matthew MacManes wrote:
> > Can anybody tell me if this is an error associated with openmpi, versus an
> > issue with the program I am running (MRBAYES,
> > https://sourceforge.net/projects/mrbayes/)
> >
> > We are trying to run a large simulated dataset using 1,000,000 bases
> > divided up into 1000 genes, 5 taxa. An error is occurring, but we are not
> > sure why. We are using the MPI version of MRBAYES v3.2-cvs on a Linux
> > 16-core 24 GB RAM machine. It does not appear as if the program runs out of
> > memory (max memory usage is 13 GB). Maybe this is an OpenMPI problem and
> > not related to MrBayes...
> >
> > See snippet of error message below. Can anybody give me any hints about the
> > source of the problem?
> >
> > I am using OPENMPI version 1.4.1.
> >
> > ...
> > Defining charset called gene997
> > Defining charset called gene998
> > Defining charset called gene999
> > Defining charset called gene1000
> > Defining partition called Genes
> > [macmanes:02546] *** Process received signal ***
> > [macmanes:02546] Signal: Segmentation fault (11)
> > [macmanes:02546] Signal code: Address not mapped (1)
> > [macmanes:02546] Failing at address: (nil)
> > [macmanes:02546] [ 0] /lib/libpthread.so.0 [0x7ffd0f322190]
> > [macmanes:02546] *** End of error message ***
> > --------------------------------------------------------------------------
> > mpirun noticed that process rank 13 with PID 2546 on node macmanes exited
> > on signal 11 (Segmentation fault).
>
> One of the ranks got a "Segmentation fault". This would typically indicate a
> problem with the app not the MPI. Maybe you ran out of stack space?
> (ulimit -s).
>
> Have you tried a different/lower number of ranks?
>
> /Peter
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/