Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] CMAQ crashes with OpenMPI
From: Barrett, Brian W (bwbarre_at_[hidden])
Date: 2011-08-09 17:27:02


The error message looks like it's nowhere near an MPI function; I would
guess that this is not an Open MPI problem but (particularly given your
statements about Snow Leopard) a CMAQ problem. The easiest way to debug
on OS X is to launch the application code in a debugger, something like:

  mpirun -np 2 xterm -e gdb <app>

One thing that can get people on OS X is that the maximum stack size is
extremely small compared to Linux. Fortran apps, in particular, can end
up putting things on the stack which cause an overrun and all kinds of fun.
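
As a rough sketch of how one might check and raise that limit before
launching (this assumes a POSIX-style shell whose ulimit builtin accepts
-s, -S and -H, and it only affects processes started from this same shell
on the local machine; <app> is the same placeholder as above):

```shell
#!/bin/sh
# Report the current soft and hard stack limits
# (a number in kilobytes, or "unlimited").
soft_before=$(ulimit -S -s)
hard=$(ulimit -H -s)
echo "soft stack limit: $soft_before"
echo "hard stack limit: $hard"

# Raise only the soft limit, up to whatever the hard limit allows.
ulimit -S -s "$hard"
soft_after=$(ulimit -S -s)
echo "soft stack limit now: $soft_after"

# Launch from this same shell so the local ranks inherit the new limit, e.g.:
#   mpirun -np 2 <app>
```

If the crash goes away with the larger limit, a stack overrun becomes the
likely suspect; if it doesn't, the debugger route above is still the way to go.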

Brian

On 8/9/11 3:18 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:

>Also, please be aware that we haven't done any testing of OMPI on Lion,
>so this is truly new ground.
>
>On Aug 9, 2011, at 3:00 PM, Doug Reeder wrote:
>
>
>Matt,
>Are you sure you are building against your MacPorts version of openmpi
>and not the one that ships w/ Lion? In the traceback, are items 4-9, which
>end w/ x86_64pg, from the PGI compiler? You said you are using pgf90 and
>pgcc, but in the configure input it looks like gcc is being used on Lion.
>
>Doug Reeder
>On Aug 9, 2011, at 1:49 PM, Matthew Russell wrote:
>
>
>
>Hi,
>I'm trying to run CMAQ - an air quality model developed by the US EPA -
>on a Mac (Lion) using OpenMPI (1.5.3) installed with MacPorts.
>
>I am able to run CMAQ in parallel, and am able to run small programs that
>use OpenMPI.
>
>I set the OpenMPI environment variables to use pgf90/pgcc (10.9) as my
>compilers. I'm using PGI because some of the code I need to build is Fortran
>77 ( *sigh* ), and for some other reasons.
>
>
>The error I get is:
>
>/opt/local/lib/openmpi/bin/mpirun -v -machinefile
>/Users/matt/cmaq/darwin11/scripts/cctm/machines8 -np 2
>/Users/matt/cmaq/darwin11/scripts/cctm/CCTM_e1a_Darwin11_x86_64pg
>[pontus:72547] *** Process received signal ***
>[pontus:72547] Signal: Segmentation fault: 11 (11)
>[pontus:72547] Signal code: Address not mapped (1)
>[pontus:72547] Failing at address: 0x0
>[pontus:72547] [ 0] 2 libsystem_c.dylib
>0x00007fff91065cfa _sigtramp + 26
>[pontus:72547] [ 1] 3 ???
>0x00007fff5fbe58ab 0x0 + 140734799698091
>[pontus:72547] [ 2] 4 CCTM_e1a_Darwin11_x86_64pg
>0x000000010003c89b distr_env_ + 971
>[pontus:72547] [ 3] 5 CCTM_e1a_Darwin11_x86_64pg
>0x000000010003cbe5 par_init_ + 565
>[pontus:72547] [ 4] 6 CCTM_e1a_Darwin11_x86_64pg
>0x0000000100032e1b MAIN_ + 219
>[pontus:72547] [ 5] 7 CCTM_e1a_Darwin11_x86_64pg
>0x00000001000016f6 main + 70
>[pontus:72547] [ 6] 8 CCTM_e1a_Darwin11_x86_64pg
>0x000000010000163a _start + 248
>[pontus:72547] [ 7] 9 CCTM_e1a_Darwin11_x86_64pg
>0x0000000100001541 start + 33
>[pontus:72547] [ 8] 10 ???
>0x0000000000000001 0x0 + 1
>[pontus:72547] *** End of error message ***
>--------------------------------------------------------------------------
>mpirun noticed that process rank 1 with PID 72547 on node
>pontus.cee.carleton.ca exited on signal
>11 (Segmentation fault: 11).
>--------------------------------------------------------------------------
>
>
>I don't expect anyone to know the solution from this brief error message,
>however I was wondering if anyone has insight on how I might debug this?
>I am too new to both OpenMPI and CMAQ to be served that well from this
>traceback.
>
>I'm told by others in my research group that CMAQ with OpenMPI on Linux
>works fine, and that the error I'm getting is very similar to the error
>others got when trying this on a Mac (Snow Leopard) with ifort.. before
>they gave up...
>
>OpenMPI was configured with:
>configure.args --sysconfdir=${prefix}/etc/${name} \
> --includedir=${prefix}/include/${name} \
> --bindir=${prefix}/lib/${name}/bin \
> --mandir=${prefix}/share/man \
> --with-memory-manager=none
>
># enable build on Lion
>if {${os.major} >= 11} {
> configure.compiler gcc-4.2
>}
>
>
>The --with-memory-manager is there because I saw it fix potentially
>similar problems in other postings to this Mailing list. It didn't make
>a difference though.
>
>Thanks!
>
>
>_______________________________________________
>users mailing list
>users_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories