Matt,

Are you sure you are building against your MacPorts version of OpenMPI and not the one that ships w/ Lion? In the traceback, items 4-9 come from the executable whose name ends w/ x86_64pg, i.e. from the PGI compiler. You said you are using pgf90 and pgcc, but in the configure input it looks like gcc-4.2 is being used on Lion, so OpenMPI itself may have been built with a different compiler than CMAQ.
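
A rough way to check, as a sketch only: the script below reports which OpenMPI tools are first on your PATH, what the wrapper compilers actually invoke (via their --showme option), and, if you pass the CCTM executable as an argument, which MPI libraries it is linked against according to otool. The show_tool helper name is mine, and I am assuming the wrappers sit next to your mpirun in /opt/local/lib/openmpi/bin; adjust as needed.

```shell
#!/bin/sh
# Sketch: find out which OpenMPI installation a build is really using.
# Usage (hypothetical script name): sh check_mpi.sh /path/to/CCTM_executable

show_tool() {
    # Print the resolved location of a command, or a note if it is absent.
    if path=$(command -v "$1" 2>/dev/null); then
        echo "$1 -> $path"
    else
        echo "$1 -> not found on PATH"
    fi
}

# Which copies of the MPI tools come first on PATH?
for tool in mpirun mpicc mpif77 mpif90 ompi_info; do
    show_tool "$tool"
done

# --showme prints the underlying compiler command line without running it,
# so it shows whether the wrapper calls gcc or pgcc/pgf90.
for wrapper in mpicc mpif90; do
    if command -v "$wrapper" >/dev/null 2>&1; then
        echo "$wrapper invokes:"
        "$wrapper" --showme
    fi
done

# Optional: list the MPI shared libraries the executable is linked against.
binary=${1:-}
if [ -n "$binary" ] && [ -x "$binary" ] && command -v otool >/dev/null 2>&1; then
    otool -L "$binary" | grep -i mpi || echo "no MPI libraries listed for $binary"
fi
```

If the wrappers are not found at all, prepend /opt/local/lib/openmpi/bin to PATH and run it again. Anything resolving outside /opt/local would suggest a second OpenMPI installation is being picked up.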

Doug Reeder
On Aug 9, 2011, at 1:49 PM, Matthew Russell wrote:


Hi,

I'm trying to run CMAQ - an air quality model developed by the US EPA - on a Mac (Lion) using OpenMPI (1.5.3) installed with MacPorts.

I am able to run CMAQ in parallel, and am able to run small programs that use OpenMPI.

I set the OpenMPI environment variables to use pgf90/pgcc (10.9) as my compilers. I'm using PGI because some of the code I need to build is Fortran 77 (*sigh*), and for some other reasons.

The error I get is:

/opt/local/lib/openmpi/bin/mpirun -v -machinefile /Users/matt/cmaq/darwin11/scripts/cctm/machines8 -np 2 /Users/matt/cmaq/darwin11/scripts/cctm/CCTM_e1a_Darwin11_x86_64pg
[pontus:72547] *** Process received signal ***
[pontus:72547] Signal: Segmentation fault: 11 (11)
[pontus:72547] Signal code: Address not mapped (1)
[pontus:72547] Failing at address: 0x0
[pontus:72547] [ 0] 2   libsystem_c.dylib                   0x00007fff91065cfa _sigtramp + 26
[pontus:72547] [ 1] 3   ???                                 0x00007fff5fbe58ab 0x0 + 140734799698091
[pontus:72547] [ 2] 4   CCTM_e1a_Darwin11_x86_64pg          0x000000010003c89b distr_env_ + 971
[pontus:72547] [ 3] 5   CCTM_e1a_Darwin11_x86_64pg          0x000000010003cbe5 par_init_ + 565
[pontus:72547] [ 4] 6   CCTM_e1a_Darwin11_x86_64pg          0x0000000100032e1b MAIN_ + 219
[pontus:72547] [ 5] 7   CCTM_e1a_Darwin11_x86_64pg          0x00000001000016f6 main + 70
[pontus:72547] [ 6] 8   CCTM_e1a_Darwin11_x86_64pg          0x000000010000163a _start + 248
[pontus:72547] [ 7] 9   CCTM_e1a_Darwin11_x86_64pg          0x0000000100001541 start + 33
[pontus:72547] [ 8] 10  ???                                 0x0000000000000001 0x0 + 1
[pontus:72547] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 72547 on node pontus.cee.carleton.ca exited on signal 11 (Segmentation fault: 11).
--------------------------------------------------------------------------

I don't expect anyone to know the solution from this brief error message, but I was wondering if anyone has insight into how I might debug this. I am too new to both OpenMPI and CMAQ to get much out of this traceback.

I'm told by others in my research group that CMAQ with OpenMPI on Linux works fine, and that the error I'm getting is very similar to the error others got when trying this on a Mac (Snow Leopard) with ifort... before they gave up.

OpenMPI was configured with:
configure.args  --sysconfdir=${prefix}/etc/${name} \
                --includedir=${prefix}/include/${name} \
                --bindir=${prefix}/lib/${name}/bin \
                --mandir=${prefix}/share/man \
                --with-memory-manager=none

# enable build on Lion
if {${os.major} >= 11} {
        configure.compiler       gcc-4.2
}

The --with-memory-manager=none option is there because I saw it fix potentially similar problems in other postings to this mailing list. It didn't make a difference, though.

Thanks!
