Jeff,

I'm afraid, I'm not familiar enough to dive into it. I suspect between the fact we have a working MPI
implementation (MPICH)  and the fact the this version of the pop model is superceded, it is probably
not worth the effort to spend a lot of time on it.

I was hoping that this was maybe a "typical" error that could be treated with different compiler switches
or that it mapped to a  known bug/incompatability in OpenMPI.

If this isn't the case it probably is best to drop it?

Thanks for your offer to help though!

Axel
Jeff Squyres wrote:
On Jan 22, 2007, at 11:53 AM, Axel Schweiger wrote:

  
Thanks for your reply. Yes POP 1.2 is dead w.r.t. development but our
application still uses it. The 1.2 to 2.0 transition
involves a lot of physical differences and for a while at least we are
stuck with 1.2.
    

Gotcha.

  
Can't say if there is a bug that was fixed since there was a lot of
re-engineering going to 2.0. . But I do know that POP 1.2 works
fine with the MPICH MPI implementation. Wouldn't you expect that a bad
parameters would produce the same error with MPICH?
    

Usually, but not always.  Mostly, this involves problems with C  
codes, but it can happen in Fortran as well.  Specifically, different  
run-time behaviors of MPI implementations can sometimes result in a  
code that runs under one MPI and not under another, typically (but  
not always) if the code makes some assumptions or violates the  
standard in some way.

I see in OMPI's MPI_CART_SHIFT, we only return the "bad communicator"  
error if we get an invalid communicator or an intercommunicator.  Are  
you familiar with the POP code at all to be able to dive into it to  
see where the problem is actually occurring?


  
Thanks much
Axel
Jeff Squyres wrote:
    
Looking at the web page for POP (http://climate.lanl.gov/Models/POP/
index.shtml), it looks like POP 1.2 is pretty ancient.  I gather from
your text that later versions work ok ("POP 2").

My first guess -- knowing nothing about the POP code itself -- is
that there is a bug in the POP 1.2 code such that it is passing a bad
parameter to MPI_CART_SHIFT, and that later versions (POP 2) fixed
the problem.

Do you know if this is the case?


On Jan 19, 2007, at 8:06 PM, Axel Schweiger wrote:


      
I am having a problem running pop 1.2 (Parallel Ocean Model) with
OpenMPI version 1.1.2  compiled with PGI 6.2-4  on RH EL-4 Update 4
(configure result attached)

The error is as follows:

mpirun -v -np 4 -machinefile node18.dat pop
[node18:11220] *** An error occurred in MPI_Cart_shift
[node18:11221] *** An error occurred in MPI_Cart_shift
[node18:11221] *** on communicator MPI_COMM_WORLD
[node18:11221] *** MPI_ERR_COMM: invalid communicator
[node18:11221] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node18:11220] *** on communicator MPI_COMM_WORLD
[node18:11220] *** MPI_ERR_COMM: invalid communicator
[node18:11220] *** MPI_ERRORS_ARE_FATAL (goodbye)
3 additional processes aborted (not shown)

The application runs fine with MPICH 1.2.6 and other applications
(POP 2) run fine with OpenMPI

Any suggestions

Thanks

<configure_pgi_ext.log.gz>
<axel.vcf>
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

        

      
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users