
Open MPI User's Mailing List Archives

From: Axel Schweiger (axel_at_[hidden])
Date: 2007-01-22 14:59:06


Jeff,

I'm afraid I'm not familiar enough with the code to dive into it. I suspect
that, given we have a working MPI implementation (MPICH) and this version of
the POP model is superseded, it is probably not worth spending a lot of time
on it.

I was hoping this was a "typical" error that could be fixed with different
compiler switches, or that it mapped to a known bug or incompatibility in
Open MPI.

If that isn't the case, it is probably best to drop it.

Thanks for your offer to help, though!

Axel
Jeff Squyres wrote:
> On Jan 22, 2007, at 11:53 AM, Axel Schweiger wrote:
>
>
>> Thanks for your reply. Yes, POP 1.2 is dead w.r.t. development, but our
>> application still uses it. The 1.2 to 2.0 transition involves a lot of
>> physical differences, and for a while at least we are stuck with 1.2.
>>
>
> Gotcha.
>
>
>> I can't say whether a bug was fixed, since there was a lot of
>> re-engineering going to 2.0. But I do know that POP 1.2 works
>> fine with the MPICH MPI implementation. Wouldn't you expect bad
>> parameters to produce the same error with MPICH?
>>
>
> Usually, but not always. Mostly, this involves problems with C
> codes, but it can happen in Fortran as well. Specifically, different
> run-time behaviors of MPI implementations can sometimes result in a
> code that runs under one MPI and not under another, typically (but
> not always) if the code makes some assumptions or violates the
> standard in some way.
>
> I see that in OMPI's MPI_CART_SHIFT, we only return the "bad communicator"
> error if we get an invalid communicator or an intercommunicator. Are
> you familiar enough with the POP code to dive into it and see where
> the problem is actually occurring?
>
>
>
>> Thanks much
>> Axel
>> Jeff Squyres wrote:
>>
>>> Looking at the web page for POP
>>> (http://climate.lanl.gov/Models/POP/index.shtml), it looks like POP 1.2
>>> is pretty ancient. I gather from your text that later versions work OK
>>> ("POP 2").
>>>
>>> My first guess -- knowing nothing about the POP code itself -- is
>>> that there is a bug in the POP 1.2 code such that it is passing a bad
>>> parameter to MPI_CART_SHIFT, and that later versions (POP 2) fixed
>>> the problem.
>>>
>>> Do you know if this is the case?
>>>
>>>
>>> On Jan 19, 2007, at 8:06 PM, Axel Schweiger wrote:
>>>
>>>
>>>
>>>> I am having a problem running POP 1.2 (Parallel Ocean Program) with
>>>> Open MPI version 1.1.2, compiled with PGI 6.2-4, on RHEL 4 Update 4
>>>> (configure output attached).
>>>>
>>>> The error is as follows:
>>>>
>>>> mpirun -v -np 4 -machinefile node18.dat pop
>>>> [node18:11220] *** An error occurred in MPI_Cart_shift
>>>> [node18:11221] *** An error occurred in MPI_Cart_shift
>>>> [node18:11221] *** on communicator MPI_COMM_WORLD
>>>> [node18:11221] *** MPI_ERR_COMM: invalid communicator
>>>> [node18:11221] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>> [node18:11220] *** on communicator MPI_COMM_WORLD
>>>> [node18:11220] *** MPI_ERR_COMM: invalid communicator
>>>> [node18:11220] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>> 3 additional processes aborted (not shown)
>>>>
>>>> The application runs fine with MPICH 1.2.6, and other applications
>>>> (POP 2) run fine with Open MPI.
>>>>
>>>> Any suggestions?
>>>>
>>>> Thanks
>>>>
>>>> <configure_pgi_ext.log.gz>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>>
>>>
>>>
>>
>
>
>
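
To illustrate the distinction Jeff draws about MPI_CART_SHIFT, here is a minimal plain-C sketch of the argument checks and neighbor arithmetic behind a Cartesian shift. It does not call MPI and does not reproduce Open MPI's source. The type sketch_comm, the function cart_shift_sketch, the SKETCH_* error codes and SKETCH_PROC_NULL are all hypothetical stand-ins for MPI_Comm, MPI_Cart_shift, the MPI error classes and MPI_PROC_NULL. Following Jeff's description, only an invalid handle (modeled here as a NULL pointer) or an intercommunicator yields the communicator error. The separate topology error for a communicator without a Cartesian topology, and the direction and argument checks, are assumptions. Ranks map to coordinates in row-major order, as the MPI standard specifies for Cartesian communicators.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; real values come from mpi.h. */
#define SKETCH_PROC_NULL (-1)
#define SKETCH_MAX_DIMS 8

typedef enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ERR_COMM,     /* analogous to MPI_ERR_COMM */
    SKETCH_ERR_TOPOLOGY, /* analogous to MPI_ERR_TOPOLOGY (assumed check) */
    SKETCH_ERR_DIMS,     /* analogous to MPI_ERR_DIMS (assumed check) */
    SKETCH_ERR_ARG       /* analogous to MPI_ERR_ARG (assumed check) */
} sketch_err;

/* Hypothetical communicator model: just enough state to mimic the checks. */
typedef struct {
    int is_inter;                 /* nonzero for an intercommunicator */
    int is_cart;                  /* nonzero if it carries a Cartesian topology */
    int rank;                     /* rank of the calling process */
    int ndims;
    int dims[SKETCH_MAX_DIMS];
    int periods[SKETCH_MAX_DIMS]; /* nonzero = periodic (wraps around) */
} sketch_comm;

/* Row-major: the last dimension varies fastest. */
static int coords_to_rank(const sketch_comm *c, const int *coords) {
    int rank = 0;
    for (int i = 0; i < c->ndims; i++)
        rank = rank * c->dims[i] + coords[i];
    return rank;
}

static void rank_to_coords(const sketch_comm *c, int rank, int *coords) {
    for (int i = c->ndims - 1; i >= 0; i--) {
        coords[i] = rank % c->dims[i];
        rank /= c->dims[i];
    }
}

/* Rank displaced by disp along dir, or SKETCH_PROC_NULL off a non-periodic edge. */
static int neighbor(const sketch_comm *c, const int *coords, int dir, int disp) {
    int tmp[SKETCH_MAX_DIMS];
    for (int i = 0; i < c->ndims; i++)
        tmp[i] = coords[i];
    int d = c->dims[dir];
    int pos = coords[dir] + disp;
    if (c->periods[dir]) {
        pos %= d;            /* C remainder may be negative ... */
        if (pos < 0)
            pos += d;        /* ... so shift it back into [0, d) */
    } else if (pos < 0 || pos >= d) {
        return SKETCH_PROC_NULL;
    }
    tmp[dir] = pos;
    return coords_to_rank(c, tmp);
}

sketch_err cart_shift_sketch(const sketch_comm *comm, int direction, int disp,
                             int *rank_source, int *rank_dest) {
    if (comm == NULL)       /* invalid or null handle */
        return SKETCH_ERR_COMM;
    if (comm->is_inter)     /* intercommunicator */
        return SKETCH_ERR_COMM;
    if (!comm->is_cart)     /* valid, but no Cartesian topology */
        return SKETCH_ERR_TOPOLOGY;
    if (direction < 0 || direction >= comm->ndims)
        return SKETCH_ERR_DIMS;
    if (rank_source == NULL || rank_dest == NULL)
        return SKETCH_ERR_ARG;

    assert(comm->ndims <= SKETCH_MAX_DIMS);
    int coords[SKETCH_MAX_DIMS];
    rank_to_coords(comm, comm->rank, coords);
    *rank_dest = neighbor(comm, coords, direction, disp);
    *rank_source = neighbor(comm, coords, direction, -disp);
    return SKETCH_SUCCESS;
}
```

For a four-process periodic ring, rank 0 gets rank_dest 1 and rank_source 3. With periods[0] set to 0, rank_source becomes SKETCH_PROC_NULL instead. If Open MPI orders its checks the way the sketch does, an MPI_ERR_COMM from MPI_Cart_shift would point at the communicator handle being passed in rather than at the shift parameters.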