Open MPI User's Mailing List Archives

From: Brock Palen (brockp_at_[hidden])
Date: 2007-08-17 15:18:53


Thanks, these are all interesting articles.

Brock Palen
Center for Advanced Computing
brockp_at_[hidden]
(734)936-1985

On Aug 17, 2007, at 3:13 PM, Jeff Squyres wrote:

> On Aug 17, 2007, at 12:16 PM, Brock Palen wrote:
>
>> We have a user who uses the sepran1206 package. It works for him on
>> LAM, MPICH2, and Open MPI up to a certain problem size. Beyond that,
>> I see in the debugger (DDT) that both rank 0 and rank 1 call
>> PMPI_SEND(). Is PMPI_SEND the same as MPI_SEND?
>
> For the most part, yes. In Open MPI on many operating systems, one
> is a weak symbol for the other (so in a debugger, you might see your
> app call PMPI_Send instead of MPI_Send). On other operating systems
> where weak symbols are not supported (e.g., OS X), there are two
> copies of the same C function, one named MPI_Send and the other named
> PMPI_Send (ditto for Fortran).
>
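For illustration, here is a minimal, self-contained sketch of the
weak-symbol technique Jeff describes, using hypothetical demo names
rather than Open MPI's actual source. It assumes a toolchain with
weak-symbol support (e.g., GCC on Linux):

    /* weak_demo.c: pmpi_send_demo() holds the real implementation;
       mpi_send_demo is declared as a weak alias for it, so both names
       resolve to the same code -- which is why a debugger stepping
       into mpi_send_demo may show pmpi_send_demo instead. */
    #include <stdio.h>

    int pmpi_send_demo(int value)
    {
        printf("real implementation called with %d\n", value);
        return 0;
    }

    /* declare mpi_send_demo as a weak alias of pmpi_send_demo */
    #pragma weak mpi_send_demo = pmpi_send_demo
    int mpi_send_demo(int value);

    int main(void)
    {
        return mpi_send_demo(42);
    }
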
> The "PMPI" versions are what are called the profiling versions --
> meaning that someone can write their own 3rd party library and
> provide "MPI_<foo>" functions to intercept all the MPI calls. They
> can then do process accounting, tracing, or whatever they want to do,
> and then call the back-end "PMPI_<foo>" function to perform the
> actual MPI functionality. See the "profiling" chapter of the MPI-1
> spec if you care about the details.
>
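As a concrete sketch of that interception pattern (using the
MPI-2-era C binding current when this thread was written; a tool
would typically compile this into a library and link it ahead of the
MPI library so its MPI_Send is resolved first):

    /* profile_send.c: a profiling layer defines its own MPI_Send,
       does its bookkeeping, and forwards to PMPI_Send for the actual
       send. */
    #include <stdio.h>
    #include <mpi.h>

    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        /* tool-side accounting or tracing goes here */
        fprintf(stderr, "intercepted MPI_Send: %d element(s) to rank %d\n",
                count, dest);

        /* call the real implementation via the profiling entry point */
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }
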
>> Also, why would it work with LAM and MPICH2?
>>
>> If we raise btl_sm_eager_limit it works (obviously due to the
>> blocking nature of both ranks calling MPI_SEND), but I am confused
>> as to why LAM works and OMPI does not.
>
> A common reason for this kind of behavior is assuming that MPI_SEND
> will not block. Check out an old magazine column that I wrote about
> this topic:
>
> http://cw.squyres.com/columns/2004-08-CW-MPI-Mechanic.pdf
>
> It's "#1" on my top-10 list of evils to avoid in parallel (that
> column is part 2 of 2 -- part 1 is
> http://cw.squyres.com/columns/2004-07-CW-MPI-Mechanic.pdf). I also
> talk about the same problem in this column under the "Debugging a
> Classic MPI Mistake" heading:
>
> http://cw.squyres.com/columns/2005-01-CW-MPI-Mechanic.pdf
>
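The classic mistake looks roughly like this (a sketch assuming
exactly two ranks; N is a hypothetical count chosen to exceed a
typical eager limit):

    /* deadlock_demo.c: both ranks call MPI_Send first. Small messages
       often "work" because the implementation buffers them eagerly;
       above the eager limit, MPI_Send may block until a matching
       receive is posted, so both ranks wait forever. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N (1024 * 1024)  /* assumed to exceed the eager limit */

    int main(int argc, char **argv)
    {
        int rank, peer;
        double *sendbuf = malloc(N * sizeof(double));
        double *recvbuf = malloc(N * sizeof(double));

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;  /* assumes exactly 2 ranks */

        /* WRONG: both ranks send before either receives.  A portable
           fix is MPI_Sendrecv, or having one rank receive first. */
        MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        MPI_Finalize();
        free(sendbuf);
        free(recvbuf);
        return 0;
    }

Raising btl_sm_eager_limit only moves the threshold at which messages
stop being buffered eagerly; the MPI standard never guarantees that
MPI_Send returns before a matching receive is posted.
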
> I'll likely be copying/moving the PDFs of these old columns to
> www.open-mpi.org in the not-distant future.
>
> BTW: I'm not saying that this is definitely the problem, but from
> your description, it certainly could be.
>
> --
> Jeff Squyres
> Cisco Systems
>