Open MPI User's Mailing List Archives

From: Michael (mklus_at_[hidden])
Date: 2007-03-06 16:51:19


I discovered I made a minor change that cost me dearly (I had thought
I had tested this single change but perhaps didn't track the timing
data closely).

MPI_Type_create_struct performs well only when all the data is
contiguous in memory (at least for OpenMPI 1.1.2).

Is this normal or expected?

In my case the program has an f90 structure with 11 integers, 2
logicals, and five 50-element integer arrays, but at the first stage
of the program only the first element of each of those arrays is
used. Even so, with MPI_Type_create_struct it is more efficient to
send the entire 263 words of contiguous memory (58 seconds) than to
try to send only the 18 words of noncontiguous memory (64 seconds).
At the second stage it's 33 words, and there sending the full
contiguous block takes 47 seconds versus 163 seconds for the sparse
type, an extra 116 seconds that dominates the jump in my overall
wall clock time from 130 to 278 seconds. The third stage increases
from 13 seconds to 37 seconds.
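
For concreteness, the noncontiguous case looks roughly like this
minimal sketch; the field names, the 50x5 table layout, and the
program wrapper are made-up simplifications for illustration, not the
real code:

program sparse_struct_demo
  implicit none
  include 'mpif.h'

  ! Simplified stand-in for the real f90 structure: 11 integers,
  ! 2 logicals, and five 50-element integer arrays as a 50x5 table.
  type work_record
     sequence
     integer :: scalars(11)
     logical :: flags(2)
     integer :: tables(50,5)
  end type work_record

  integer, parameter :: nblk = 7       ! scalars + flags + 5 single words
  type(work_record) :: rec
  integer :: blocklens(nblk), types(nblk), sparse_type, ierr, k
  integer(kind=MPI_ADDRESS_KIND) :: base, addr, displs(nblk)

  call MPI_INIT(ierr)

  ! Displacements are measured from the start of the record.
  call MPI_GET_ADDRESS(rec%scalars(1), base, ierr)

  displs(1) = 0
  blocklens(1) = 11
  types(1) = MPI_INTEGER

  call MPI_GET_ADDRESS(rec%flags(1), addr, ierr)
  displs(2) = addr - base
  blocklens(2) = 2
  types(2) = MPI_LOGICAL

  do k = 1, 5          ! only the first element of each 50-element array
     call MPI_GET_ADDRESS(rec%tables(1,k), addr, ierr)
     displs(2+k) = addr - base
     blocklens(2+k) = 1
     types(2+k) = MPI_INTEGER
  end do

  call MPI_TYPE_CREATE_STRUCT(nblk, blocklens, displs, types, &
                              sparse_type, ierr)
  call MPI_TYPE_COMMIT(sparse_type, ierr)

  ! One element of sparse_type describes the 18 scattered words.
  call MPI_BCAST(rec%scalars(1), 1, sparse_type, 0, MPI_COMM_WORLD, ierr)

  call MPI_TYPE_FREE(sparse_type, ierr)
  call MPI_FINALIZE(ierr)
end program sparse_struct_demo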

Because I need to send this block of data back and forth a lot, I
was hoping to find a way to speed up the transfer of this odd block
of data and a couple of other variables. I may try MPI_PACK and
MPI_UNPACK on the structure, but calling those that many times can't
be more efficient.
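
If I do go the PACK route, I imagine it would look something like
the sketch below: pack the 18 useful words into one buffer on the
root, broadcast the buffer once, and unpack on the other ranks. The
subroutine name, the dummy arguments, and the fixed buffer size are
assumptions for illustration only.

subroutine bcast_packed(scalars, flags, firsts, root, comm)
  implicit none
  include 'mpif.h'
  integer, intent(inout) :: scalars(11), firsts(5)
  logical, intent(inout) :: flags(2)
  integer, intent(in)    :: root, comm
  integer, parameter :: bufsize = 1024   ! generous upper bound in bytes
  character(len=bufsize) :: buf
  integer :: pos, rank, ierr

  call MPI_COMM_RANK(comm, rank, ierr)

  if (rank == root) then
     ! Pack the 11 integers, 2 logicals, and the 5 first array words.
     pos = 0
     call MPI_PACK(scalars, 11, MPI_INTEGER, buf, bufsize, pos, comm, ierr)
     call MPI_PACK(flags,    2, MPI_LOGICAL, buf, bufsize, pos, comm, ierr)
     call MPI_PACK(firsts,   5, MPI_INTEGER, buf, bufsize, pos, comm, ierr)
  end if

  ! Broadcast the whole fixed-size buffer once (simple, if wasteful).
  call MPI_BCAST(buf, bufsize, MPI_PACKED, root, comm, ierr)

  if (rank /= root) then
     pos = 0
     call MPI_UNPACK(buf, bufsize, pos, scalars, 11, MPI_INTEGER, comm, ierr)
     call MPI_UNPACK(buf, bufsize, pos, flags,    2, MPI_LOGICAL, comm, ierr)
     call MPI_UNPACK(buf, bufsize, pos, firsts,   5, MPI_INTEGER, comm, ierr)
  end if
end subroutine bcast_packed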

Previously I was equivalencing the structure to an integer array and
sending the integer array as a quick and dirty way to get started,
and it worked. No doubt it's not completely portable.
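
That hack was along these lines (assuming the same simplified
sequence type as in the first sketch; 11 + 2 + 5*50 = 263
default-sized words):

program flat_bcast_demo
  implicit none
  include 'mpif.h'

  type work_record        ! same simplified layout as the sketch above
     sequence
     integer :: scalars(11)
     logical :: flags(2)
     integer :: tables(50,5)
  end type work_record

  type(work_record) :: rec
  integer :: flat(263)    ! 11 + 2 + 5*50 = 263 default-sized words
  integer :: ierr
  equivalence (rec, flat) ! legal only because work_record is a numeric
                          ! sequence type with no pointer components

  call MPI_INIT(ierr)
  ! Send the whole record as a plain integer array in one shot.
  call MPI_BCAST(flat, 263, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  call MPI_FINALIZE(ierr)
end program flat_bcast_demo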

Michael

ps. I don't currently have valgrind installed on this cluster, and
valgrind is not part of the Debian Linux 3.1r3 distribution. Having
no experience with valgrind, I'm not sure how useful it will be with
an MPI program of 500+ subroutines and 50K+ lines running on 16
processes. It took us a while to get profiling working for the
OpenMP version of this code.

On Mar 6, 2007, at 11:28 AM, George Bosilca wrote:

> I doubt this comes from MPI_Pack/MPI_Unpack. The difference is 137
> seconds for 5 calls. That's basically 27 seconds per call to MPI_Pack,
> for packing 8 integers. I know the code and I'm certain there is
> no way to spend 27 seconds in there.
>
> Can you run your application under valgrind with the callgrind tool?
> This will give you some basic information about where the time is
> spent, and it will give us additional information about where to look.
>
> Thanks,
> george.
>
> On Mar 6, 2007, at 11:26 AM, Michael wrote:
>
>> I have a section of code where I need to send 8 separate integers via
>> BCAST.
>>
>> Initially I was just putting the 8 integers into an array and then
>> sending that array.
>>
>> I just tried using MPI_PACK on those 8 integers and I'm seeing a
>> massive slowdown in the code. I have a lot of other communication,
>> and this section is used only 5 times. I went from 140 seconds
>> to 277 seconds on 16 processors using TCP over a dual gigabit
>> ethernet setup (I'm the only user working on this system today).
>>
>> This was run with OpenMPI 1.1.2 to maintain compatibility with a
>> major HPC site.
>>
>> Is there a known problem with MPI_PACK/UNPACK in OpenMPI?
>>
>> Michael
>>
>
> "Half of what I say is meaningless; but I say it so that the other
> half may reach you"
> Kahlil Gibran
>