
Open MPI User's Mailing List Archives


From: 42aftab_at_[hidden]
Date: 2007-10-29 02:47:32


Hi All,
       Thanks for the help. I don't think I have a cache issue,
because all the processes have the same amount of data and access
it in the same fashion. My problem is partially solved. I was using
2, 4, 8, 16, 32 and 64 processes for my application code. What I
did now was to use 3 processes instead of 2 and 5 instead of 4; in
other words, one more process than I was using before. I forced
process 0 to do nothing but wait for the other processes to finish.
This way, the code segment that was taking longer on process 0 takes
the same time on all the processes except process 0. So I still need
help, and I will be thankful for any further suggestions.

regards Aftab Hussain
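
For readers unfamiliar with the cache effect raised in the quoted reply below, here is a minimal sketch in C of what "access pattern" means for a 2-D array stored in a 1-D buffer. It is an illustration only, not code from this thread: the function names, the `double` element type, and the use of POSIX `clock_gettime` for timing are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (POSIX clock_gettime). */
static double seconds_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Sum a[i*ny + j] with j innermost: consecutive addresses in memory. */
double sum_row_major(const double *a, size_t nx, size_t ny)
{
    double s = 0.0;
    assert(a != NULL);
    for (size_t i = 0; i < nx; i++)
        for (size_t j = 0; j < ny; j++)
            s += a[i * ny + j];
    return s;
}

/* Same elements with i innermost: each access is ny doubles further on. */
double sum_col_major(const double *a, size_t nx, size_t ny)
{
    double s = 0.0;
    assert(a != NULL);
    for (size_t j = 0; j < ny; j++)
        for (size_t i = 0; i < nx; i++)
            s += a[i * ny + j];
    return s;
}

/* Time one traversal; returns elapsed seconds and stores the sum. */
double time_traversal(double (*f)(const double *, size_t, size_t),
                      const double *a, size_t nx, size_t ny,
                      double *sum_out)
{
    double t0 = seconds_now();
    *sum_out = f(a, nx, ny);
    return seconds_now() - t0;
}
```

Calling `time_traversal` with each function on a large array (much larger than the caches) would typically show the second pattern to be slower, though by how much depends on the machine. Note that the loop in the original post runs `n` innermost with the index `(I)*(size_y) + (J)`, which corresponds to the first, contiguous pattern.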

On Fri, October 26, 2007 4:13 pm, Jeff Squyres wrote:
> This is not an MPI problem.
>
>
> Without looking at your code in detail, I'm guessing that you're
> accessing memory without any regard to memory layout and/or caching. Such
> an access pattern will therefore thrash your L1 and L2 caches and access
> memory in a truly horrible pattern that guarantees abysmal performance.
>
> Google around for cache effects or check out an operating systems
> textbook; there's lots of material around about this kind of effect.
>
> Good luck.
>
>
>
>
> On Oct 26, 2007, at 5:10 AM, 42aftab_at_[hidden] wrote:
>
>
>> Thanks,
>>
>>
>> The array bounds are the same on all the nodes, and the compute
>> nodes are identical, i.e. SunFire V890 nodes. I have also changed
>> the root process to be on different nodes, but the problem remains
>> the same. I still don't understand the problem very well, and my
>> progress is at a standstill.
>>
>> regards aftab hussain
>>
>> Hi,
>>
>>
>> Please check that the following are true:
>> 1) The array bounds are equal, i.e. "my_x" and "size_y" have the same
>> value on all nodes.
>> 2) The nodes are homogeneous. To check that, you could make a
>> different node the root and run the program again.
>>
>> -Neeraj
>>
>>
>>
>> On Fri, October 26, 2007 10:13 am, 42aftab_at_[hidden] wrote:
>>
>>> Thanks for your reply,
>>>
>>>
>>>
>>> I used MPI_Wtime for my application, but even then process 0 took
>>> longer to execute the mentioned code segment. I might be wrong, but
>>> what I see is that process 0 takes more time to access the array
>>> elements than the other processes. I don't see what to do now,
>>> because the mentioned code segment is a bottleneck for the timing
>>> of my application.
>>>
>>>
>>> Can anyone suggest something in this regard? I will be very thankful.
>>>
>>>
>>>
>>> regards
>>>
>>> Aftab Hussain
>>>
>>>
>>>
>>>
>>> On Thu, October 25, 2007 9:38 pm, jody wrote:
>>>
>>>
>>>> Hi,
>>>> I'm not sure if that is the problem,
>>>> but in MPI applications you should use MPI_Wtime() for time
>>>> measurements.
>>>>
>>>> Jody
>>>>
>>>>
>>>>
>>>>
>>>> On 10/25/07, 42aftab_at_[hidden] <42aftab_at_[hidden]> wrote:
>>>>
>>>>
>>>>
>>>>> Hi all,
>>>>> I am a research assistant (RA) at NUST Pakistan in the High
>>>>> Performance Scientific Computing Lab. I am working on a parallel
>>>>> implementation of the Finite Difference Time Domain (FDTD) method
>>>>> using MPI. I am using the Open MPI environment on a cluster of 4
>>>>> SunFire V890 nodes connected through Myrinet. My problem is that
>>>>> when I run my code with, say, 4 processes, process 0 takes about
>>>>> 3 times longer than the other three processes to execute a for
>>>>> loop, and this is the main cause of load imbalance in my code.
>>>>> The code that causes the problem is given below. It is run by all
>>>>> the processes simultaneously and independently, and I have timed
>>>>> it separately from the other segments of code.
>>>>>
>>>>> start = gethrtime();
>>>>> for (m = 1; m < my_x; m++) {
>>>>>     for (n = 1; n < size_y-1; n++) {
>>>>>         Ez(m,n) = Ez(m,n) + cezh*((Hy(m,n) - Hy(m-1,n))
>>>>>                                   - (Hx(m,n) - Hx(m,n-1)));
>>>>>     }
>>>>> }
>>>>> stop = gethrtime();
>>>>> time = (stop-start);
>>>>>
>>>>> In my implementation I used 1-D arrays to realize 2-D arrays. I
>>>>> have used the following macros for accessing the array elements.
>>>>>
>>>>> #define Hx(I,J) hx[(I)*(size_y) + (J)]
>>>>> #define Hy(I,J) hy[(I)*(size_y) + (J)]
>>>>> #define Ez(I,J) ez[(I)*(size_y) + (J)]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Can anyone tell me what I am doing wrong here, whether the macros
>>>>> are creating the problem, or whether it could be related to an OS
>>>>> issue? I look forward to any help, because this problem has
>>>>> stopped my progress for the last two weeks.
>>>>>
>>>>> regards aftab hussain
>>>>>
>>>>> RA High Performance Scientific Computing Lab
>>>>> NUST Institute of Information Technology
>>>>> National University of Sciences and Technology Pakistan
>>>>>
>>>>> --
>>>>> This message has been scanned for viruses and
>>>>> dangerous content by MailScanner, and is believed to be clean.
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>
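
The loop and macros from the original post can be put into a self-contained form that runs on a single process without MPI, which may help separate the arithmetic from everything else in the application. This is a sketch under assumptions, not the poster's actual code: the `double` element type, the function names `update_ez` and `time_update_ez`, and the replacement of the Solaris-specific `gethrtime` with POSIX `clock_gettime` are all choices made here for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (stand-in for gethrtime). */
static double seconds_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Same mapping as the Hx/Hy/Ez macros: element (i,j) of a 2-D array
 * stored in a 1-D buffer with size_y elements per row. */
static size_t idx(size_t i, size_t j, size_t size_y)
{
    return i * size_y + j;
}

/* One Ez update over the interior points, as in the posted loop. */
void update_ez(double *ez, const double *hx, const double *hy,
               size_t my_x, size_t size_y, double cezh)
{
    assert(ez != NULL && hx != NULL && hy != NULL);
    for (size_t m = 1; m < my_x; m++) {
        for (size_t n = 1; n + 1 < size_y; n++) {
            ez[idx(m, n, size_y)] += cezh *
                ((hy[idx(m, n, size_y)] - hy[idx(m - 1, n, size_y)]) -
                 (hx[idx(m, n, size_y)] - hx[idx(m, n - 1, size_y)]));
        }
    }
}

/* Run the update reps times and record each duration in elapsed[],
 * so a slow first pass can be told apart from a steady-state cost. */
void time_update_ez(double *ez, const double *hx, const double *hy,
                    size_t my_x, size_t size_y, double cezh,
                    int reps, double *elapsed)
{
    for (int r = 0; r < reps; r++) {
        double t0 = seconds_now();
        update_ez(ez, hx, hy, my_x, size_y, cezh);
        elapsed[r] = seconds_now() - t0;
    }
}
```

Recording every repetition rather than a single total is a deliberate choice: if only the first pass is slow, the cause is more likely a one-off cost than the loop itself. Repeated calls keep accumulating into `ez`, so the timing helper is meant for measurement only, not for producing field values.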
