Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with MPI_BARRIER
From: Ghislain Lartigue (ghislain.lartigue_at_[hidden])
Date: 2011-09-08 11:04:15


This behavior happens on every call, the first one and all the following ones.

Here is my code (simplified):

================================================================
! Time the barrier; the elapsed time is converted to ns and divided by 36**3
start_time = MPI_Wtime()
call mpi_ext_barrier()
new_time = MPI_Wtime()-start_time
write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
call print_message("CAST GHOST DATA2 LOOP 1 barrier "//trim(local_time),0)

            do conn_index_id=1, Nconn(conn_type_id)

                  ! loop over data
                  this_data => block%data
                  do while (associated(this_data))

                        call MPI_IRECV(...)
                        call MPI_ISEND(...)

                  this_data => this_data%next
                  enddo

               endif

            enddo

         enddo

start_time = MPI_Wtime()
call mpi_ext_barrier()
new_time = MPI_Wtime()-start_time
write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
call print_message("CAST GHOST DATA2 LOOP 2 barrier "//trim(local_time),0)

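         ! Poll the receive requests until all nreq have completed.
         ! MPI_TEST sets a completed request to MPI_REQUEST_NULL,
         ! so each request is counted only once.
         done=.false.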
         counter = 0
         do while (.not.done)
            do ireq=1,nreq
               if (recv_req(ireq)/=MPI_REQUEST_NULL) then
                  call MPI_TEST(recv_req(ireq),found,mystatus,icommerr)
                  if (found) then
                     call ....
                     counter=counter+1
                  endif
               endif
            enddo
            if (counter==nreq) then
               done=.true.
            endif
         enddo
================================================================

Within each call, the first barrier (LOOP 1) works perfectly fine, but the second one (LOOP 2) shows the strange behavior...

Ghislain.

On 8 Sep 2011, at 16:53, Eugene Loh wrote:

> On 9/8/2011 7:42 AM, Ghislain Lartigue wrote:
>> I will check that, but as I said in my first email, this strange behaviour happens only in one place in my code.
> Is the strange behavior on the first time, or much later on? (You seem to imply later on, but I thought I'd ask.)
>
> I agree the behavior is noteworthy, but it's plausible and there's not enough information to explain it based solely on what you've said.
>
> Here is one scenario. I don't know if it applies to you since I know very little about what you're doing. I think with VampirTrace, you can collect performance data into large buffers. Occasionally, the buffers need to be flushed to disk. VampirTrace will wait for a good opportunity to do so -- e.g., a global barrier. So, you execute lots of barriers, but suddenly you hit one where VT wants to flush to disk. This takes a long time and everyone in the barrier spends a long time in the barrier. Then, execution resumes and barrier performance looks again like what it used to look like.
>
> Again, there are various scenarios to explain what you see. More information would be needed to decide which applies to you.
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
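Eugene's scenario rests on one property of MPI_BARRIER: no rank leaves the barrier until the last rank has entered it, so the time a rank measures around the call includes the time it spent waiting for slower ranks. The sketch below models only that waiting, under stated assumptions: the barrier itself is treated as free, and the four arrival times are hypothetical values (one rank 0.5 s late, as if it were flushing a trace buffer or finishing extra work before the barrier). It is plain Fortran with no MPI calls, and `barrier_waits` is an illustrative helper, not part of any library.

```fortran
! Minimal model of barrier waiting time (no MPI calls).
! Assumptions: the barrier itself costs nothing, and a rank leaves it
! only when the last rank has arrived, as with MPI_BARRIER.
program barrier_wait_model
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nranks = 4
  real(dp) :: arrival(nranks), waited(nranks)
  integer :: i

  ! Hypothetical arrival times in seconds: ranks 0-2 arrive together,
  ! rank 3 is 0.5 s late (e.g. flushing a trace buffer).
  arrival = [1.0_dp, 1.0_dp, 1.0_dp, 1.5_dp]

  waited = barrier_waits(arrival)

  ! Report the time each rank spends inside the barrier, in ms.
  do i = 1, nranks
     write(*, '(A,I0,A,I0,A)') 'rank ', i - 1, ' waits ', &
          nint(1000.0_dp * waited(i)), ' ms'
  end do

contains

  ! Hypothetical helper: a rank that enters at t_arrive(i) leaves when
  ! the last rank arrives, i.e. at maxval(t_arrive).
  pure function barrier_waits(t_arrive) result(t_wait)
    real(dp), intent(in) :: t_arrive(:)
    real(dp) :: t_wait(size(t_arrive))
    t_wait = maxval(t_arrive) - t_arrive
  end function barrier_waits

end program barrier_wait_model
```

With these inputs the program prints 500 ms for ranks 0 to 2 and 0 ms for rank 3: the ranks that arrived on time see a slow barrier, while the late rank sees a fast one. Recording MPI_Wtime() just before the barrier on every rank, alongside the barrier time itself, is one way to check whether a slow LOOP 2 barrier reflects late arrivals rather than the cost of the barrier.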