Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.


Subject: Re: [OMPI users] Problem with MPI_BARRIER
From: Ghislain Lartigue (ghislain.lartigue_at_[hidden])
Date: 2011-09-08 11:04:15


This behavior happens every time (on the first call and on all subsequent ones).

Here is my code (simplified):

================================================================
start_time = MPI_Wtime()
call mpi_ext_barrier()
new_time = MPI_Wtime()-start_time
write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
call print_message("CAST GHOST DATA2 LOOP 1 barrier "//trim(local_time),0)

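            do conn_index_id=1, Nconn(conn_type_id)

                  ! loop over data
                  this_data => block%data
                  do while (associated(this_data))

                        ! post the non-blocking receive and send (arguments elided)
                        call MPI_IRECV(...)
                        call MPI_ISEND(...)

                        this_data => this_data%next
                  enddo

               endif   ! closes an if whose opening line was cut in this simplified listing

            enddo

         enddo         ! closes an outer loop whose opening line was cut in this simplified listing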

start_time = MPI_Wtime()
call mpi_ext_barrier()
new_time = MPI_Wtime()-start_time
write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
call print_message("CAST GHOST DATA2 LOOP 2 barrier "//trim(local_time),0)

         done=.false.
         counter = 0
         do while (.not.done)
            do ireq=1,nreq
               if (recv_req(ireq)/=MPI_REQUEST_NULL) then
                  call MPI_TEST(recv_req(ireq),found,mystatus,icommerr)
                  if (found) then
                     call ....
                     counter=counter+1
                  endif
               endif
            enddo
            if (counter==nreq) then
               done=.true.
            endif
         enddo
================================================================

The first call to the barrier (before the loop) works perfectly fine, but the second one (after the non-blocking sends and receives have been posted) shows the strange behavior.

Ghislain.

On 8 Sept. 2011, at 16:53, Eugene Loh wrote:

> On 9/8/2011 7:42 AM, Ghislain Lartigue wrote:
>> I will check that, but as I said in my first email, this strange behaviour happens only in one place in my code.
> Is the strange behavior on the first time, or much later on? (You seem to imply later on, but I thought I'd ask.)
>
> I agree the behavior is noteworthy, but it's plausible and there's not enough information to explain it based solely on what you've said.
>
> Here is one scenario. I don't know if it applies to you since I know very little about what you're doing. I think with VampirTrace, you can collect performance data into large buffers. Occasionally, the buffers need to be flushed to disk. VampirTrace will wait for a good opportunity to do so -- e.g., a global barrier. So, you execute lots of barriers, but suddenly you hit one where VT wants to flush to disk. This takes a long time and everyone in the barrier spends a long time in the barrier. Then, execution resumes and barrier performance looks again like what it used to look like.
>
> Again, there are various scenarios to explain what you see. More information would be needed to decide which applies to you.
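
The scenario above has a consequence for timings like the ones in the posted code: the time a process measures around a barrier is mostly the time it spends waiting for the last participant to arrive, not the cost of the barrier itself. The sketch below illustrates this with Python threads rather than MPI, so it is only an analogy. The function name `time_in_barrier` and the delay values are hypothetical, and `time.sleep` stands in for whatever delays one participant (a trace flush, extra work in the preceding loop, and so on).

```python
import threading
import time


def time_in_barrier(delays):
    """Return the seconds each worker spends inside the barrier.

    delays[i] is how long worker i is busy before it reaches the barrier
    (a stand-in for any reason a participant arrives late).
    """
    barrier = threading.Barrier(len(delays))
    waited = [0.0] * len(delays)

    def worker(rank):
        time.sleep(delays[rank])         # work done before the barrier
        start = time.perf_counter()      # plays the role of MPI_Wtime()
        barrier.wait()                   # plays the role of the barrier call
        waited[rank] = time.perf_counter() - start

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(len(delays))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return waited


if __name__ == "__main__":
    # Worker 3 arrives 0.2 s late; the other three absorb that delay
    # as time spent in the barrier.
    for rank, seconds in enumerate(time_in_barrier([0.0, 0.0, 0.0, 0.2])):
        print(f"worker {rank}: {seconds:.3f} s in barrier")
```

In this sketch the early workers report a long barrier time and the late worker reports a short one, so comparing the per-process timings (rather than a single value) can hint at whether one participant is arriving late.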