Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"
From: Nguyen Toan (nguyentoan1508_at_[hidden])
Date: 2011-03-03 10:33:37


Thanks Josh.
Actually I also tested with the Himeno benchmark
<http://accc.riken.jp/assets/files/himenob_loadmodule/himenoBMT_c_mpi.lzh>
and got the same problem, so I think this could be a bug.
Hope this information also helps.

Regards,
Nguyen Toan

On Fri, Mar 4, 2011 at 12:04 AM, Joshua Hursey <jjhursey_at_[hidden]> wrote:

> Thanks for the program. I created a ticket for this performance bug and
> attached the tarball to the ticket:
> https://svn.open-mpi.org/trac/ompi/ticket/2743
>
> I do not know exactly when I will be able to get back to this, but
> hopefully soon. I added you to the CC so you should receive any progress
> updates regarding the ticket as we move forward.
>
> Thanks again,
> Josh
>
> On Mar 3, 2011, at 2:12 AM, Nguyen Toan wrote:
>
> > Dear Josh,
> >
> > Attached with this email is a small program that illustrates the
> performance problem. You can find simple instructions in the README file.
> > There are also 2 sample result files (cpu.256^3.8N.*) which show the
> execution time difference between 2 cases.
> > Hope you can take some time to find the problem.
> > Thanks for your kindness.
> >
> > Best Regards,
> > Nguyen Toan
> >
> > On Wed, Mar 2, 2011 at 3:00 AM, Joshua Hursey <jjhursey_at_[hidden]>
> wrote:
> > I have not had the time to look into the performance problem yet, and
> probably won't for a little while. Can you send me a small program that
> illustrates the performance problem, and I'll file a bug so we don't lose
> track of it.
> >
> > Thanks,
> > Josh
> >
> > On Feb 25, 2011, at 1:31 PM, Nguyen Toan wrote:
> >
> > > Dear Josh,
> > >
> > > Did you find out the problem? I still have not been able to make any progress.
> > > Hope to hear some good news from you.
> > >
> > > Regards,
> > > Nguyen Toan
> > >
> > > On Sun, Feb 13, 2011 at 3:04 PM, Nguyen Toan <nguyentoan1508_at_[hidden]>
> wrote:
> > > Hi Josh,
> > >
> > > I tried the MCA parameter you mentioned but it did not help, the
> unknown overhead still exists.
> > > Here I attach the output of 'ompi_info', both version 1.5 and 1.5.1.
> > > Hope you can find out the problem.
> > > Thank you.
> > >
> > > Regards,
> > > Nguyen Toan
> > >
> > > On Wed, Feb 9, 2011 at 11:08 PM, Joshua Hursey <jjhursey_at_[hidden]>
> wrote:
> > > It looks like the logic in the configure script is turning on the FT
> thread for you when you specify both '--with-ft=cr' and
> '--enable-mpi-threads'.
> > >
> > > Can you send me the output of 'ompi_info'? Can you also try the MCA
> parameter that I mentioned earlier to see if that changes the performance?
> > >
> > > If there are many non-blocking sends and receives, there might be a
> performance bug with the way the point-to-point wrapper is tracking request
> objects. If the above MCA parameter does not help the situation, let me know
> and I might be able to take a look at this next week.
> > >
> > > Thanks,
> > > Josh
> > >
> > > On Feb 9, 2011, at 1:40 AM, Nguyen Toan wrote:
> > >
> > > > Hi Josh,
> > > > Thanks for the reply. I did not use the '--enable-ft-thread' option.
> Here are my build options:
> > > >
> > > > CFLAGS=-g \
> > > > ./configure \
> > > > --with-ft=cr \
> > > > --enable-mpi-threads \
> > > > --with-blcr=/home/nguyen/opt/blcr \
> > > > --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
> > > > --prefix=/home/nguyen/opt/openmpi \
> > > > --with-openib \
> > > > --enable-mpirun-prefix-by-default
> > > >
> > > > My application performs a lot of communication in every loop, mainly
> MPI_Isend, MPI_Irecv and MPI_Wait. Also, for my purposes I want to take
> only one checkpoint per application execution, but the unknown overhead
> exists even when no checkpoint is taken.
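The communication pattern described above can be sketched as follows. This is only a minimal illustration of the Isend/Irecv/Wait loop with C/R-tracked requests, not the actual application; the message size, iteration count, and ring-neighbor exchange are made-up placeholders:

```c
/* Minimal sketch of the Isend/Irecv/Wait loop pattern discussed in this
 * thread. Each rank exchanges a buffer with its ring neighbors every
 * iteration, so the C/R point-to-point wrapper must track many requests.
 * Build/run (hypothetical): mpicc pattern.c -o pattern
 *                           mpirun -np 4 -am ft-enable-cr ./pattern */
#include <mpi.h>
#include <stdio.h>

#define N     1024   /* hypothetical message size   */
#define ITERS 100    /* hypothetical loop count     */

int main(int argc, char **argv) {
    int rank, size;
    double sendbuf[N] = {0.0}, recvbuf[N];
    MPI_Request send_req, recv_req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;

    for (int i = 0; i < ITERS; i++) {
        /* Post the non-blocking receive and send, then wait on both. */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &recv_req);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &send_req);
        MPI_Wait(&recv_req, MPI_STATUS_IGNORE);
        MPI_Wait(&send_req, MPI_STATUS_IGNORE);
    }

    if (rank == 0) printf("done\n");
    MPI_Finalize();
    return 0;
}
```

Timing this loop once under a plain mpirun and once under "-am ft-enable-cr" should reproduce the kind of comparison discussed in the thread.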
> > > >
> > > > Do you have any other idea?
> > > >
> > > > Regards,
> > > > Nguyen Toan
> > > >
> > > >
> > > > On Wed, Feb 9, 2011 at 12:41 AM, Joshua Hursey <
> jjhursey_at_[hidden]> wrote:
> > > > There are a few reasons why this might be occurring. Did you build
> with the '--enable-ft-thread' option?
> > > >
> > > > If so, it looks like I didn't move over the thread_sleep_wait
> adjustment from the trunk - the thread was being a bit too aggressive. Try
> adding the following to your command line options, and see if it changes the
> performance.
> > > > "-mca opal_cr_thread_sleep_wait 1000"
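Putting the pieces of this thread together, a full invocation with C/R enabled and the suggested FT-thread throttling would look something like the following; the application name and process count are placeholders:

```shell
# Run with checkpoint/restart support enabled (the "-am ft-enable-cr"
# setting from this thread) and slow down the FT thread's polling loop
# via the MCA parameter suggested above.
mpirun -np 8 \
    -am ft-enable-cr \
    -mca opal_cr_thread_sleep_wait 1000 \
    ./my_mpi_app
```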
> > > >
> > > > There are other places to look as well depending on how frequently
> your application communicates, how often you checkpoint, process layout, ...
> But usually the aggressive nature of the thread is the main problem.
> > > >
> > > > Let me know if that helps.
> > > >
> > > > -- Josh
> > > >
> > > > On Feb 8, 2011, at 2:50 AM, Nguyen Toan wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I am using the latest version of OpenMPI (1.5.1) and BLCR (0.8.2).
> > > > > I found that when running an application which uses MPI_Isend,
> MPI_Irecv and MPI_Wait with C/R enabled, i.e. using "-am ft-enable-cr",
> the application runtime is much longer than the normal execution with
> mpirun (even though no checkpoint was taken).
> > > > > This overhead becomes larger when the normal execution runtime is
> longer.
> > > > > Does anybody have any idea about this overhead, and how to
> eliminate it?
> > > > > Thanks.
> > > > >
> > > > > Regards,
> > > > > Nguyen
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > users_at_[hidden]
> > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > >
> > > > ------------------------------------
> > > > Joshua Hursey
> > > > Postdoctoral Research Associate
> > > > Oak Ridge National Laboratory
> > > > http://users.nccs.gov/~jjhursey
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> > <test.tar>
>
>