
Open MPI User's Mailing List Archives


From: Konstantin Kudin (konstantin_kudin_at_[hidden])
Date: 2006-02-06 09:12:23


 Dear Galen,

 It actually turns out that there is a problem not only with
MPI_Alltoall_Isend_Irecv, but also with another related operation,
insyncol_MPI_Alltoallv-nodes-long-SM.ski (this seems to be what is
holding back the FFTs; I checked the source code, and it uses
alltoallv):

#/*@insyncol_MPI_Alltoallv-nodes-long-SM.ski*/
       2 250.8 1.0 8 250.8 1.0 8
       3 1779.6 27.0 8 1779.6 27.0 8
       4 2975.1 45.8 8 2975.1 45.8 8
       5 4413.1 76.0 8 4413.1 76.0 8
       6 93370.6 42900.6 8 93370.6 42900.6 8
       7 199634.4 43273.1 8 199634.4 43273.1 8
       8 262469.6 5896.3 8 262469.6 5896.3 8
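
 (In SKaMPI's output above, as I read it, the first column is the
number of nodes, the second the measured time in microseconds, and the
third its standard error.)

 For reference, the operation this benchmark exercises is a single
MPI_Alltoallv call per iteration. A minimal, self-contained sketch (my
own illustration with made-up, uniform counts -- not SKaMPI's code or
the FFT code):

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size, i;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int block = 1024;               /* made-up per-peer count */
        int *counts = malloc(size * sizeof(int));
        int *displs = malloc(size * sizeof(int));
        for (i = 0; i < size; i++) {
            counts[i] = block;          /* alltoallv allows these to */
            displs[i] = i * block;      /* differ per peer           */
        }
        double *sbuf = malloc(size * block * sizeof(double));
        double *rbuf = malloc(size * block * sizeof(double));
        for (i = 0; i < size * block; i++)
            sbuf[i] = (double)rank;

        /* Every rank exchanges one block with every other rank in a
           single collective call. */
        MPI_Alltoallv(sbuf, counts, displs, MPI_DOUBLE,
                      rbuf, counts, displs, MPI_DOUBLE,
                      MPI_COMM_WORLD);

        free(sbuf); free(rbuf); free(counts); free(displs);
        MPI_Finalize();
        return 0;
    }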

 The .skampi file I am using is the standard one that came with
version 4.1, with only one notable change:
@STANDARDERRORDEFAULT 100.00
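 (If I read the SKaMPI configuration format correctly, this directive
sets the maximum relative standard error, in percent, allowed before
SKaMPI stops repeating a measurement; 100.00 effectively relaxes that
cutoff.)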

 Thanks!
 Kostya

--- "Galen M. Shipman" <gshipman_at_[hidden]> wrote:

> Hi Konstantin,
>
> > MPI_Alltoall_Isend_Irecv
>
> This is a very unscalable algorithm in skampi, as it simply posts N
> MPI_Irecv's and MPI_Isend's and then does a Waitall. We shouldn't
> have an issue on 8 procs, though; in general I would expect the
> performance of this algorithm to degrade quite quickly, especially
> compared to Open MPI's tuned collectives. I can dig into this a bit
> more if you send me your .skampi file configured to run this
> particular benchmark.
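
 (The loop Galen describes would look roughly like this; my own sketch
of the pattern, not SKaMPI's actual source:)

    #include <mpi.h>
    #include <stdlib.h>

    /* Post one MPI_Irecv and one MPI_Isend per peer, then wait on all
       2N requests at once. */
    static void naive_alltoall(double *sbuf, double *rbuf, int count,
                               MPI_Comm comm)
    {
        int size, i;
        MPI_Comm_size(comm, &size);
        MPI_Request *reqs = malloc(2 * size * sizeof(MPI_Request));

        for (i = 0; i < size; i++)   /* N receives posted up front */
            MPI_Irecv(rbuf + i * count, count, MPI_DOUBLE, i, 0,
                      comm, &reqs[i]);
        for (i = 0; i < size; i++)   /* N sends, all in flight at once */
            MPI_Isend(sbuf + i * count, count, MPI_DOUBLE, i, 0,
                      comm, &reqs[size + i]);

        /* All 2N messages are outstanding simultaneously; with long
           messages and many peers this is what degrades. */
        MPI_Waitall(2 * size, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
    }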
>
> Thanks,
>
> Galen
>
>
> On Feb 4, 2006, at 9:37 AM, Konstantin Kudin wrote:
>
> > Dear Jeff and Galen,
> >
> > I have tried openmpi-1.1a1r8890. The good news is that it seems
> > like the freaky long latencies for certain packet sizes went away
> > with the options they showed up with before. Also, one version of
> > all-to-all appears to behave nicer with a specified set of
> > parameters. However, I still get only 1-cpu performance out of 8
> > with the actual application, and all this time is spent doing
> > parallel FFTs. What is interesting is that even with the tuned
> > parameters, the other version of all-to-all still performs quite
> > poorly (see below).
> >
> > #/*@insyncol_MPI_Alltoall-nodes-long-SM.ski*/
> > mpirun -np 8 -mca btl tcp -mca coll self,basic,tuned \
> >     -mca mpi_paffinity_alone 1 skampi41
> > 2 272.1 3.7 8 272.1 3.7 8
> > 3 1800.5 72.9 8 1800.5 72.9 8
> > 4 3074.0 61.0 8 3074.0 61.0 8
> > 5 5705.5 102.0 8 5705.5 102.0 8
> > 6 8054.2 282.3 8 8054.2 282.3 8
> > 7 9462.9 104.2 8 9462.9 104.2 8
> > 8 11245.8 66.9 8 11245.8 66.9 8
> >
> > mpirun -np 8 -mca btl tcp -mca coll self,basic,tuned \
> >     -mca mpi_paffinity_alone 1 -mca coll_basic_crossover 8 skampi41
> > 2 267.7 1.5 8 267.7 1.5 8
> > 3 1591.2 8.4 8 1591.2 8.4 8
> > 4 2704.4 17.1 8 2704.4 17.1 8
> > 5 4813.7 307.9 3 4813.7 307.9 3
> > 6 5329.1 57.0 2 5329.1 57.0 2
> > 7 198767.6 49076.2 5 198767.6 49076.2 5
> > 8 254832.6 11235.3 5 254832.6 11235.3 5
> >
> >
> > Still poor performance:
> >
> > #/*@insyncol_MPI_Alltoall_Isend_Irecv-nodes-long-SM.ski*/
> > 2 235.0 0.7 8 235.0 0.7 8
> > 3 1565.6 15.3 8 1565.6 15.3 8
> > 4 2694.8 24.3 8 2694.8 24.3 8
> > 5 11389.9 6971.9 6 11389.9 6971.9 6
> > 6 249612.0 21102.1 2 249612.0 21102.1 2
> > 7 239051.9 3915.0 2 239051.9 3915.0 2
> > 8 262356.5 12324.6 2 262356.5 12324.6 2
> >
> >
> > Kostya
> >
> >
> >
> >
> > --- Jeff Squyres <jsquyres_at_[hidden]> wrote:
> >
> >> Greetings Konstantin.
> >>
> >> Many thanks for this report. Another user submitted almost the
> >> same issue earlier today (poor performance of Open MPI 1.0.x
> >> collectives; see
> >> http://www.open-mpi.org/community/lists/users/2006/02/0558.php).
> >>
> >> Let me provide an additional clarification on Galen's reply:
> >>
> >> The collectives in Open MPI 1.0.x are known to be sub-optimal --
> >> they return correct results, but they are not optimized at all.
> >> This is what Galen meant by "If I use the basic collectives then
> >> things do fall apart with long messages, but this is expected".
> >> The collectives in the Open MPI 1.1.x series (i.e., our current
> >> development trunk) provide *much* better performance.
> >>
> >> Galen ran his tests using the "tuned" collective module in the
> >> 1.1.x series -- these are the "better" collectives that I
> >> referred to above. This "tuned" module does not exist in the
> >> 1.0.x series.
> >>
> >> You can download a 1.1.x nightly snapshot -- including the new
> >> "tuned" module -- from here:
> >>
> >> http://www.open-mpi.org/nightly/trunk/
> >>
> >> If you get the opportunity, could you re-try your application
> >> with a 1.1 snapshot?