Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Working with a CellBlade cluster
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-10-31 16:20:39


AFAIK, there are no parameters available to monitor IB message
passing. The majority of it is processed in hardware, and Linux is
unaware of it. We have not added any extra instrumentation into the
openib BTL to provide auditing information, because, among other
reasons, that is the performance-critical code path and we didn't want
to add any latency in there.

The best you may be able to do is use a PMPI-based library to audit
MPI function call invocations.
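
As a coarse complement to per-call auditing, the port counters under
/sys/class/infiniband (suggested in the quoted reply below) can be sampled
before and after a run. The following is only a minimal Python sketch, not
anything Open MPI provides. It assumes the usual Linux sysfs layout
(<device>/ports/<port>/counters/port_xmit_data and port_rcv_data) and that
these data counters are reported in 4-byte words; the function names and the
"root" parameter are made up for illustration. On some HCAs these counters
are 32-bit and can saturate, so treat the numbers as a rough indication.

```python
import os

SYSFS_IB = "/sys/class/infiniband"  # assumed standard location; override for testing


def read_ib_counters(root=SYSFS_IB, names=("port_xmit_data", "port_rcv_data")):
    """Return {(device, port): {counter_name: int}} for every IB port under root."""
    result = {}
    if not os.path.isdir(root):
        return result  # no InfiniBand stack visible on this host
    for dev in sorted(os.listdir(root)):
        ports_dir = os.path.join(root, dev, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            values = {}
            for name in names:
                path = os.path.join(ports_dir, port, "counters", name)
                try:
                    with open(path) as f:
                        values[name] = int(f.read().strip())
                except (OSError, ValueError):
                    continue  # counter missing or unreadable on this HCA
            result[(dev, port)] = values
    return result


def counter_delta(before, after):
    """Per-port difference (after - before), converted from 4-byte words to bytes."""
    delta = {}
    for key, vals in after.items():
        base = before.get(key, {})
        delta[key] = {n: (v - base.get(n, 0)) * 4 for n, v in vals.items()}
    return delta


if __name__ == "__main__":
    import subprocess
    import sys

    before = read_ib_counters()
    if len(sys.argv) > 1:
        subprocess.call(sys.argv[1:])  # e.g. the mpirun command line to observe
    after = read_ib_counters()
    for (dev, port), vals in sorted(counter_delta(before, after).items()):
        print(dev, port, vals)
```

Traffic that goes through the sm BTL never reaches the HCA, so a run confined
to one blade that leaves these counters essentially unchanged suggests that
shared memory was used. The counters are per port, not per process, so other
jobs using the same port will also show up in the deltas.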

On Oct 31, 2008, at 4:07 PM, Mi Yan wrote:

> Gilbert,
>
> I do not know of any MCA parameters that can monitor message
> passing. I have tried a few MCA verbose parameters and did not
> find any of them helpful.
>
> One way to check whether messages go via IB or SM may be to check the
> counters in /sys/class/infiniband.
>
> Regards,
> Mi
> Gilbert Grosdidier <grodid_at_[hidden]>
> Sent by: users-bounces_at_[hidden]
> 10/29/2008 12:36 PM
> Please respond to: Open MPI Users <users_at_[hidden]>
> To: Open MPI Users <users_at_[hidden]>
> Subject: Re: [OMPI users] Working with a CellBlade cluster
>
> Thank you very much Mi and Lenny for your detailed replies.
>
> I believe I can summarize the information on
> 'Working with a QS22 CellBlade cluster' like this:
> - Yes, messages are efficiently handled with "-mca btl openib,sm,self"
> - Better to go to the OMPI-1.3 version ASAP
> - It is currently more efficient/easy to use numactl to control
> processor affinity on a QS22.
>
> So far so good.
>
> One question remains: how could I monitor in detail message passing
> through IB (on one side) and through SM (on the other side) using
> MCA parameters, please? Additional info about the verbosity level
> of this monitoring would be highly appreciated. A lengthy trip
> through the list of such parameters provided by ompi_info did not
> enlighten me (there are so many xxx_sm_yyy type params that I don't
> know which could be the right one ;-)
>
> Thanks in advance for your hints, Best Regards, Gilbert.
>
>
> On Thu, 23 Oct 2008, Mi Yan wrote:
>
> >
> > 1. MCA BTL parameters
> > With "-mca btl openib,self", both messages between two Cell
> > processors on one QS22 and messages between two QS22s go through IB.
> >
> > With "-mca btl openib,sm,self", messages within one QS22 go through
> > shared memory, and messages between QS22s go through IB.
> >
> > Depending on the message size and other MCA parameters, message
> > passing over shared memory is not guaranteed to be faster than over
> > IB. E.g. the bandwidth for a 64KB message is 959 MB/s on shared
> > memory and 694 MB/s on IB; the bandwidth for a 4MB message is
> > 539 MB/s on shared memory and 1092 MB/s on IB. The bandwidth of a
> > 4MB message on shared memory may be higher if you tune some MCA
> > parameters.
> >
> > 2. mpi_paffinity_alone
> > "mpi_paffinity_alone = 1" is not a good choice for the QS22. There
> > are two sockets with two physical Cell/B.E. processors on one QS22.
> > Each Cell/B.E. has two SMT threads, so there are four logical CPUs
> > on one QS22. The CBE Linux kernel maps logical CPUs 0 and 1 to
> > socket 1 and logical CPUs 2 and 3 to socket 2. If
> > mpi_paffinity_alone is set to 1, the two MPI instances will be
> > assigned to logical CPU 0 and CPU 1, both on socket 1. I believe
> > this is not what you want.
> >
> > A temporary solution to force the affinity on a QS22 is to use
> > "numactl". E.g., assuming the hostname is "qs22" and the executable
> > is "foo", the following command can be used:
> >     mpirun -np 1 -H qs22 numactl -c0 -m0 foo : -np 1 -H qs22 numactl -c1 -m1 foo
> >
> > In the long run, I hope the CBE kernel will export the CPU topology
> > in /sys, so that PLPA can be used to force the processor affinity.
> >
> > Best Regards,
> > Mi
> >
> >
> >
> >
> > "Lenny Verkhovsky" <lenny.verkhovsky@gmail.com>
> > Sent by: users-bounces_at_open-mpi.org
> > 10/23/2008 05:48 AM
> > Please respond to: Open MPI Users <users_at_open-mpi.org>
> > To: "Open MPI Users" <users_at_[hidden]>
> > Subject: Re: [OMPI users] Working with a CellBlade cluster
> >
> > Hi,
> >
> >
> > If I understand you correctly, the most suitable way to do it is
> > with the paffinity support that we have in Open MPI 1.3 and the
> > trunk. However, the OS usually distributes processes evenly between
> > sockets by itself.
> >
> > There is still no formal FAQ, for multiple reasons, but you can
> > read how to use it in the attached draft (a few parameter names
> > have changed, so check with ompi_info).
> >
> > Shared memory is used between processes that share the same
> > machine, and openib is used between different machines (hostnames);
> > no special MCA parameters are needed.
> >
> > Best Regards,
> > Lenny
> >
> >
> >
> >
> >
> >
> >
> > On Sun, Oct 19, 2008 at 10:32 AM, Gilbert Grosdidier <grodid_at_[hidden]> wrote:
> > Working with a CellBlade cluster (QS22), the requirement is to have
> > one instance of the executable running on each socket of the blade
> > (there are 2 sockets). The application is of the 'domain
> > decomposition' type, and each instance often needs to send/receive
> > data with both the remote blades and the neighbor socket.
> >
> > The question is: which specification must be used for the mca btl
> > component to force 1) shmem-type messages when communicating with
> > this neighbor socket, while 2) using openib to communicate with the
> > remote blades? Is '-mca btl sm,openib,self' suitable for this?
> >
> > Also, which debug flags could be used to crosscheck that the
> > messages are _actually_ going through the right channel for a given
> > channel, please?
> >
> > We are currently using OpenMPI 1.2.5 shipped with RHEL5.2 (ppc64).
> > Which version do you think is currently the most optimised for
> > these processors and problem type? Should we go towards OpenMPI
> > 1.2.8 instead? Or even try some OpenMPI 1.3 nightly build?
> >
> > Thanks in advance for your help, Gilbert.
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > (See attached file: RANKS_FAQ.doc)
>
> --
> *---------------------------------------------------------------------*
> Gilbert Grosdidier Gilbert.Grosdidier_at_[hidden]
> LAL / IN2P3 / CNRS Phone : +33 1 6446 8909
> Faculté des Sciences, Bat. 200 Fax : +33 1 6446 8546
> B.P. 34, F-91898 Orsay Cedex (FRANCE)
> ---------------------------------------------------------------------

-- 
Jeff Squyres
Cisco Systems