Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] strange IMB runs
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-08-13 07:35:02


Just a couple of data points:

1. So we don't confuse folks: there is no legal requirement for the
space in "Open MPI". Heck, most of us developers drop the space in our
discussions. It was put in there to avoid confusion with OpenMP. While
the more marketing-oriented worry about it, the rest of the world
doesn't.

2. We regularly see shared memory with lower bandwidth than openib in
our tests on 1.3. sm's latency is better, but its bandwidth is lower.
I'll provide numbers when I get into the office, and can test against
tcp as well at that time.
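
For reference, that comparison is just the same PingPong run with a
different btl selected each time, roughly like this (the binary path is
illustrative, and the openib run of course needs IB hardware):

  mpirun -np 2 -mca btl self,sm ./IMB-MPI1 pingpong
  mpirun -np 2 -mca btl self,tcp ./IMB-MPI1 pingpong
  mpirun -np 2 -mca btl self,openib ./IMB-MPI1 pingpong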

Ralph

On Aug 12, 2009, at 11:51 PM, Eugene Loh wrote:

> I was away on vacation for two weeks and therefore missed most of
> this thread, but I'm quite interested.
>
> Michael Di Domenico wrote:
>
> >I'm not sure I understand what's actually happened here. I'm running
> >IMB on an HP superdome, just comparing the PingPong benchmark
> >
> >HP-MPI v2.3
> >Max ~ 700-800MB/sec
> >
> >OpenMPI v1.3
> >-mca btl self,sm - Max ~ 125-150MB/sec
> >-mca btl self,tcp - Max ~ 500-550MB/sec
> >
> >Is this behavior expected? Are there any tunables to get the OpenMPI
> >sockets up near HP-MPI?
>
> First, I want to understand the configuration. It's just a single
> node. No interconnect (InfiniBand or Ethernet or anything). Right?
>
> If so, without knowing too much about the Superdome, I assume the
> only puzzle here is why the Open MPI sm bandwidth is so low. The
> other stuff (like the HP results or the OMPI tcp results) is fine as
> far as I know.
>
> Specifically, I tried some on-node bandwidth tests on another system
> comparing sm and tcp, and tcp is about 1.4x slower than sm. I think
> this is consistent with expectations and makes the OMPI tcp
> performance roughly consistent with the HP MPI performance.
>
> So, again, the single oddity here appears to be the very slow OMPI
> sm bandwidth.
>
> George Bosilca wrote:
> >
> >The leave pinned will not help in this context.
>
> Michael Di Domenico wrote:
> >
> >mpi_leave_pinned didn't help still at ~145MB/sec
>
> Right. The "leave pinned" variable should be irrelevant, both for
> TCP (which isn't the issue here) and for sm (which is disturbingly
> low).
>
> Michael Di Domenico wrote:
>
> >On Thu, Jul 30, 2009 at 10:08 AM, George Bosilca
> ><bosilca_at_[hidden]> wrote:
> >>
> >>The Open MPI version is something you compiled or it came
> >>installed with the OS? If you compiled it can you please provide
> >>us the configure line?
> >
> >OpenMPI was compiled from source v1.3 with only a --prefix line, no
> >other options.
>
> I think a configure line with only --prefix is okay, but for
> performance you probably need compiler optimization flags set one
> way or the other. One way is with environment variables. E.g., for
> csh shell and GCC compilers, maybe something like:
>
> setenv CFLAGS "-O -m64 -g"
> setenv CXXFLAGS "-O -m64 -g"
> setenv FFLAGS "-O -m64 -g"
> setenv FCFLAGS "-O -m64 -g"
>
> or whatever.
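>
> Alternatively, I believe you can pass the same flags directly on the
> configure line (the prefix path below is just an example):
>
> % ./configure --prefix=/opt/openmpi-1.3 CFLAGS="-O -m64 -g" \
>     CXXFLAGS="-O -m64 -g" FFLAGS="-O -m64 -g" FCFLAGS="-O -m64 -g"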
>
> That said, I just tried building OMPI with and without optimization,
> and the on-node bandwidth seems basically unaffected. I suppose
> that is perhaps no surprise since the data movement will basically
> just be driven by memcpy calls anyhow.
>
> Michael Di Domenico wrote:
> >
> >Here's an interesting data point. I installed the RHEL rpm version
> >of OpenMPI 1.2.7-6 for ia64
> >
> >mpirun -np 2 -mca btl self,sm -mca mpi_paffinity_alone 1 -mca mpi_leave_pinned 1 $PWD/IMB-MPI1 pingpong
> >
> >With v1.3 and -mca btl self,sm i get ~150MB/sec
> >With v1.3 and -mca btl self,tcp i get ~550MB/sec
> >
> >With v1.2.7-6 and -mca btl self,sm i get ~225MB/sec
> >With v1.2.7-6 and -mca btl self,tcp i get ~650MB/sec
>
> Michael Di Domenico wrote:
> >
> >So pushing this along a little more
> >
> >running with openmpi-1.3 svn rev 20295
> >
> >mpirun -np 2
> > -mca btl sm,self
> > -mca mpi_paffinity_alone 1
> > -mca mpi_leave_pinned 1
> > -mca btl_sm_eager_limit 8192
> >$PWD/IMB-MPI1 pingpong
> >
> >Yields ~390MB/sec
> >
> >So we're getting there, but still only about half speed
>
> One of the differences among MPI implementations is the default
> placement of processes within the node. E.g., should processes by
> default be collocated on cores of the same socket or on cores of
> different sockets? I don't know if that issue is applicable here
> (that is, HP MPI vs Open MPI or on Superdome architecture), but it's
> potentially an issue to look at. With HP MPI, mpirun has a -cpu_bind
> switch for controlling placement. With Open MPI, mpirun controls
> placement with a rankfile (-rf).
>
> E.g., what happens if you try
>
> % cat rf1
> rank 0=XX slot=0
> rank 1=XX slot=1
> % cat rf2
> rank 0=XX slot=0
> rank 1=XX slot=2
> % cat rf3
> rank 0=XX slot=0
> rank 1=XX slot=3
> [...etc...]
> % mpirun -np 2 --mca btl self,sm --host XX,XX -rf rf1 $PWD/IMB-MPI1 pingpong
> % mpirun -np 2 --mca btl self,sm --host XX,XX -rf rf2 $PWD/IMB-MPI1 pingpong
> % mpirun -np 2 --mca btl self,sm --host XX,XX -rf rf3 $PWD/IMB-MPI1 pingpong
> [...etc...]
>
> where XX is the name of your node and you march through all the
> cores on your Superdome node?
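>
> To know how many cores there are to march through (I'm assuming the
> RHEL/Linux install you mentioned), something like
>
> % numactl --hardware
> % grep -c processor /proc/cpuinfo
>
> should show the socket/memory layout and the logical CPU count.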
>
> Also, I'm puzzled why you should see better results by changing
> btl_sm_eager_limit. That shouldn't change long-message bandwidth,
> but only the message size at which one transitions from short to
> long messages. If anything, tweaking btl_sm_max_send_size would be
> the variable to try.
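>
> If you want to experiment with that, something along these lines
> (the 64 KB value is only an example, not a recommendation):
>
> % mpirun -np 2 --mca btl self,sm --mca btl_sm_max_send_size 65536 \
>     $PWD/IMB-MPI1 pingpong
>
> and "ompi_info --param btl sm" will list the sm parameters and their
> current defaults.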
>
> Final note: Notice the space in "Open MPI". Some legal thing.
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users