My best guess is that you are seeing differences in scheduling behavior with respect to memory locality. I notice that you are not binding your processes, so they are free to move among the processors on the node. I would guess that your thread ends up on a processor that is non-local to its memory in one case, but local to its memory in the other. This is an OS scheduler decision.
You might try binding your processes to see if it helps. With threads, you don't really want to bind to a core, but binding to a socket should help. Try adding --bind-to-socket to your mpirun command line (you can't do this if you run the program as a singleton; you have to use mpirun).
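To make "binding" concrete, here is a minimal sketch of the underlying OS mechanism: a CPU affinity mask that limits where the scheduler may place a process. It is not how mpirun implements --bind-to-socket. It only uses Python's standard os.sched_getaffinity and os.sched_setaffinity calls, which are available on Linux, and the helper names allowed_cpus and pin_to_cpus are made up for this illustration.

```python
import os


def allowed_cpus(pid=0):
    """Return the set of CPU ids the caller (pid 0) is allowed to run on."""
    return set(os.sched_getaffinity(pid))


def pin_to_cpus(cpus, pid=0):
    """Restrict the caller to the given CPU ids and return the previous mask."""
    previous = allowed_cpus(pid)
    os.sched_setaffinity(pid, cpus)
    return previous


if __name__ == "__main__":
    before = allowed_cpus()
    print("allowed CPUs before pinning:", sorted(before))

    # Pin to a single CPU taken from the current mask, then restore the mask.
    pin_to_cpus({min(before)})
    print("allowed CPUs while pinned:", sorted(allowed_cpus()))

    pin_to_cpus(before)
    print("allowed CPUs after restoring:", sorted(allowed_cpus()))
```

An unbound process normally reports every CPU on the node, which is what lets the scheduler move it away from the memory it first touched.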
On Oct 25, 2011, at 2:45 AM, 吕慧伟 wrote:
Thanks, Ralph. Yes, I have taken that into account. The issue is not the comparison of two procs against one proc, but the "multi-threading effect": multi-threading helps on the first machine for both one and two procs, but on the second machine the benefit disappears with two procs.
To narrow down the problem, I reinstalled the operating system on the second machine, replacing SUSE 11 (kernel 184.108.40.206, gcc 4.3.4) with Red Hat 5.4 (kernel 2.6.18, gcc 4.1.2), which is similar to the first machine (CentOS 5.3, kernel 2.6.18, gcc 4.1.2). After that, the problem disappeared. So the problem must lie somewhere in the OS kernel or the GCC version. Any suggestions? Thanks.
On Tue, Oct 25, 2011 at 3:11 PM, Ralph Castain <email@example.com> wrote:
Okay - thanks for testing it.
Of course, one obvious difference is that there isn't any communication when you run only one proc, but there is when you run two or more, assuming your application makes MPI send/recv calls (or calls collectives and other functions that communicate). Communication to yourself is very fast because no bits actually move; sending messages to another proc is considerably slower.
Are you taking that into account?
On Oct 24, 2011, at 8:47 PM, 吕慧伟 wrote:
No. There's a difference between "mpirun -np 1 ./my_hybrid_app..." and "mpirun -np 2 ./...".
Running "mpirun -np 1 ./my_hybrid_app..." improves performance as the number of threads increases, but running "mpirun -np 2 ./..." degrades it.
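The comparison described above could be scripted as a small sweep over process and thread counts. This is only a sketch: the thread counts (1, 2, 4, 8) are arbitrary placeholders, the helper names are hypothetical, and wall-clock time includes mpirun startup, so it is a rough proxy for the application's own timing. The mpirun arguments (-np, --bind-to-socket, and the thread count as the application's single argument) are the ones used elsewhere in this thread.

```python
import subprocess
import time


def build_command(nprocs, nthreads, app="./my_hybrid_app", bind_to_socket=False):
    """Build the mpirun argument list for one run (nothing is executed here)."""
    cmd = ["mpirun", "-np", str(nprocs)]
    if bind_to_socket:
        cmd.append("--bind-to-socket")
    cmd += [app, str(nthreads)]
    return cmd


def time_run(cmd):
    """Run one command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def sweep(nprocs_list=(1, 2), thread_counts=(1, 2, 4, 8), bind_to_socket=False):
    """Time every (nprocs, nthreads) combination; the counts are placeholders."""
    results = {}
    for nprocs in nprocs_list:
        for nthreads in thread_counts:
            cmd = build_command(nprocs, nthreads, bind_to_socket=bind_to_socket)
            results[(nprocs, nthreads)] = time_run(cmd)
    return results
```

Calling sweep() on each machine, once with bind_to_socket=False and once with True, would give one table per machine to compare.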
On Tue, Oct 25, 2011 at 12:00 AM, <firstname.lastname@example.org>
Date: Mon, 24 Oct 2011 07:14:21 -0600
From: Ralph Castain <email@example.com>
Subject: Re: [OMPI users] Hybrid MPI/Pthreads program behaves differently on two different machines with same hardware
To: Open MPI Users <firstname.lastname@example.org>
Does the difference persist if you run the single process using mpirun? In other words, does "mpirun -np 1 ./my_hybrid_app..." behave the same as "mpirun -np 2 ./..."?
There is a slight difference in the way procs start when run as singletons. It shouldn't make a difference here, but worth testing.
On Oct 24, 2011, at 12:37 AM, 吕慧伟 wrote:
> Dear List,
> I have a hybrid MPI/Pthreads program named "my_hybrid_app". The program is memory-intensive and takes advantage of multi-threading to improve memory throughput. I run "my_hybrid_app" on two machines that have the same hardware configuration but different OS and GCC versions. The problem is this: when I run "my_hybrid_app" with one process, the two machines behave the same, and the more threads I use, the better the performance. However, when I run "my_hybrid_app" with two or more processes, the first machine still gains performance with more threads, while the second machine loses performance with more threads.
> Since running "my_hybrid_app" with one process behaves correctly, I suspect something is wrong with how I link against the MPI library. Would somebody point me in the right direction? Thanks in advance.
> Attached are the command lines used, my machine information, and the link information.
> p.s. 1: Command line
> single process: ./my_hybrid_app <number of threads>
> multiple processes: mpirun -np 2 ./my_hybrid_app <number of threads>
> p.s. 2: Machine Information
> The first machine is CentOS 5.3 with GCC 4.1.2:
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-220.127.116.11/jre --with-cpu=generic --host=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
> The second machine is SUSE Enterprise Server 11 with GCC 4.3.4:
> Target: x86_64-suse-linux
> Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.3 --enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.3 --enable-linux-futex --without-system-libunwind --with-cpu=generic --build=x86_64-suse-linux
> Thread model: posix
> gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux)
> p.s. 3: ldd Information
> The first machine:
> $ ldd my_hybrid_app
> libm.so.6 => /lib64/libm.so.6 (0x000000358d400000)
> libmpi.so.0 => /usr/local/openmpi/lib/libmpi.so.0 (0x00002af0d53a7000)
> libopen-rte.so.0 => /usr/local/openmpi/lib/libopen-rte.so.0 (0x00002af0d564a000)
> libopen-pal.so.0 => /usr/local/openmpi/lib/libopen-pal.so.0 (0x00002af0d5895000)
> libdl.so.2 => /lib64/libdl.so.2 (0x000000358d000000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x000000358f000000)
> libutil.so.1 => /lib64/libutil.so.1 (0x000000359a600000)
> libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00002af0d5b07000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x000000358d800000)
> libc.so.6 => /lib64/libc.so.6 (0x000000358cc00000)
> /lib64/ld-linux-x86-64.so.2 (0x000000358c800000)
> librt.so.1 => /lib64/librt.so.1 (0x000000358dc00000)
> The second machine:
> $ ldd my_hybrid_app
> linux-vdso.so.1 => (0x00007fff3eb5f000)
> libmpi.so.0 => /root/opt/openmpi/lib/libmpi.so.0 (0x00007f68627a1000)
> libm.so.6 => /lib64/libm.so.6 (0x00007f686254b000)
> libopen-rte.so.0 => /root/opt/openmpi/lib/libopen-rte.so.0 (0x00007f68622fc000)
> libopen-pal.so.0 => /root/opt/openmpi/lib/libopen-pal.so.0 (0x00007f68620a5000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00007f6861ea1000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f6861c89000)
> libutil.so.1 => /lib64/libutil.so.1 (0x00007f6861a86000)
> libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f686187d000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6861660000)
> libc.so.6 => /lib64/libc.so.6 (0x00007f6861302000)
> /lib64/ld-linux-x86-64.so.2 (0x00007f6862a58000)
> librt.so.1 => /lib64/librt.so.1 (0x00007f68610f9000)
> I installed openmpi-1.4.2 into a user directory, /root/opt/openmpi, and used "-L/root/opt/openmpi -Wl,-rpath,/root/opt/openmpi" when linking.
> Huiwei Lv
> PhD student at the Institute of Computing Technology,
> Beijing, China