Thanks, Ralph. Yes, I have taken that into account. The problem is not the comparison of two procs against one proc, but the "multi-threading effect": multi-threading helps on the first machine with both one and two procs, but on the second machine the benefit disappears with two procs.
Okay - thanks for testing it.
Of course, one obvious difference is that there isn't any communication when you run only one proc, but there is when you run two or more, assuming your application makes MPI send/recv calls (or calls collectives and other functions that communicate). Communication to yourself is very fast as no bits actually move - sending messages to another proc is considerably slower.
Are you taking that into account?
On Oct 24, 2011, at 8:47 PM, 吕慧伟 wrote:
No. There's a difference between "mpirun -np 1 ./my_hybrid_app..." and "mpirun -np 2 ./...".
Running "mpirun -np 1 ./my_hybrid_app..." improves performance as the number of threads grows, but running "mpirun -np 2 ./..." degrades it.
--
Huiwei Lv
On Tue, Oct 25, 2011 at 12:00 AM, <firstname.lastname@example.org> wrote:
Date: Mon, 24 Oct 2011 07:14:21 -0600
From: Ralph Castain <email@example.com>
Subject: Re: [OMPI users] Hybrid MPI/Pthreads program behaves
differently on two different machines with same hardware
To: Open MPI Users <firstname.lastname@example.org>
Content-Type: text/plain; charset="utf-8"
Does the difference persist if you run the single process using mpirun? In other words, does "mpirun -np 1 ./my_hybrid_app..." behave the same as "mpirun -np 2 ./..."?
There is a slight difference in the way procs start when run as singletons. It shouldn't make a difference here, but worth testing.
On Oct 24, 2011, at 12:37 AM, 吕慧伟 wrote:
> Dear List,
> I have a hybrid MPI/Pthreads program named "my_hybrid_app". The program is memory-intensive and takes advantage of multi-threading to improve memory throughput. I run "my_hybrid_app" on two machines that have the same hardware configuration but different OS and GCC versions. The problem is: when I run "my_hybrid_app" with one process, the two machines behave the same: the more threads, the better the performance. However, when I run "my_hybrid_app" with two or more processes, the first machine still gains performance with more threads, while the second machine degrades in performance with more threads.
> Since running "my_hybrid_app" with one process behaves correctly, I suspect there is a problem with how I link against the MPI library. Would somebody point me in the right direction? Thanks in advance.
> Attached are the command lines used, my machine information, and link information.
> p.s. 1: Command line
> single process: ./my_hybrid_app <number of threads>
> multiple processes: mpirun -np 2 ./my_hybrid_app <number of threads>
> p.s. 2: Machine Information
> The first machine is CentOS 5.3 with GCC 4.1.2:
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-18.104.22.168/jre --with-cpu=generic --host=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
> The second machine is SUSE Linux Enterprise Server 11 with GCC 4.3.4:
> Target: x86_64-suse-linux
> Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.3 --enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.3 --enable-linux-futex --without-system-libunwind --with-cpu=generic --build=x86_64-suse-linux
> Thread model: posix
> gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux)
> p.s. 3: ldd Information
> The first machine:
> $ ldd my_hybrid_app
> libm.so.6 => /lib64/libm.so.6 (0x000000358d400000)
> libmpi.so.0 => /usr/local/openmpi/lib/libmpi.so.0 (0x00002af0d53a7000)
> libopen-rte.so.0 => /usr/local/openmpi/lib/libopen-rte.so.0 (0x00002af0d564a000)
> libopen-pal.so.0 => /usr/local/openmpi/lib/libopen-pal.so.0 (0x00002af0d5895000)
> libdl.so.2 => /lib64/libdl.so.2 (0x000000358d000000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x000000358f000000)
> libutil.so.1 => /lib64/libutil.so.1 (0x000000359a600000)
> libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00002af0d5b07000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x000000358d800000)
> libc.so.6 => /lib64/libc.so.6 (0x000000358cc00000)
> /lib64/ld-linux-x86-64.so.2 (0x000000358c800000)
> librt.so.1 => /lib64/librt.so.1 (0x000000358dc00000)
> The second machine:
> $ ldd my_hybrid_app
> linux-vdso.so.1 => (0x00007fff3eb5f000)
> libmpi.so.0 => /root/opt/openmpi/lib/libmpi.so.0 (0x00007f68627a1000)
> libm.so.6 => /lib64/libm.so.6 (0x00007f686254b000)
> libopen-rte.so.0 => /root/opt/openmpi/lib/libopen-rte.so.0 (0x00007f68622fc000)
> libopen-pal.so.0 => /root/opt/openmpi/lib/libopen-pal.so.0 (0x00007f68620a5000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00007f6861ea1000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f6861c89000)
> libutil.so.1 => /lib64/libutil.so.1 (0x00007f6861a86000)
> libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f686187d000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6861660000)
> libc.so.6 => /lib64/libc.so.6 (0x00007f6861302000)
> /lib64/ld-linux-x86-64.so.2 (0x00007f6862a58000)
> librt.so.1 => /lib64/librt.so.1 (0x00007f68610f9000)
> I installed openmpi-1.4.2 in a user directory, /root/opt/openmpi, and used "-L/root/opt/openmpi -Wl,-rpath,/root/opt/openmpi" when linking.
> Huiwei Lv
> PhD student at Institute of Computing Technology,
> Beijing, China
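
For anyone trying to reproduce the "more threads, better memory throughput" behaviour described above outside of MPI, here is a minimal Pthreads sketch. It is not taken from my_hybrid_app: the names chunk_t, copy_chunk and timed_parallel_copy are hypothetical, and a plain memcpy is only assumed to be a reasonable stand-in for a memory-bound workload. It splits a buffer copy across a given number of threads and returns the elapsed wall-clock time, so the same thread counts can be timed on both machines.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper type: one contiguous slice of the copy per thread. */
typedef struct {
    const double *src;
    double *dst;
    size_t count;
} chunk_t;

/* Thread body: copy this thread's slice (memory-bound work). */
static void *copy_chunk(void *arg)
{
    chunk_t *c = (chunk_t *)arg;
    memcpy(c->dst, c->src, c->count * sizeof(double));
    return NULL;
}

/* Copy n doubles using nthreads threads.
 * Returns elapsed wall-clock seconds, or -1.0 on bad input or failure. */
double timed_parallel_copy(size_t n, int nthreads)
{
    if (nthreads < 1 || n == 0)
        return -1.0;

    double *src = malloc(n * sizeof(double));
    double *dst = malloc(n * sizeof(double));
    pthread_t *tid = malloc((size_t)nthreads * sizeof(pthread_t));
    chunk_t *chunk = malloc((size_t)nthreads * sizeof(chunk_t));
    if (!src || !dst || !tid || !chunk) {
        free(src); free(dst); free(tid); free(chunk);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    for (size_t i = 0; i < n; i++) {
        src[i] = (double)i;
        dst[i] = 0.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t per = n / (size_t)nthreads;
    int started = 0;
    for (int t = 0; t < nthreads; t++) {
        size_t begin = (size_t)t * per;
        chunk[t].src = src + begin;
        chunk[t].dst = dst + begin;
        /* The last thread also takes the remainder. */
        chunk[t].count = (t == nthreads - 1) ? n - begin : per;
        if (pthread_create(&tid[t], NULL, copy_chunk, &chunk[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    clock_gettime(CLOCK_MONOTONIC, &t1);

    int ok = (started == nthreads) && (dst[n - 1] == src[n - 1]);
    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    free(src); free(dst); free(tid); free(chunk);
    return ok ? elapsed : -1.0;
}
```

Calling timed_parallel_copy with the same n and thread counts of 1, 2, 4, ... both standalone and from inside each rank of an "mpirun -np 2" job would show whether thread scaling changes once a second process is present. A single timing of a copy this simple is noisy, so repeated runs would be needed before drawing any conclusion.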