On 7/12/2011 11:06 PM, Mohan, Ashwin wrote:
> Thanks for your message. I was however not clear about your suggestions. Would appreciate if you could clarify.
> You say, "So, if you want a sane comparison but aren't willing to study the compiler manuals, you might use (if your source code doesn't violate the aliasing rules) mpiicpc -prec-div -prec-sqrt -ansi-alias and at least (if your linux compiler is g++) mpiCC -O2 possibly with some of the other options I mentioned earlier."
> ###From your response above, I understand to use, for Intel, this syntax: "mpiicpc -prec-div -prec-sqrt -ansi-alias" and for OPENMPI use "mpiCC -O2". I am not certain about the other options you mention.
> ###Also, I presently use a hostfile while submitting my mpirun. Each node has four slots and my hostfile was "nodename slots=4". My compile code is mpiCC -o xxx.xpp<filename>.
> If you have as ancient a g++ as your indication of FC3 implies, it really isn't fair to compare it with a currently supported compiler.
> ###Do you suggest upgrading the current installation of g++? Would that help?
How much it would help depends greatly on your source code, and it
won't help much unless you also choose appropriate options. Current
g++ is nearly as good at auto-vectorization as icpc, unless you dive
into the pragmas and Cilk extensions provided with icpc.
You really need to look at the gcc manual to understand those options;
going into it in any more depth here would try the patience of the list.
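
To make the auto-vectorization point concrete, here is a minimal sketch
of the kind of loop both compilers can usually vectorize. The saxpy
function is a hypothetical kernel for illustration only; it is not taken
from your code, and whether your own loops look like this is exactly
what "depends on your source code" means.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (an assumption, not the poster's code): y = a*x + y.
// The loop is unit-stride with no dependence between iterations, which
// makes it a typical candidate for compiler auto-vectorization.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());   // both arrays must have the same length
    const float* xp = x.data();     // raw pointers keep the loop body simple
    float* yp = y.data();
    const std::size_t n = y.size();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

With g++, the vectorizer is turned on by -O3 (or by adding
-ftree-vectorize to -O2), and recent releases accept -fopt-info-vec to
report which loops were actually vectorized. Check the manual for the
version you have installed, since the reporting options have changed
over time.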
> ###How do I ensure that all 4 slots are active when I submit a mpirun -np 4 <filename> command. When I do "top", I notice that all 4 slots are active. I noticed this when I did "top" with the Intel machine too, that is, it showed four slots active.
> Thank you..ashwin.
I was having trouble inferring what platform you are running on. I
guessed a single-core HyperThreading CPU, which doesn't seem to agree
with your "4 slots" terminology. If you have 2 single-core
HyperThreading CPUs, it would take a very unusual application to gain
from running 2 MPI processes per core, but if the sight of 4 processes
running on your graph was your goal, I won't argue against it. Be aware
that most clusters running older CPUs have HT disabled in BIOS setup.
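
The arithmetic behind "2 MPI processes per core" can be sketched as
follows. Both function names are hypothetical helpers for illustration,
and the socket and core counts are assumptions you would have to supply
from your own hardware documentation; the program cannot discover
physical cores portably on its own.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: MPI ranks per physical core on one node.
// sockets and cores_per_socket are user-supplied assumptions.
double ranks_per_core(int ranks, int sockets, int cores_per_socket) {
    assert(sockets > 0 && cores_per_socket > 0);
    return static_cast<double>(ranks) / (sockets * cores_per_socket);
}

// Logical CPUs visible to the OS. HyperThreading siblings are counted
// separately, so this can be twice the physical core count.
// The standard allows a return value of 0 when the count is unknown.
unsigned logical_cpus() {
    return std::thread::hardware_concurrency();
}
```

For 2 single-core HyperThreading CPUs, ranks_per_core(4, 2, 1) gives
2.0, i.e. every core is shared by two ranks even though the OS reports 4
logical CPUs and top shows 4 busy processes. On a true 4-core node,
ranks_per_core(4, 1, 4) gives 1.0.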