Each node have two processors (no dual-core).
2011/3/28 Michele Marena <michelemarena_at_[hidden]>
> However, I thank you Tim, Ralh and Jeff.
> My sequential application runs in 24s (wall clock time).
> My parallel application runs in 13s with two processes on different nodes.
> With shared memory, when two processes are on the same node, my app runs in
> I'm not understand why.
> 2011/3/28 Jeff Squyres <jsquyres_at_[hidden]>
>> If your program runs faster across 3 processes, 2 of which are local to
>> each other, with --mca btl tcp,self compared to --mca btl tcp,sm,self, then
>> something is very, very strange.
>> Tim cites all kinds of things that can cause slowdowns, but it's still
>> very, very odd that simply enabling using the shared memory communications
>> channel in Open MPI *slows your overall application down*.
>> How much does your application slow down in wall clock time? Seconds?
>> Minutes? Hours? (anything less than 1 second is in the noise)
>> On Mar 27, 2011, at 10:33 AM, Ralph Castain wrote:
>> > On Mar 27, 2011, at 7:37 AM, Tim Prince wrote:
>> >> On 3/27/2011 2:26 AM, Michele Marena wrote:
>> >>> Hi,
>> >>> My application performs good without shared memory utilization, but
>> >>> shared memory I get performance worst than without of it.
>> >>> Do I make a mistake? Don't I pay attention to something?
>> >>> I know OpenMPI uses /tmp directory to allocate shared memory and it is
>> >>> in the local filesystem.
>> >> I guess you mean shared memory message passing. Among relevant
>> parameters may be the message size where your implementation switches from
>> cached copy to non-temporal (if you are on a platform where that terminology
>> is used). If built with Intel compilers, for example, the copy may be
>> performed by intel_fast_memcpy, with a default setting which uses
>> non-temporal when the message exceeds about some preset size, e.g. 50% of
>> smallest L2 cache for that architecture.
>> >> A quick search for past posts seems to indicate that OpenMPI doesn't
>> itself invoke non-temporal, but there appear to be several useful articles
>> not connected with OpenMPI.
>> >> In case guesses aren't sufficient, it's often necessary to profile
>> (gprof, oprofile, Vtune, ....) to pin this down.
>> >> If shared message slows your application down, the question is whether
>> this is due to excessive eviction of data from cache; not a simple question,
>> as most recent CPUs have 3 levels of cache, and your application may require
>> more or less data which was in use prior to the message receipt, and may use
>> immediately only a small piece of a large message.
>> > There were several papers published in earlier years about shared memory
>> performance in the 1.2 series.There were known problems with that
>> implementation, which is why it was heavily revised for the 1.3/4 series.
>> > You might also look at the following links, though much of it has been
>> updated to the 1.3/4 series as we don't really support 1.2 any more:
>> > http://www.open-mpi.org/faq/?category=sm
>> > http://www.open-mpi.org/faq/?category=perftools
>> >> --
>> >> Tim Prince
>> >> _______________________________________________
>> >> users mailing list
>> >> users_at_[hidden]
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Jeff Squyres
>> For corporate legal information go to:
>> users mailing list