Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Shared Memory Performance Problem.
From: Michele Marena (michelemarena_at_[hidden])
Date: 2011-03-28 10:41:02


I ran ompi_info --param btl sm and this is the output:

                 MCA btl: parameter "btl_base_debug" (current value: "0")
                          If btl_base_debug is 1 standard debug is output, if > 1 verbose debug is output
                 MCA btl: parameter "btl" (current value: <none>)
                          Default selection set of components for the btl framework (<none> means "use all components that can be found")
                 MCA btl: parameter "btl_base_verbose" (current value: "0")
                          Verbosity level for the btl framework (0 = no verbosity)
                 MCA btl: parameter "btl_sm_free_list_num" (current value: "8")
                 MCA btl: parameter "btl_sm_free_list_max" (current value: "-1")
                 MCA btl: parameter "btl_sm_free_list_inc" (current value: "64")
                 MCA btl: parameter "btl_sm_exclusivity" (current value: "65535")
                 MCA btl: parameter "btl_sm_latency" (current value: "100")
                 MCA btl: parameter "btl_sm_max_procs" (current value: "-1")
                 MCA btl: parameter "btl_sm_sm_extra_procs" (current value: "2")
                 MCA btl: parameter "btl_sm_mpool" (current value: "sm")
                 MCA btl: parameter "btl_sm_eager_limit" (current value: "4096")
                 MCA btl: parameter "btl_sm_max_frag_size" (current value: "32768")
                 MCA btl: parameter "btl_sm_size_of_cb_queue" (current value: "128")
                 MCA btl: parameter "btl_sm_cb_lazy_free_freq" (current value: "120")
                 MCA btl: parameter "btl_sm_priority" (current value: "0")
                 MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
                          This parameter is used to turn on warning messages when certain NICs are not used
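
To check whether the sm BTL is actually chosen at run time, one thing I
can try (a sketch; btl_base_verbose is listed in the output above, and
./my_app stands in for my real application) is to raise the BTL
verbosity and watch which components are selected:

    mpirun -np 2 --mca btl self,sm --mca btl_base_verbose 100 ./my_app

Since tcp is not listed, the run should fail outright if sm cannot be
used between the two processes, rather than silently falling back to
tcp.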

2011/3/28 Ralph Castain <rhc_at_[hidden]>

> The fact that this exactly matches the time you measured with shared memory
> is suspicious. My guess is that you aren't actually using shared memory at
> all.
>
> Does your "ompi_info" output show shared memory as being available? Jeff or
> others may be able to give you some params that would let you check to see
> if sm is actually being used between those procs.
>
>
>
> On Mar 28, 2011, at 7:51 AM, Michele Marena wrote:
>
> What happens with 2 processes on the same node with tcp?
> With --mca btl self,tcp my app runs in 23s.
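>
> (A rough way to compare the cases side by side is to time the same run
> under each BTL setting; a sketch, with ./my_app as a placeholder and
> assuming both processes are placed on the same node:
>
>     time mpirun -np 2 --mca btl self,tcp ./my_app      # TCP loopback
>     time mpirun -np 2 --mca btl self,sm,tcp ./my_app   # shared memory
>
> Only the second run can exercise the sm BTL.)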
>
> 2011/3/28 Jeff Squyres (jsquyres) <jsquyres_at_[hidden]>
>
>> Ah, I didn't catch before that there were more variables than just tcp vs.
>> shmem.
>>
>> What happens with 2 processes on the same node with tcp?
>>
>> E.g., when both procs are on the same node, are you thrashing caches or
>> memory?
>>
>> Sent from my phone. No type good.
>>
>> On Mar 28, 2011, at 6:27 AM, "Michele Marena" <michelemarena_at_[hidden]>
>> wrote:
>>
>> Anyway, thank you Tim, Ralph, and Jeff.
>> My sequential application runs in 24s (wall clock time).
>> My parallel application runs in 13s with two processes on different nodes.
>> With shared memory, when the two processes are on the same node, my app
>> runs in 23s.
>> I don't understand why.
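>>
>> (To separate the application from the transport, a tiny ping-pong timer
>> such as the sketch below can be run once per BTL setting; this is a
>> generic example, not code from my application:
>>
>>     /* pingpong.c: time round trips between ranks 0 and 1 */
>>     #include <mpi.h>
>>     #include <stdio.h>
>>     #include <stdlib.h>
>>
>>     int main(int argc, char **argv)
>>     {
>>         int rank, i, iters = 1000, bytes = 32768;
>>         char *buf;
>>         double t0;
>>         MPI_Init(&argc, &argv);
>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>         buf = malloc(bytes);
>>         MPI_Barrier(MPI_COMM_WORLD);
>>         t0 = MPI_Wtime();
>>         for (i = 0; i < iters; i++) {
>>             if (rank == 0) {        /* send, then wait for the echo */
>>                 MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
>>                 MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
>>                          MPI_STATUS_IGNORE);
>>             } else if (rank == 1) { /* echo everything back */
>>                 MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
>>                          MPI_STATUS_IGNORE);
>>                 MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
>>             }
>>         }
>>         if (rank == 0)
>>             printf("%d bytes: %.2f us per round trip\n", bytes,
>>                    (MPI_Wtime() - t0) * 1e6 / iters);
>>         free(buf);
>>         MPI_Finalize();
>>         return 0;
>>     }
>>
>> Built with mpicc and run once with --mca btl self,tcp and once with
>> --mca btl self,sm, the reported times show directly whether sm is
>> slower between two local processes.)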
>>
>> 2011/3/28 Jeff Squyres <jsquyres_at_[hidden]>
>>
>>> If your program runs faster across 3 processes, 2 of which are local to
>>> each other, with --mca btl tcp,self compared to --mca btl tcp,sm,self, then
>>> something is very, very strange.
>>>
>>> Tim cites all kinds of things that can cause slowdowns, but it's still
>>> very, very odd that simply enabling using the shared memory communications
>>> channel in Open MPI *slows your overall application down*.
>>>
>>> How much does your application slow down in wall clock time? Seconds?
>>> Minutes? Hours? (anything less than 1 second is in the noise)
>>>
>>>
>>>
>>> On Mar 27, 2011, at 10:33 AM, Ralph Castain wrote:
>>>
>>> >
>>> > On Mar 27, 2011, at 7:37 AM, Tim Prince wrote:
>>> >
>>> >> On 3/27/2011 2:26 AM, Michele Marena wrote:
>>> >>> Hi,
>>> >>> My application performs well without shared memory, but with shared
>>> >>> memory I get worse performance than without it.
>>> >>> Am I making a mistake? Is there something I'm overlooking?
>>> >>> I know Open MPI uses the /tmp directory to allocate shared memory,
>>> >>> and that is on the local filesystem.
>>> >>>
>>> >>
>>> >> I guess you mean shared memory message passing. Among the relevant
>>> >> parameters may be the message size where your implementation switches
>>> >> from cached copy to non-temporal (if you are on a platform where that
>>> >> terminology is used). If built with Intel compilers, for example, the
>>> >> copy may be performed by intel_fast_memcpy, with a default setting
>>> >> that uses non-temporal stores when the message exceeds some preset
>>> >> size, e.g. 50% of the smallest L2 cache for that architecture.
>>> >> A quick search of past posts seems to indicate that Open MPI doesn't
>>> >> itself invoke non-temporal copies, but there appear to be several
>>> >> useful articles not connected with Open MPI.
>>> >> In case guesses aren't sufficient, it's often necessary to profile
>>> >> (gprof, oprofile, VTune, ...) to pin this down.
>>> >> If shared-memory message passing slows your application down, the
>>> >> question is whether this is due to excessive eviction of data from
>>> >> cache; not a simple question, as most recent CPUs have 3 levels of
>>> >> cache, your application may need more or less of the data that was in
>>> >> use prior to the message receipt, and it may immediately use only a
>>> >> small piece of a large message.
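>>> >> For a first pass with gprof, something like this sketch should work
>>> >> (my_app is a placeholder; note that with multiple ranks in the same
>>> >> working directory each rank overwrites gmon.out):
>>> >>
>>> >>     mpicc -O2 -pg -o my_app my_app.c    # build with profiling hooks
>>> >>     mpirun -np 2 --mca btl self,sm ./my_app
>>> >>     gprof ./my_app gmon.out | head -40  # top of the flat profile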
>>> >
>>> > There were several papers published in earlier years about shared
>>> > memory performance in the 1.2 series. There were known problems with
>>> > that implementation, which is why it was heavily revised for the
>>> > 1.3/1.4 series.
>>> >
>>> > You might also look at the following links, though much of the content
>>> > has been updated for the 1.3/1.4 series, as we don't really support
>>> > 1.2 any more:
>>> >
>>> > http://www.open-mpi.org/faq/?category=sm
>>> >
>>> > http://www.open-mpi.org/faq/?category=perftools
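>>> >
>>> > The sm FAQ also covers where the shared-memory backing file lives. If
>>> > /tmp were the bottleneck, one experiment (assuming your version
>>> > supports the orte_tmpdir_base MCA parameter; ./my_app is a
>>> > placeholder) would be to put the session directory on a RAM-backed
>>> > filesystem:
>>> >
>>> >     mpirun -np 2 --mca btl self,sm \
>>> >            --mca orte_tmpdir_base /dev/shm ./my_app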
>>> >
>>> >
>>> >>
>>> >> --
>>> >> Tim Prince
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>>
>>