
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Very poor performance with btl sm on twin nehalem servers with Mellanox ConnectX installed
From: Paul H. Hargrove (PHHargrove_at_[hidden])
Date: 2010-05-17 21:41:36


Entry looks good, but could probably use an additional sentence or two like:

On diskless nodes running Linux, use of /dev/shm may be an option if
supported by your distribution. This will use an in-memory file system
for the session directory, but will NOT result in a doubling of the
memory consumed for the shared memory file (i.e. file system "blocks"
and memory "pages" share a single instance).
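Before pointing orte_tmpdir_base somewhere, you can check whether a directory is backed by memory (tmpfs) or by NFS by looking it up in /proc/mounts. The sketch below assumes the Linux /proc/mounts layout (device, mount point, type, ...). `fstype_of` is a hypothetical helper, not part of Open MPI, and it ignores the octal escapes (e.g. \040 for a space) that /proc/mounts uses in paths.

```python
import os


def fstype_of(path, mounts_text):
    """Return the filesystem type that backs `path`.

    `mounts_text` is assumed to be in Linux /proc/mounts format:
    "device mountpoint fstype options dump pass", one mount per line.
    Symlinks are not resolved; pass os.path.realpath(path) if needed.
    """
    path = os.path.abspath(path)
    best_mnt, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt, fstype = fields[1], fields[2]
        # A mount point covers `path` if it equals it or is a directory prefix.
        covers = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        # Longest prefix wins; ">=" lets later (stacked) mounts override.
        if covers and len(mnt) >= len(best_mnt):
            best_mnt, best_type = mnt, fstype
    return best_type


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = f.read()
    for candidate in ("/tmp", "/dev/shm"):
        # On a diskless node you would hope to see "tmpfs" here, not "nfs".
        print(candidate, fstype_of(candidate, mounts))
```

On diskless nodes like the ones described in this thread, /tmp would presumably report nfs. That is the case to avoid for the session directory.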

-Paul

Jeff Squyres wrote:
> How's this?
>
> http://www.open-mpi.org/faq/?category=sm#poor-sm-btl-performance
>
> What's the advantage of /dev/shm? (I don't know anything about /dev/shm)
>
>
> On May 17, 2010, at 4:08 AM, Sylvain Jeaugey wrote:
>
>
>> I agree with Paul that a FAQ update on this subject would be great.
>> /dev/shm seems a good place to put the temporary files (when
>> available, of course).
>>
>> Putting files in /dev/shm also showed better performance on our systems,
>> even with /tmp on a local disk.
>>
>> Sylvain
>>
>> On Sun, 16 May 2010, Paul H. Hargrove wrote:
>>
>>
>>> If I google "ompi sm btl performance" the top match is
>>> http://www.open-mpi.org/faq/?category=sm
>>>
>>> I scanned the entire page from top to bottom and didn't see any questions of
>>> the form
>>> Why is SM performance slower than ...?
>>>
>>> The words "NFS", "network", "file system" or "filesystem" appear nowhere on
>>> the page. The closest I could find is
>>>
>>>> 7. Where is the file that sm will mmap in?
>>>>
>>>> The file will be in the OMPI session directory, which is typically
>>>>> something like /tmp/openmpi-sessions-myusername@mynodename* . The file
>>>>> itself will have the name shared_mem_pool.mynodename. For example, the full
>>>>> path could be
>>>>> /tmp/openmpi-sessions-myusername@node0_0/1543/1/shared_mem_pool.node0.
>>>>
>>>> To place the session directory in a non-default location, use the MCA
>>>> parameter orte_tmpdir_base.
>>>>
>>> which says nothing about where one should or should not place the session
>>> directory.
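To see where the sm backing file actually ended up, you can search the session-directory base for the names given in the FAQ entry quoted above. This sketch assumes only the top-level directory prefix and the file name from that FAQ example (openmpi-sessions-* and shared_mem_pool.*), which may differ between Open MPI versions. `find_sm_backing_files` is a hypothetical helper.

```python
import glob
import os


def find_sm_backing_files(tmpdir_base="/tmp"):
    """List shared-memory backing files under an Open MPI session-dir base.

    Assumes the names from the FAQ example: a top-level
    "openmpi-sessions-*" directory and a "shared_mem_pool.*" file.
    "**" matches whatever numeric subdirectories lie between them.
    """
    pattern = os.path.join(glob.escape(tmpdir_base), "openmpi-sessions-*",
                           "**", "shared_mem_pool.*")
    # recursive=True lets "**" match any number of intermediate directories.
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    for path in find_sm_backing_files():
        print(path)
```

Run it with the same base you pass as orte_tmpdir_base (default /tmp) while a job is running.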
>>>
>>> Not having read the entire FAQ from start to end, I will not contradict
>>> Ralph's claim that the "your SM performance might suck if you put the session
>>> directory on a remote filesystem" FAQ entry does exist, but I will assert
>>> that I did not find it in the SM section of the FAQ. I tried googling "ompi
>>> session directory" and "ompi orte_tmpdir_base" and still didn't find whatever
>>> entry Ralph is talking about. So, I think the average user with no clue
>>> about the relationship between the SM BTL and the session directory would
>>> need some help finding it. Therefore, I still feel an FAQ entry in the SM
>>> category is warranted, even if it just references whatever entry Ralph is
>>> referring to.
>>>
>>> -Paul
>>>
>>> Ralph Castain wrote:
>>>
>>>> We have had a FAQ on this for a long time...problem is, nobody reads it :-/
>>>>
>>>> Glad you found the problem!
>>>>
>>>> On May 14, 2010, at 3:15 PM, Paul H. Hargrove wrote:
>>>>
>>>>
>>>>
>>>>> Oskar Enoksson wrote:
>>>>>
>>>>>
>>>>>> Christopher Samuel wrote:
>>>>>>
>>>>>>
>>>>>>> On 13/05/10 20:56, Oskar Enoksson wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> The problem is that I get very bad performance unless I
>>>>>>>> explicitly exclude the "sm" btl and I can't figure out why.
>>>>>>>>
>>>>>>>>
>>>>>>> Recently someone reported issues which were traced back to
>>>>>>> the fact that the files that sm uses for mmap() were in an
>>>>>>> NFS-mounted /tmp; moving those files to another directory with
>>>>>>> the orte_tmpdir_base MCA parameter fixed the issue for them.
>>>>>>>
>>>>>>> Could it be something similar for you?
>>>>>>>
>>>>>>> cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>>
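Here is why the location matters. The sm BTL mmap()s a file in the session directory, so the shared segment is backed by a real file, and the kernel may write dirty pages of a shared file mapping back to that file. When the file sits on NFS, that writeback presumably becomes network traffic, which is consistent with the slowdown reported here. The Python sketch below only illustrates the mapping pattern (create, size, mmap with MAP_SHARED). It is not Open MPI's implementation, which is written in C, and `map_backing_file` is a hypothetical name.

```python
import mmap
import os
import tempfile


def map_backing_file(directory, size):
    """Create a file in `directory`, grow it to `size` bytes, and map it shared.

    Returns (path, fd, mapping); the caller must close and unlink.
    Unix-only: uses the MAP_SHARED and PROT_* flags.
    """
    fd, path = tempfile.mkstemp(prefix="shared_mem_pool.", dir=directory)
    os.ftruncate(fd, size)  # mmap() here needs the file to be >= `size` bytes
    mapping = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                        prot=mmap.PROT_READ | mmap.PROT_WRITE)
    return path, fd, mapping


if __name__ == "__main__":
    # Prefer an in-memory directory when it is usable, as suggested in the thread.
    if os.access("/dev/shm", os.W_OK):
        base = "/dev/shm"
    else:
        base = tempfile.gettempdir()
    path, fd, m = map_backing_file(base, 4096)
    try:
        m[:5] = b"hello"  # a store through the mapping dirties a page
        m.flush()         # msync(): write dirty pages to the backing file
        with open(path, "rb") as f:
            print(f.read(5))  # b'hello'
    finally:
        m.close()
        os.close(fd)
        os.unlink(path)
```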
>>>>>> That was exactly right: as you guessed, these are diskless nodes that
>>>>>> mount the root filesystem over NFS.
>>>>>>
>>>>>> Setting orte_tmpdir_base to /dev/shm and btl_sm_num_fifos=9 and then
>>>>>> running mpi_stress on eight cores, I measure 1650 MB/s for 1 MB
>>>>>> messages and 1600 MB/s for 10 kB messages.
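For reference, you can pass the two settings used above either on the mpirun command line (--mca name value) or through OMPI_MCA_-prefixed environment variables. The Python sketch below builds both forms. The helper names, the process count, and the ./mpi_stress path are illustrative assumptions.

```python
import os


def mca_args(params):
    """Turn {"param": value} into mpirun "--mca param value" arguments."""
    args = []
    for name, value in params.items():
        args += ["--mca", name, str(value)]
    return args


def mca_env(params, base_env=None):
    """Return a copy of the environment with OMPI_MCA_<param> variables set."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in params.items():
        env["OMPI_MCA_" + name] = str(value)
    return env


params = {"orte_tmpdir_base": "/dev/shm", "btl_sm_num_fifos": 9}
cmd = ["mpirun", "-np", "8"] + mca_args(params) + ["./mpi_stress"]
print(" ".join(cmd))
# mpirun -np 8 --mca orte_tmpdir_base /dev/shm --mca btl_sm_num_fifos 9 ./mpi_stress
# To launch: subprocess.run(cmd, check=True), or pass env=mca_env(params)
# with a plain ["mpirun", "-np", "8", "./mpi_stress"] command instead.
```

The environment-variable form is handy when the launch command is generated by a batch script you would rather not edit.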
>>>>>>
>>>>>> Thanks!
>>>>>> /Oskar
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>>
>>>>> Sounds like a new FAQ entry is warranted.
>>>>>
>>>>> -Paul
>>>>>
>>>>> --
>>>>> Paul H. Hargrove PHHargrove_at_[hidden]
>>>>> Future Technologies Group
>>>>> HPC Research Department Tel: +1-510-495-2352
>>>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory