We have had a FAQ entry on this for a long time; the problem is, nobody reads it :-/
Glad you found the problem!
On May 14, 2010, at 3:15 PM, Paul H. Hargrove wrote:
> Oskar Enoksson wrote:
>> Christopher Samuel wrote:
>>> Subject: Re: [OMPI devel] Very poor performance with btl sm on twin Nehalem servers with Mellanox ConnectX installed
>>> On 13/05/10 20:56, Oskar Enoksson wrote:
>>>> The problem is that I get very bad performance unless I
>>>> explicitly exclude the "sm" btl and I can't figure out why.
>>> Recently someone reported issues that were traced back to
>>> the files that sm uses for mmap() being in an NFS-mounted
>>> /tmp; moving those files to another directory with the
>>> orte_tmpdir_base MCA parameter fixed the issue for them.
>>> Could it be something similar for you?
>> That was exactly right: as you guessed, these are diskless nodes that
>> mount the root filesystem over NFS.
>> After setting orte_tmpdir_base to /dev/shm and btl_sm_num_fifos=9,
>> running mpi_stress on eight cores measures 1650 MB/s for 1 MB
>> messages and 1600 MB/s for 10 kB messages.
>> devel mailing list
> Sounds like a new FAQ entry is warranted.
> Paul H. Hargrove PHHargrove_at_[hidden]
> Future Technologies Group
> HPC Research Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
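For anyone hitting the same symptom, a quick way to check whether a candidate session directory (such as /tmp) sits on NFS is to look up its filesystem type in the Linux mount table. The sketch below is a hypothetical helper, not part of Open MPI: it assumes Linux and the /proc/self/mounts format (device, mount point, fs type, ...), and the function names fs_type and is_nfs are made up for illustration. If the check reports NFS, the fix from this thread is to point orte_tmpdir_base at a local directory, e.g. `mpirun --mca orte_tmpdir_base /dev/shm ...`.

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fstype) pairs."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Field 2 is the mount point (spaces are escaped as \040);
            # field 3 is the filesystem type.
            mount_point = fields[1].replace("\\040", " ")
            entries.append((mount_point, fields[2]))
    return entries


def fs_type(path, mounts_text):
    """Return the fs type of the longest mount point that contains path.

    Later entries win on ties, since a later mount on the same point
    hides the earlier one (e.g. an NFS root mounted over "rootfs").
    """
    path = os.path.normpath(path)
    best, best_type = "", None
    for mount_point, fstype in parse_mounts(mounts_text):
        mp = os.path.normpath(mount_point)
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        if inside and len(mp) >= len(best):
            best, best_type = mp, fstype
    return best_type


def is_nfs(path, mounts_text):
    """True if path lives on an NFS mount (nfs, nfs4, ...)."""
    t = fs_type(path, mounts_text)
    return t is not None and t.startswith("nfs")


def is_nfs_on_this_host(path, mounts_file="/proc/self/mounts"):
    """Hypothetical convenience wrapper: check a real path on this Linux host."""
    with open(mounts_file) as f:
        return is_nfs(os.path.realpath(path), f.read())
```

On a diskless node like the ones described above, `is_nfs_on_this_host("/tmp")` would be expected to return True while /dev/shm (tmpfs) would not. The lookup is kept as a pure function over the mount-table text so it can be tried against sample data without touching the host; note that it does not decode octal escapes other than `\040`.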