Open MPI logo

Open MPI User's Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Open MPI User's mailing list

Subject: Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem
From: Blosch, Edwin L (edwin.l.blosch_at_[hidden])
Date: 2011-09-12 12:32:02


Nathan, I found this parameters under /sys/module/mlx4_core/parameters. How do you incorporate a changed value? What to restart/rebuild?

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Nathan Hjelm
Sent: Monday, September 12, 2011 11:00 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] qp memory allocation problem

I also recommend checking the log_mtts_per_set parameter to the mlx4 module. This parameter controls how much memory can be registered for use by the mlx4 driver and it should be in the range 1-5 (or 0-7 depending on the version of the mlx4 driver). I recommend tthe parameter be set such that you can register all the memory on the node. Assuming you are not using huge pages here is a list of possible values (and how much memory can be registered).

0 - 2 GB (new mlx4 default-- bad setting)
1 - 4 GB
2 - 8 GB
3 - 16 GB (old mlx4 default)
4 - 32 GB
5 - 64 GB
6 - 128 GB
7 - 256 GB

-Nathan Hjelm
Los Alamos National Laboratory

On Mon, 12 Sep 2011, Samuel K. Gutierrez wrote:

> Hi,
> This problem can be caused by a variety of things, but I suspect our
> default queue pair parameters (QP) aren't helping the situation :-).
>
> What happens when you add the following to your mpirun command?
>
> -mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,12
>
> OMPI Developers:
>
> Maybe we should consider disabling the use of per-peer queue pairs by
> default. Do they buy us anything? For what it is worth, we have stopped using them on all of our large systems here at LANL.
>
> Thanks,
>
> Samuel K. Gutierrez
> Los Alamos National Laboratory
>
> On Sep 12, 2011, at 9:23 AM, Blosch, Edwin L wrote:
>
> I am getting this error message below and I don't know what it means or how to fix it. It only happens when I
> run on a large number of processes, e.g. 960. Things work fine on 480, and I don't think the application has a
> bug. Any help is appreciated...
>
> [c1n01][[30697,1],3][connect/btl_openib_connect_oob.c:464:qp_create_on
> e] error creating qp errno says Cannot allocate memory
> [c1n01][[30697,1],4][connect/btl_openib_connect_oob.c:464:qp_create_on
> e] error creating qp errno says Cannot allocate memory
>
> Here's the mpirun command I used:
> mpirun --prefix /usr/mpi/intel/openmpi-1.4.3 --machinefile <host file>
> -np 960 --mca btl ^tcp --mca mpool_base_use_mem_hooks 1 --mca
> mpi_leave_pinned 1 -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1
> <application name +
> arguments>
>
> Here's the applicable hardware from the
> /usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-param
> s.ini
> file:
> # A.k.a. ConnectX
> [Mellanox Hermon]
> vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
> vendor_part_id =
> 25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
> use_eager_rdma = 1
> mtu = 2048
> max_inline_data = 128
>
> And this is the output of ompi_info -param btl openib:
> MCA btl: parameter "btl_base_verbose" (current value:
> "0", data source: default value)
> Verbosity level of the BTL framework
> MCA btl: parameter "btl" (current value: <none>, data
> source: default value)
> Default selection set of components for the
> btl framework (<none> means use all components that can be found)
> MCA btl: parameter "btl_openib_verbose" (current
> value: "0", data source: default value)
> Output some verbose OpenIB BTL information
> (0 = no output, nonzero = output)
> MCA btl: parameter
> "btl_openib_warn_no_device_params_found" (current value: "1", data source: default value, synonyms:
> btl_openib_warn_no_hca_params_found)
> Warn when no device-specific parameters are
> found in the INI file specified by the btl_openib_device_param_files
> MCA
> parameter (0 = do not warn; any other value
> = warn)
> MCA btl: parameter
> "btl_openib_warn_no_hca_params_found" (current value: "1", data
> source: default value, deprecated, synonym
> of: btl_openib_warn_no_device_params_found)
> Warn when no device-specific parameters are
> found in the INI file specified by the btl_openib_device_param_files
> MCA
> parameter (0 = do not warn; any other value
> = warn)
> MCA btl: parameter
> "btl_openib_warn_default_gid_prefix" (current value: "1", data source:
> default
> value)
> Warn when there is more than one active
> ports and at least one of them connected to the network with only
> default GID
> prefix configured (0 = do not warn; any
> other value = warn)
> MCA btl: parameter "btl_openib_warn_nonexistent_if"
> (current value: "1", data source: default value)
> Warn if non-existent devices and/or ports
> are specified in the btl_openib_if_[in|ex]clude MCA parameters (0 = do
> not
> warn; any other value = warn)
> MCA btl: parameter "btl_openib_want_fork_support"
> (current value: "-1", data source: default value)
> Whether fork support is desired or not
> (negative = try to enable fork support, but continue even if it is not
> available, 0 = do not enable fork support,
> positive = try to enable fork support and fail if it is not available)
> MCA btl: parameter "btl_openib_device_param_files" (current value:
> "/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini", data source:
> default value, synonyms:
> btl_openib_hca_param_files)
> Colon-delimited list of INI-style files that
> contain device vendor/part-specific parameters
> MCA btl: parameter "btl_openib_hca_param_files" (current value:
> "/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini", data source:
> default value, deprecated,
> synonym of: btl_openib_device_param_files)
> Colon-delimited list of INI-style files that
> contain device vendor/part-specific parameters
> MCA btl: parameter "btl_openib_device_type" (current
> value: "all", data source: default value)
> Specify to only use IB or iWARP network
> adapters (infiniband = only use InfiniBand HCAs; iwarp = only use
> iWARP NICs;
> all = use any available adapters)
> MCA btl: parameter "btl_openib_max_btls" (current
> value: "-1", data source: default value)
> Maximum number of device ports to use (-1 =
> use all available, otherwise must be >= 1)
> MCA btl: parameter "btl_openib_free_list_num"
> (current value: "8", data source: default value)
> Intial size of free lists (must be >= 1)
> MCA btl: parameter "btl_openib_free_list_max"
> (current value: "-1", data source: default value)
> Maximum size of free lists (-1 = infinite,
> otherwise must be >= 0)
> MCA btl: parameter "btl_openib_free_list_inc"
> (current value: "32", data source: default value)
> Increment size of free lists (must be >= 1)
> MCA btl: parameter "btl_openib_mpool" (current value:
> "rdma", data source: default value)
> Name of the memory pool to be used (it is
> unlikely that you will ever want to change this
> MCA btl: parameter "btl_openib_reg_mru_len" (current
> value: "16", data source: default value)
> Length of the registration cache most
> recently used list (must be >= 1)
> MCA btl: parameter "btl_openib_cq_size" (current value: "1000", data source: default value, synonyms:
> btl_openib_ib_cq_size)
> Size of the OpenFabrics completion queue
> (will automatically be set to a minimum of (2 * number_of_peers *
> btl_openib_rd_num))
> MCA btl: parameter "btl_openib_ib_cq_size" (current
> value: "1000", data source: default value, deprecated, synonym of:
> btl_openib_cq_size)
> Size of the OpenFabrics completion queue
> (will automatically be set to a minimum of (2 * number_of_peers *
> btl_openib_rd_num))
> MCA btl: parameter "btl_openib_max_inline_data"
> (current value: "-1", data source: default value,
> synonyms:
> btl_openib_ib_max_inline_data)
> Maximum size of inline data segment (-1 =
> run-time probe to discover max value, otherwise must be >= 0). If not
> explicitly set, use max_inline_data from the
> INI file containing device-specific parameters
> MCA btl: parameter "btl_openib_ib_max_inline_data"
> (current value: "-1", data source: default value, deprecated, synonym of:
> btl_openib_max_inline_data)
> Maximum size of inline data segment (-1 =
> run-time probe to discover max value, otherwise must be >= 0). If not
> explicitly set, use max_inline_data from the
> INI file containing device-specific parameters
> MCA btl: parameter "btl_openib_pkey" (current value: "0", data source: default value, synonyms:
> btl_openib_ib_pkey_val)
> OpenFabrics partition key (pkey) value.
> Unsigned integer decimal or hex values are allowed (e.g., "3" or
> "0x3f") and
> will be masked against the maximum allowable
> IB paritition key value (0x7fff)
> MCA btl: parameter "btl_openib_ib_pkey_val" (current
> value: "0", data source: default value, deprecated, synonym of:
> btl_openib_pkey)
> OpenFabrics partition key (pkey) value.
> Unsigned integer decimal or hex values are allowed (e.g., "3" or
> "0x3f") and
> will be masked against the maximum allowable
> IB paritition key value (0x7fff)
> MCA btl: parameter "btl_openib_psn" (current value: "0", data source: default value, synonyms:
> btl_openib_ib_psn)
> OpenFabrics packet sequence starting number
> (must be >= 0)
> MCA btl: parameter "btl_openib_ib_psn" (current
> value: "0", data source: default value, deprecated, synonym of:
> btl_openib_psn)
> OpenFabrics packet sequence starting number
> (must be >= 0)
> MCA btl: parameter "btl_openib_ib_qp_ous_rd_atom"
> (current value: "4", data source: default value)
> InfiniBand outstanding atomic reads (must be
> >= 0)
> MCA btl: parameter "btl_openib_mtu" (current value: "3", data source: default value, synonyms:
> btl_openib_ib_mtu)
> OpenFabrics MTU, in bytes (if not specified
> in INI files). Valid values are: 1=256 bytes,
> 2=512 bytes, 3=1024 bytes,
> 4=2048 bytes, 5=4096 bytes
> MCA btl: parameter "btl_openib_ib_mtu" (current
> value: "3", data source: default value, deprecated, synonym of:
> btl_openib_mtu)
> OpenFabrics MTU, in bytes (if not specified
> in INI files). Valid values are: 1=256 bytes,
> 2=512 bytes, 3=1024 bytes,
> 4=2048 bytes, 5=4096 bytes
> MCA btl: parameter "btl_openib_ib_min_rnr_timer"
> (current value: "25", data source: default value)
> InfiniBand minimum "receiver not ready"
> timer, in seconds (must be >= 0 and <= 31)
> MCA btl: parameter "btl_openib_ib_timeout" (current
> value: "20", data source: default value)
> InfiniBand transmit timeout, plugged into
> formula: 4.096 microseconds * (2^btl_openib_ib_timeout)(must be >= 0
> and <=
> 31)
> MCA btl: parameter "btl_openib_ib_retry_count"
> (current value: "7", data source: default value)
> InfiniBand transmit retry count (must be >=
> 0 and <= 7)
> MCA btl: parameter "btl_openib_ib_rnr_retry" (current
> value: "7", data source: default value)
> InfiniBand "receiver not ready" retry count;
> applies *only* to SRQ/XRC queues. PP queues use RNR retry values of 0
> because Open MPI performs software flow
> control to guarantee that RNRs never occur (must be >= 0 and <= 7; 7 =
> "infinite")
> MCA btl: parameter "btl_openib_ib_max_rdma_dst_ops"
> (current value: "4", data source: default value)
> InfiniBand maximum pending RDMA destination
> operations (must be >= 0)
> MCA btl: parameter "btl_openib_ib_service_level"
> (current value: "0", data source: default value)
> InfiniBand service level (must be >= 0 and
> <= 15)
> MCA btl: parameter "btl_openib_use_eager_rdma"
> (current value: "-1", data source: default value)
> Use RDMA for eager messages (-1 = use device
> default, 0 = do not use eager RDMA, 1 = use eager
> RDMA)
> MCA btl: parameter "btl_openib_eager_rdma_threshold"
> (current value: "16", data source: default value)
> Use RDMA for short messages after this
> number of messages are received from a given peer (must be >= 1)
> MCA btl: parameter "btl_openib_max_eager_rdma"
> (current value: "16", data source: default value)
> Maximum number of peers allowed to use RDMA
> for short messages (RDMA is used for all long messages, except if
> explicitly disabled, such as with the "dr"
> pml) (must be >= 0)
> MCA btl: parameter "btl_openib_eager_rdma_num"
> (current value: "16", data source: default value)
> Number of RDMA buffers to allocate for small
> messages(must be >= 1)
> MCA btl: parameter "btl_openib_btls_per_lid" (current
> value: "1", data source: default value)
> Number of BTLs to create for each InfiniBand
> LID (must be >= 1)
> MCA btl: parameter "btl_openib_max_lmc" (current
> value: "0", data source: default value)
> Maximum number of LIDs to use for each
> device port (must be >= 0, where 0 = use all available)
> MCA btl: parameter "btl_openib_enable_apm_over_lmc"
> (current value: "0", data source: default value)
> Maximum number of alterative paths for each
> device port (must be >= -1, where 0 = disable apm,
> -1 = all availible
> alternative paths )
> MCA btl: parameter "btl_openib_enable_apm_over_ports"
> (current value: "0", data source: default value)
> Enable alterative path migration (APM) over
> different ports of the same device (must be >= 0, where 0 = disable
> APM
> over ports , 1 = enable APM over ports of
> the same device)
> MCA btl: parameter
> "btl_openib_use_async_event_thread" (current value: "1", data source:
> default value)
> If nonzero, use the thread that will handle
> InfiniBand asyncihronous events
> MCA btl: parameter "btl_openib_buffer_alignment"
> (current value: "64", data source: default value)
> Prefered communication buffer alignment, in
> bytes (must be > 0 and power of two)
> MCA btl: parameter
> "btl_openib_use_message_coalescing" (current value: "1", data source:
> default value)
> Use message coalescing
> MCA btl: parameter "btl_openib_cq_poll_ratio"
> (current value: "100", data source: default value)
> how often poll high priority CQ versus low
> priority CQ
> MCA btl: parameter "btl_openib_eager_rdma_poll_ratio"
> (current value: "100", data source: default
> value)
> how often poll eager RDMA channel versus CQ
> MCA btl: parameter
> "btl_openib_hp_cq_poll_per_progress" (current value: "10", data
> source: default
> value)
> max number of completion events to process
> for each call of BTL progress engine
> MCA btl: information "btl_openib_have_fork_support"
> (value: "1", data source: default value)
> Whether the OpenFabrics stack supports
> applications that invoke the "fork()" system call or not (0 = no, 1 = yes).
> Note that this value does NOT indicate
> whether the system being run on supports "fork()" with OpenFabrics
> applications
> or not.
> MCA btl: parameter "btl_openib_exclusivity" (current
> value: "1024", data source: default value)
> BTL exclusivity (must be >= 0)
> MCA btl: parameter "btl_openib_flags" (current value:
> "310", data source: default value)
> BTL bit flags (general flags: SEND=1, PUT=2,
> GET=4, SEND_INPLACE=8, RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags
> only used by the "dr" PML (ignored by
> others): ACK=16, CHECKSUM=32, RDMA_COMPLETION=128)
> MCA btl: parameter "btl_openib_rndv_eager_limit"
> (current value: "12288", data source: default value)
> Size (in bytes) of "phase 1" fragment sent
> for all large messages (must be >= 0 and <=
> eager_limit)
> MCA btl: parameter "btl_openib_eager_limit" (current
> value: "12288", data source: default value)
> Maximum size (in bytes) of "short" messages (must be >= 1).
> MCA btl: parameter "btl_openib_max_send_size"
> (current value: "65536", data source: default value)
> Maximum size (in bytes) of a single "phase
> 2" fragment of a long message when using the pipeline protocol (must
> be >=
> 1)
> MCA btl: parameter "btl_openib_rdma_pipeline_send_length" (current value: "1048576", data source:
> default value)
> Length of the "phase 2" portion of a large
> message (in bytes) when using the pipeline protocol. This part of the
> message will be split into fragments of size
> max_send_size and sent using send/receive semantics (must be >= 0;
> only
> relevant when the PUT flag is set)
> MCA btl: parameter
> "btl_openib_rdma_pipeline_frag_size" (current value: "1048576", data
> source: default
> value)
> Maximum size (in bytes) of a single "phase
> 3" fragment from a long message when using the pipeline protocol.
> These
> fragments will be sent using RDMA semantics
> (must be >= 1; only relevant when the PUT flag is
> set)
> MCA btl: parameter
> "btl_openib_min_rdma_pipeline_size" (current value: "262144", data
> source: default
> value)
> Messages smaller than this size (in bytes)
> will not use the RDMA pipeline protocol. Instead, they will be split
> into
> fragments of max_send_size and sent using
> send/receive semantics (must be >=0, and is automatically adjusted up
> to at
> least
> (eager_limit+btl_rdma_pipeline_send_length); only relevant when the
> PUT flag is set)
> MCA btl: parameter "btl_openib_bandwidth" (current
> value: "800", data source: default value)
> Approximate maximum bandwidth of
> interconnect(must be >= 1)
> MCA btl: parameter "btl_openib_latency" (current
> value: "10", data source: default value)
> Approximate latency of interconnect (must be
> >= 0)
> MCA btl: parameter "btl_openib_receive_queues" (current value:
> "P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32", data source:
> default value)
> Colon-delimited, comma delimited list of
> receive queues: P,4096,8,6,4:P,32768,8,6,4
> MCA btl: parameter "btl_openib_if_include" (current
> value: <none>, data source: default value)
> Comma-delimited list of devices/ports to be
> used (e.g. "mthca0,mthca1:2"; empty value means to use all ports found).
> Mutually exclusive with btl_openib_if_exclude.
> MCA btl: parameter "btl_openib_if_exclude" (current
> value: <none>, data source: default value)
> Comma-delimited list of device/ports to be
> excluded (empty value means to not exclude any ports). Mutually
> exclusive
> with btl_openib_if_include.
> MCA btl: parameter "btl_openib_ipaddr_include"
> (current value: <none>, data source: default value)
> Comma-delimited list of IP Addresses to be
> used (e.g. "192.168.1.0/24"). Mutually exclusive with
> btl_openib_ipaddr_exclude.
> MCA btl: parameter "btl_openib_ipaddr_exclude"
> (current value: <none>, data source: default value)
> Comma-delimited list of IP Addresses to be
> excluded (e.g. "192.168.1.0/24"). Mutually exclusive with
> btl_openib_ipaddr_include.
> MCA btl: parameter "btl_openib_cpc_include" (current
> value: <none>, data source: default value)
> Method used to select OpenFabrics
> connections (valid values: oob,xoob,rdmacm)
> MCA btl: parameter "btl_openib_cpc_exclude" (current
> value: <none>, data source: default value)
> Method used to exclude OpenFabrics
> connections (valid values: oob,xoob,rdmacm)
> MCA btl: parameter "btl_openib_connect_oob_priority"
> (current value: "50", data source: default value)
> The selection method priority for oob
> MCA btl: parameter "btl_openib_connect_xoob_priority"
> (current value: "60", data source: default value)
> The selection method priority for xoob
> MCA btl: parameter
> "btl_openib_connect_rdmacm_priority" (current value: "30", data
> source: default
> value)
> The selection method priority for rdma_cm
> MCA btl: parameter "btl_openib_connect_rdmacm_port"
> (current value: "0", data source: default value)
> The selection method port for rdma_cm
> MCA btl: parameter "btl_openib_connect_rdmacm_resolve_timeout" (current value: "1000", data source:
> default value)
> The timeout (in miliseconds) for address and
> route resolution
> MCA btl: parameter
> "btl_openib_connect_rdmacm_retry_count" (current value: "20", data
> source: default
> value)
> Maximum number of times rdmacm will retry
> route resolution
> MCA btl: parameter
> "btl_openib_connect_rdmacm_reject_causes_connect_error" (current
> value: "0", data
> source: default value)
> The drivers for some devices are buggy such
> that an RDMA REJECT action may result in a CONNECT_ERROR event instead
> of
> a REJECTED event. Setting this MCA
> parameter to true tells Open MPI to treat CONNECT_ERROR events on
> connections
> where a REJECT is expected as a REJECT
> (default: false)
> MCA btl: parameter "btl_openib_priority" (current
> value: "0", data source: default value)
> MCA btl: parameter "btl_base_warn_component_unused"
> (current value: "1", data source: default value)
> This parameter is used to turn on warning
> messages when certain NICs are not used
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
>
>
>
>