Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions
From: Pavel Shamis (Pasha) (pasha_at_[hidden])
Date: 2008-10-07 13:29:30


Matt,
For all 1.2.x versions, you should use btl_openib_ib_pkey_val.
In the upcoming 1.3 version, the parameter was renamed to btl_openib_of_pkey_val.

By the way, we plan to release version 1.2.8 very soon, and it will include
the partition bug fix.

Regards,
Pasha

Matt Burgess wrote:
> Pasha,
>
> With your patch and parameter suggestion, it works! So, to be clear,
> btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is for
> 1.2.7?
>
> Thanks again,
> Matt
>
> On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) <pasha_at_[hidden]> wrote:
>
> Matt,
> Can you please run "cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/*"
> on d2-ib and d3-ib? I would like to check the partition configuration.
>
> Oh, by the way, I see that the command line in my previous email was
> wrong. Please use the following command line (the parameter name should
> be "btl_openib_ib_pkey_val" for ompi-1.2.6, and my patch accepts hex or
> decimal values):
> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1
>
> Open MPI 1.2.6 should work with this patch.
>
>
> Thanks,
> Pasha
>
> Matt Burgess wrote:
>
> Pasha,
>
> Thanks for the patch. Unfortunately, it doesn't seem to have fixed the
> problem. I realized I didn't mention earlier which version of Open MPI I
> was trying: it's 1.2.6. Should I be trying 1.2.7 with this patch?
>
> Thanks,
> Matt
>
> 2008/10/7 Pavel Shamis (Pasha) <pasha_at_[hidden]>
>
>
> Matt,
> Can you please try the attached patch? I think it will resolve this issue.
>
> Thanks,
> Pasha
>
> Matt Burgess wrote:
>
> Lenny,
>
> Thanks for the info. It still doesn't seem to be working. My command
> line is:
>
> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self -mca btl_openib_of_pkey_val 33033 /cluster/pallas/x86_64-ib/IMB-MPI1
>
> I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/" directory,
> but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/". Its contents
> (one file per P_Key table index) are:
>
> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
> 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
> 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
> 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
> 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
> 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
> 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
> We aren't using OpenSM, but Voltaire's SM on a 2012 switch.
>
> Thanks again,
> Matt
>
>
> On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]> wrote:
>
> Hi Matt,
>
> It seems that the right way to do it is the following:
>
> -mca btl openib,self -mca btl_openib_ib_pkey_val 33033
>
> where the value is the pkey as a decimal number (in your case
> 0x8109 = 33033); there is no need for a btl_openib_ib_pkey_ix value.
>
> For example:
> mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
> btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
> LT (2) (size min max avg) 1 3.511429 3.511429 3.511429
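The decimal values in this thread are the hex P_Keys converted. Bit 15 (0x8000 = 32768) is the full-membership bit, so masking it off leaves Matt's partition number 0x109. Worked out for 0x8109 (Matt's key) and 0x8001 (the key in Lenny's witch2/witch3 examples):

```latex
\begin{aligned}
\mathtt{0x8109} &= 8\cdot 16^{3} + 1\cdot 16^{2} + 0\cdot 16 + 9 = 32768 + 256 + 9 = 33033\\
\mathtt{0x8001} &= 8\cdot 16^{3} + 1 = 32768 + 1 = 32769\\
\mathtt{0x8109} \mathbin{\&} \mathtt{0x7fff} &= \mathtt{0x0109} = 1\cdot 16^{2} + 9 = 265
\end{aligned}
```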
>
> If it's not working, check "cat /sys/class/infiniband/mthca0/ports/1/pkeys/*"
> for the pkeys, and check the SM; it may be a setup issue.
>
> Pasha is currently checking this issue.
>
> Best regards,
>
> Lenny.
>
>
>
>
>
> On 10/7/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>
> FWIW, if this configuration is for all of your users, you might want
> to specify these MCA params in the default MCA param file, the
> environment, etc., so that you don't have to specify them on every
> mpirun command line.
>
> See http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
>
>
>
> On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:
>
> Sorry, I misunderstood the question.
>
> Thanks to Pasha, the right command line will be:
>
> -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109 -mca btl_openib_of_pkey_ix 1
>
> For example:
>
> # mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1 ./mpi_p1_4_TRUNK -t lt
> LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
>
>
> Best regards
>
> Lenny.
>
>
> On 10/6/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>
> On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
>
> you should probably use -mca tcp,self -mca
> btl_openib_if_include ib0.8109
>
>
> Really? I thought we only took OpenFabrics device names in the
> openib_if_include MCA param...? It looks like ib0.8109 is an IPoIB
> device name.
>
>
>
> Lenny.
>
>
>
> On 10/3/08, Matt Burgess <burgess.matt_at_[hidden]> wrote:
> Hi,
>
>
> I'm trying to get Open MPI working over openib partitions. On this
> cluster, the partition number is 0x109. The IB interfaces are pingable
> over the appropriate ib0.8109 interface:
>
> d2:/opt/openmpi-ib # ifconfig ib0.8109
> ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
>           inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
>           inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>           RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
>           collisions:0 txqueuelen:256
>           RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)
>
>
> I have tried the following:
>
> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl openib,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1 /cluster/pallas/x86_64-ib/IMB-MPI1
>
> but I just get a RETRY EXCEEDED ERROR. Is there an MCA parameter I am
> missing?
>
> I was successful using tcp only:
>
> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl tcp,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1
>
>
>
> Thanks,
> Matt Burgess
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> -- Jeff Squyres
> Cisco Systems
>
>
>
>
>
> -- Jeff Squyres
> Cisco Systems
>
>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>
> -- --
> Pavel Shamis (Pasha)
> Mellanox Technologies LTD.
>
>
> Index: ompi/mca/btl/openib/btl_openib_component.c
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_component.c (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_component.c (working copy)
> @@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
> goto dealloc_pd;
> }
>
> - ret = OMPI_SUCCESS;
> + ret = OMPI_SUCCESS;
> /* Note ports are 1 based hence j = 1 */
> for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
> struct ibv_port_attr ib_port_attr;
> @@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
> uint16_t pkey,j;
> for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
> ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
> - pkey=ntohs(pkey);
> + pkey=ntohs(pkey) & 0x7fff;
> if(pkey == mca_btl_openib_component.ib_pkey_val){
> ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
> break;
> Index: ompi/mca/btl/openib/btl_openib_ini.c
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_ini.c (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_ini.c (working copy)
> @@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
> static void reset_section(bool had_previous_value, parsed_section_values_t *s);
> static void reset_values(ompi_btl_openib_ini_values_t *v);
> static int save_section(parsed_section_values_t *s);
> -static int intify(char *string);
> -static int intify_list(char *str, uint32_t **values, int *len);
> static inline void show_help(const char *topic);
>
>
> @@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
> all whitespace at the beginning and ending of the value. */
>
> if (0 == strcasecmp(key_buffer, "vendor_id")) {
> - if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
> + if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_ids,
> &sv->vendor_ids_len))) {
> return ret;
> }
> }
>
> else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
> - if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
> + if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_part_ids,
> &sv->vendor_part_ids_len))) {
> return ret;
> }
> @@ -379,13 +377,13 @@ static int parse_line(parsed_section_val
>
> else if (0 == strcasecmp(key_buffer, "mtu")) {
> /* Single value */
> - sv->values.mtu = (uint32_t) intify(value);
> + sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
> sv->values.mtu_set = true;
> }
>
> else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
> /* Single value */
> - sv->values.use_eager_rdma = (uint32_t) intify(value);
> + sv->values.use_eager_rdma = (uint32_t) ompi_btl_openib_ini_intify(value);
> sv->values.use_eager_rdma_set = true;
> }
>
> @@ -547,7 +545,7 @@ static int save_section(parsed_section_v
> /*
> * Do string-to-integer conversion, for both hex and decimal numbers
> */
> -static int intify(char *str)
> +int ompi_btl_openib_ini_intify(char *str)
> {
> while (isspace(*str)) {
> ++str;
> @@ -568,7 +566,7 @@ static int intify(char *str)
> /*
> * Take a comma-delimited list and infity them all
> */
> -static int intify_list(char *value, uint32_t **values, int *len)
> +int ompi_btl_openib_ini_intify_list(char *value, uint32_t **values, int *len)
> {
> char *comma;
> char *str = value;
> @@ -584,7 +582,7 @@ static int intify_list(char *value, uint
> if (NULL == *values) {
> return OMPI_ERR_OUT_OF_RESOURCE;
> }
> - *values[0] = (uint32_t) intify(str);
> + *values[0] = (uint32_t) ompi_btl_openib_ini_intify(str);
> *len = 1;
> } else {
> /* If we found a comma, loop over all the values. Be a
> @@ -594,7 +592,7 @@ static int intify_list(char *value, uint
> do {
> *comma = '\0';
> *values = realloc(*values, sizeof(uint32_t) * (*len + 2));
> - (*values)[*len] = (int32_t) intify(str);
> + (*values)[*len] = (int32_t) ompi_btl_openib_ini_intify(str);
> ++(*len);
> str = comma + 1;
> comma = strchr(str, ',');
> @@ -602,7 +600,7 @@ static int intify_list(char *value, uint
> /* Get the last value (i.e., the value after the last
> comma, because it won't have been snarfed in the
> loop) */
> - (*values)[*len] = (uint32_t) intify(str);
> + (*values)[*len] = (uint32_t) ompi_btl_openib_ini_intify(str);
> ++(*len);
> }
>
> Index: ompi/mca/btl/openib/btl_openib_ini.h
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_ini.h (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_ini.h (working copy)
> @@ -49,6 +49,9 @@ extern "C" {
> */
> int ompi_btl_openib_ini_finalize(void);
>
> + int ompi_btl_openib_ini_intify(char *string);
> + int ompi_btl_openib_ini_intify_list(char *str, uint32_t **values, int *len);
> +
> #if defined(c_plusplus) || defined(__cplusplus)
> }
> #endif
> Index: ompi/mca/btl/openib/btl_openib_mca.c
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_mca.c (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_mca.c (working copy)
> @@ -27,6 +27,7 @@
> #include "opal/mca/base/mca_base_param.h"
> #include "btl_openib.h"
> #include "btl_openib_mca.h"
> +#include "btl_openib_ini.h"
>
> /*
> * Local flags
> @@ -97,7 +98,7 @@ static inline int reg_int(const char* pa
> */
> int btl_openib_register_mca_params(void)
> {
> - char *msg, *str;
> + char *msg, *str, *pkey;
> int ival, ival2, ret, tmp;
>
> ret = OMPI_SUCCESS;
> @@ -192,13 +193,15 @@ int btl_openib_register_mca_params(void)
> 0, &ival, REGINT_GE_ZERO));
> mca_btl_openib_component.ib_pkey_ix = (uint32_t) ival;
>
> - CHECK(reg_int("ib_pkey_val", "InfiniBand pkey value"
> + CHECK(reg_string("ib_pkey_val", "InfiniBand pkey value"
> "(must be > 0 and < 0xffff)",
> - 0, &ival, REGINT_GE_ZERO));
> - if (ival > 0xffff) {
> + "0", &pkey, 0));
> + mca_btl_openib_component.ib_pkey_val = ompi_btl_openib_ini_intify(pkey) & 0x7fff;
> + if (mca_btl_openib_component.ib_pkey_val > 0xffff || mca_btl_openib_component.ib_pkey_val < 0) {
> ret = OMPI_ERR_BAD_PARAM;
> }
> - mca_btl_openib_component.ib_pkey_val = (uint32_t) ival;
> + free(pkey);
>
> CHECK(reg_int("ib_psn", "InfiniBand packet sequence starting number "
> "(must be >= 0)",
>
>
>
>
>
> --
> --
> Pavel Shamis (Pasha)
> Mellanox Technologies LTD.
>
>

-- 
--
Pavel Shamis (Pasha)
Mellanox Technologies LTD.