Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-10-07 14:17:59


"Soon."

:-)

On Oct 7, 2008, at 1:30 PM, Matt Burgess wrote:

> Pasha,
>
> That's great, thanks for the help. When exactly do you expect that
> 1.2.8 will be released?
>
> Thanks,
> Matt
>
> On Tue, Oct 7, 2008 at 1:29 PM, Pavel Shamis (Pasha) <pasha_at_[hidden]> wrote:
> Matt,
> For all 1.2.x versions you should use btl_openib_ib_pkey_val.
> In the ongoing 1.3 version the parameter was renamed to
> btl_openib_of_pkey_val.
>
> BTW, we plan to release version 1.2.8 very soon, and it will include
> the partition bug fix.
>
> Regards,
> Pasha
>
> Matt Burgess wrote:
> Pasha,
>
> With your patch and parameter suggestion, it works! So to be clear
> btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is
> for 1.2.7?
>
> Thanks again,
> Matt
>
> On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) <pasha_at_[hidden]> wrote:
>
> Matt,
> Can you please run "cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/*"
> on your d2-ib and d3-ib nodes? I would like to check the partition
> configuration.
>
> Oh, BTW, I see that the command line in the previous email was wrong.
> Please use the following command line (the parameter name should be
> "btl_openib_ib_pkey_val" for ompi-1.2.6, and my patch accepts
> HEX/DEC values):
> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1
>
> Ompi 1.2.6 should work OK with this patch.
>
>
> Thanks,
> Pasha
>
> Matt Burgess wrote:
>
> Pasha,
>
> Thanks for the patch. Unfortunately, it doesn't seem like that
> fixed the problem. I realized earlier I didn't mention what
> version of OpenMPI I was trying - it's 1.2.6.
> Should I be trying 1.2.7 with this patch?
>
> Thanks,
> Matt
>
> 2008/10/7 Pavel Shamis (Pasha) <pasha_at_[hidden]>
>
>
> Matt,
> Can you please try the attached patch? I guess it will resolve
> this issue.
>
> Thanks,
> Pasha
>
> Matt Burgess wrote:
>
> Lenny,
>
> Thanks for the info. It still doesn't seem to be working.
> My command line is:
>
> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self -mca btl_openib_of_pkey_val 33033 /cluster/pallas/x86_64-ib/IMB-MPI1
>
> I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/"
> but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
> Its contents are:
>
> 0 106 114 122 16 24 32 40 49 57
> 65 73 81
> 9 98
> 1 107 115 123 17 25 33 41 5 58
> 66 74 82
> 90 99
> 10 108 116 124 18 26 34 42 50 59
> 67 75 83
> 91 100 109 117 125 19 27 35 43 51
> 6 68 76 84 92 101 11 118 126 2 28
> 36 44 52 60
> 69 77 85 93 102 110 119 127 20 29
> 37 45 53 61 7 78 86 94 103 111 12
> 13 21 3 38
> 46 54 62 70 79 87 95 104 112 120
> 14 22 30 39 47 55 63 71 8 88 96
> 105
> 113 121 15
> 23 31 4 48 56 64 72 80 89 97
>
> We aren't using OpenSM, but Voltaire's SM on a 2012 switch.
>
> Thanks again,
> Matt
>
>
> On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]> wrote:
>
> Hi Matt,
>
> It seems that the right way to do it is the following:
>
> -mca btl openib,self -mca btl_openib_ib_pkey_val 33033
>
> where the value is the decimal number of the pkey, in your case
> 0x8109 = 33033, and there is no need for the
> btl_openib_ib_pkey_ix value.
>
> ex.
> mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
> LT (2) (size min max avg) 1 3.511429 3.511429 3.511429
>
> If it's not working, check
> cat /sys/class/infiniband/mthca0/ports/1/pkeys/*
> for the pkeys, and check the SM; maybe it's a setup problem.
>
> Pasha is currently checking this issue.
>
> Best regards,
>
> Lenny.
>
>
>
>
>
> On 10/7/08, *Jeff Squyres* <jsquyres_at_[hidden]> wrote:
>
> FWIW, if this configuration is for all of your users, you might
> want to specify these MCA params in the default MCA param file,
> or the environment, etc., just so that you don't have to specify
> them on every mpirun command line.
>
> See http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
>
>
>
> On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky
> wrote:
>
> Sorry, I misunderstood the question.
>
> Thanks to Pasha, the right command line will be
>
> -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109 -mca btl_openib_of_pkey_ix 1
>
> ex.
>
> #mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1 ./mpi_p1_4_TRUNK -t lt
> LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
>
>
> Best regards
>
> Lenny.
>
>
> On 10/6/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>
> On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
>
> you should probably use -mca tcp,self -mca
> btl_openib_if_include ib0.8109
>
>
> Really? I thought we only took OpenFabrics device names in the
> openib_if_include MCA param...? It looks like ib0.8109 is an
> IPoIB device name.
>
>
>
> Lenny.
>
>
>
> On 10/3/08, Matt Burgess <burgess.matt_at_[hidden]> wrote:
>
> Hi,
>
>
> I'm trying to get openmpi working over openib partitions. On this
> cluster, the partition number is 0x109. The ib interfaces are
> pingable over the appropriate ib0.8109 interface:
>
> d2:/opt/openmpi-ib # ifconfig ib0.8109
> ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
>           inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
>           inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>           RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
>           collisions:0 txqueuelen:256
>           RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)
>
>
> I have tried the following:
>
> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl openib,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1 /cluster/pallas/x86_64-ib/IMB-MPI1
>
> but I just get a RETRY EXCEEDED ERROR. Is there an MCA
> parameter I am missing?
>
> I was successful using tcp only:
>
> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl tcp,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1
>
>
>
> Thanks,
> Matt Burgess
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
>
> -- Jeff Squyres
> Cisco Systems
>
>
>
>
>
>
> -- Jeff Squyres
> Cisco Systems
>
>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>
> -- --
> Pavel Shamis (Pasha)
> Mellanox Technologies LTD.
>
>
> Index: ompi/mca/btl/openib/btl_openib_component.c
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_component.c (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_component.c (working copy)
> @@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
> goto dealloc_pd;
> }
>
> - ret = OMPI_SUCCESS;
> + ret = OMPI_SUCCESS;
> /* Note ports are 1 based hence j = 1 */
> for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
> struct ibv_port_attr ib_port_attr;
> @@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
> uint16_t pkey,j;
> for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
> ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
> - pkey=ntohs(pkey);
> + pkey=ntohs(pkey) & 0x7fff;
> if(pkey == mca_btl_openib_component.ib_pkey_val){
> ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
> break;
> Index: ompi/mca/btl/openib/btl_openib_ini.c
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_ini.c (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_ini.c (working copy)
> @@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
> static void reset_section(bool had_previous_value,
> parsed_section_values_t *s);
> static void reset_values(ompi_btl_openib_ini_values_t *v);
> static int save_section(parsed_section_values_t *s);
> -static int intify(char *string);
> -static int intify_list(char *str, uint32_t **values, int *len);
> static inline void show_help(const char *topic);
>
>
> @@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
> all whitespace at the beginning and ending of the value. */
>
> if (0 == strcasecmp(key_buffer, "vendor_id")) {
> - if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
> + if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_ids,
> &sv->vendor_ids_len))) {
> return ret;
> }
> }
>
> else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
> - if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
> + if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_part_ids,
> &sv->vendor_part_ids_len))) {
> return ret;
> }
> @@ -379,13 +377,13 @@ static int parse_line(parsed_section_val
>
> else if (0 == strcasecmp(key_buffer, "mtu")) {
> /* Single value */
> - sv->values.mtu = (uint32_t) intify(value);
> + sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
> sv->values.mtu_set = true;
> }
>
> else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
> /* Single value */
> - sv->values.use_eager_rdma = (uint32_t) intify(value);
> + sv->values.use_eager_rdma = (uint32_t) ompi_btl_openib_ini_intify(value);
> sv->values.use_eager_rdma_set = true;
> }
>
> @@ -547,7 +545,7 @@ static int save_section(parsed_section_v
> /*
> * Do string-to-integer conversion, for both hex and decimal numbers
> */
> -static int intify(char *str)
> +int ompi_btl_openib_ini_intify(char *str)
> {
> while (isspace(*str)) {
> ++str;
> @@ -568,7 +566,7 @@ static int intify(char *str)
> /*
> * Take a comma-delimited list and infity them all
> */
> -static int intify_list(char *value, uint32_t **values, int *len)
> +int ompi_btl_openib_ini_intify_list(char *value, uint32_t **values, int *len)
> {
> char *comma;
> char *str = value;
> @@ -584,7 +582,7 @@ static int intify_list(char *value, uint
> if (NULL == *values) {
> return OMPI_ERR_OUT_OF_RESOURCE;
> }
> - *values[0] = (uint32_t) intify(str);
> + *values[0] = (uint32_t) ompi_btl_openib_ini_intify(str);
> *len = 1;
> } else {
> /* If we found a comma, loop over all the values. Be a
> @@ -594,7 +592,7 @@ static int intify_list(char *value, uint
> do {
> *comma = '\0';
> *values = realloc(*values, sizeof(uint32_t) * (*len + 2));
> - (*values)[*len] = (int32_t) intify(str);
> + (*values)[*len] = (int32_t) ompi_btl_openib_ini_intify(str);
> ++(*len);
> str = comma + 1;
> comma = strchr(str, ',');
> @@ -602,7 +600,7 @@ static int intify_list(char *value, uint
> /* Get the last value (i.e., the value after the last
> comma, because it won't have been snarfed in the loop) */
> - (*values)[*len] = (uint32_t) intify(str);
> + (*values)[*len] = (uint32_t) ompi_btl_openib_ini_intify(str);
> ++(*len);
> }
>
> Index: ompi/mca/btl/openib/btl_openib_ini.h
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_ini.h (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_ini.h (working copy)
> @@ -49,6 +49,9 @@ extern "C" {
> */
> int ompi_btl_openib_ini_finalize(void);
>
> + int ompi_btl_openib_ini_intify(char *string);
> + int ompi_btl_openib_ini_intify_list(char *str, uint32_t **values, int *len);
> +
> #if defined(c_plusplus) || defined(__cplusplus)
> }
> #endif
> Index: ompi/mca/btl/openib/btl_openib_mca.c
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_mca.c (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_mca.c (working copy)
> @@ -27,6 +27,7 @@
> #include "opal/mca/base/mca_base_param.h"
> #include "btl_openib.h"
> #include "btl_openib_mca.h"
> +#include "btl_openib_ini.h"
>
> /*
> * Local flags
> @@ -97,7 +98,7 @@ static inline int reg_int(const char* pa
> */
> int btl_openib_register_mca_params(void)
> {
> - char *msg, *str;
> + char *msg, *str, *pkey;
> int ival, ival2, ret, tmp;
>
> ret = OMPI_SUCCESS;
> @@ -192,13 +193,15 @@ int btl_openib_register_mca_params(void)
> 0, &ival, REGINT_GE_ZERO));
> mca_btl_openib_component.ib_pkey_ix = (uint32_t) ival;
>
> - CHECK(reg_int("ib_pkey_val", "InfiniBand pkey value"
> + CHECK(reg_string("ib_pkey_val", "InfiniBand pkey value"
> "(must be > 0 and < 0xffff)",
> - 0, &ival, REGINT_GE_ZERO));
> - if (ival > 0xffff) {
> + "0", &pkey, 0));
> + mca_btl_openib_component.ib_pkey_val = ompi_btl_openib_ini_intify(pkey) & 0x7fff;
> + if (mca_btl_openib_component.ib_pkey_val > 0xffff ||
> + mca_btl_openib_component.ib_pkey_val < 0) {
> ret = OMPI_ERR_BAD_PARAM;
> }
> - mca_btl_openib_component.ib_pkey_val = (uint32_t) ival;
> + free(pkey);
>
> CHECK(reg_int("ib_psn", "InfiniBand packet sequence starting number "
> "(must be >= 0)",
>
>
>
>
>
> -- --
> Pavel Shamis (Pasha)
> Mellanox Technologies LTD.
>
>
>
>
> --
> --
> Pavel Shamis (Pasha)
> Mellanox Technologies LTD.
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems
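
For reference, the pkey arithmetic discussed in this thread can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from Pasha's patch: `parse_pkey`, `pkey_base`, and `pkey_matches` are hypothetical helper names, and the hex/decimal parsing relies on the standard `strtoul` with base 0 (which also treats a leading `0` as octal, something the patch may not do). It shows that 0x8109 and 33033 are the same value, and that masking with 0x7fff, as the patch does, drops the top (membership) bit so that 0x8109 compares equal to the partition number 0x109.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the patch): parse a pkey written as
 * decimal ("33033") or hex ("0x8109").
 * Returns 0 on success, -1 if the text is not a number in 0..0xffff. */
int parse_pkey(const char *text, uint16_t *out)
{
    char *end = NULL;
    unsigned long value;

    assert(text != NULL && out != NULL);

    /* Base 0 lets strtoul detect the "0x" prefix by itself. */
    value = strtoul(text, &end, 0);
    if (end == text || *end != '\0' || value > 0xffffUL) {
        return -1;
    }
    *out = (uint16_t) value;
    return 0;
}

/* Drop the top bit (the membership bit), keeping the 15-bit
 * partition number, e.g. 0x8109 -> 0x0109. */
uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t) (pkey & 0x7fff);
}

/* Compare two pkeys while ignoring the membership bit, in the
 * spirit of the "& 0x7fff" mask that the patch adds. */
int pkey_matches(uint16_t a, uint16_t b)
{
    return pkey_base(a) == pkey_base(b);
}
```

With these helpers, `parse_pkey("0x8109", &v)` and `parse_pkey("33033", &v)` both store 33033 in `v`, and `pkey_matches(0x8109, 0x0109)` returns 1.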