Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions
From: Matt Burgess (burgess.matt_at_[hidden])
Date: 2008-10-07 13:30:42


Pasha,

That's great, thanks for the help. When exactly do you expect that 1.2.8
will be released?

Thanks,
Matt

On Tue, Oct 7, 2008 at 1:29 PM, Pavel Shamis (Pasha) <pasha_at_[hidden]> wrote:

> Matt,
> For all 1.2.X versions you should use btl_openib_ib_pkey_val.
> In the ongoing 1.3 version the parameter was renamed to btl_openib_of_pkey_val.
>
> BTW, we plan to release version 1.2.8 very soon, and it will include the
> partition bug fix.
>
> Regards,
> Pasha
>
> Matt Burgess wrote:
>
>> Pasha,
>>
>> With your patch and parameter suggestion, it works! So to be clear
>> btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is for 1.2.7?
>>
>> Thanks again,
>> Matt
>>
>> On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) <pasha_at_[hidden]> wrote:
>>
>> Matt,
>> Can you please run "cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/*"
>> on your d2-ib and d3-ib nodes? I would like to check the partition
>> configuration.
>>
>> Oh, BTW, I see that the command line in the previous email was wrong.
>> Please use the following command line (the parameter name should be
>> "btl_openib_ib_pkey_val" for ompi-1.2.6, and my patch accepts
>> HEX/DEC values):
>>
>> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib \
>>     -mca btl openib,self -mca btl_openib_ib_pkey_val 0x8109 \
>>     /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> Ompi 1.2.6 should work OK with this patch.
>>
>>
>> Thanks,
>> Pasha
>>
>> Matt Burgess wrote:
>>
>> Pasha,
>>
>> Thanks for the patch. Unfortunately, it doesn't seem like that
>> fixed the problem. I realized earlier I didn't mention what
>> version of OpenMPI I was trying: it's 1.2.6. Should I be trying
>> 1.2.7 with this patch?
>>
>> Thanks,
>> Matt
>>
>> 2008/10/7 Pavel Shamis (Pasha) <pasha_at_[hidden]>
>>
>> Matt,
>> Can you please try the attached patch? I guess it will resolve this
>> issue.
>>
>> Thanks,
>> Pasha
>>
>> Matt Burgess wrote:
>>
>> Lenny,
>>
>> Thanks for the info. It still doesn't seem to be working.
>> My command line is:
>>
>> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib \
>>     -mca btl openib,self -mca btl_openib_of_pkey_val 33033 \
>>     /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/",
>> but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
>> Its contents are:
>>
>>   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
>>  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
>>  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
>>  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
>>  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79
>>  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
>>  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111
>> 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
>>
>> We aren't using the opensm, but Voltaire's SM on a 2012 switch.
>>
>> Thanks again,
>> Matt
>>
>>
>> On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]> wrote:
>>
>> Hi Matt,
>>
>> It seems that the right way to do it is the following:
>>
>> -mca btl openib,self -mca btl_openib_ib_pkey_val 33033
>>
>> where the value is the decimal form of the pkey (in your case
>> 0x8109 = 33033), and there is no need for the
>> btl_openib_ib_pkey_ix value.
>>
>> ex.
>> mpirun -np 2 -H witch2,witch3 -mca btl openib,self \
>>     -mca btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
>> LT (2) (size min max avg) 1 3.511429 3.511429 3.511429
>>
>> If it's not working, check
>> "cat /sys/class/infiniband/mthca0/ports/1/pkeys/*" for the pkeys and
>> the SM; maybe it's a setup issue.
>>
>> Pasha is currently checking this issue.
>>
>> Best regards,
>>
>> Lenny.
>>
>>
>>
>>
>>
>> On 10/7/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>
>> FWIW, if this configuration is for all of your users, you might
>> want to specify these MCA params in the default MCA param file, or
>> the environment, etc., just so that you don't have to specify them
>> on every mpirun command line.
>>
>> See:
>> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
>>
>>
>>
>> On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:
>>
>> Sorry, I misunderstood the question.
>>
>> Thanks to Pasha, the right command line will be:
>>
>> -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109 -mca btl_openib_of_pkey_ix 1
>>
>> ex.
>>
>> #mpirun -np 2 -H witch2,witch3 -mca btl openib,self \
>>     -mca btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1 \
>>     ./mpi_p1_4_TRUNK -t lt
>> LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
>>
>>
>> Best regards
>>
>> Lenny.
>>
>>
>> On 10/6/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>
>> On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
>>
>> you should probably use -mca tcp,self -mca btl_openib_if_include ib0.8109
>>
>>
>> Really? I thought we only took OpenFabrics device names in the
>> openib_if_include MCA param...? It looks like ib0.8109 is an
>> IPoIB device name.
>>
>>
>>
>> Lenny.
>>
>>
>>
>> On 10/3/08, Matt Burgess <burgess.matt_at_[hidden]> wrote:
>> Hi,
>>
>>
>> I'm trying to get openmpi working over openib partitions. On this
>> cluster, the partition number is 0x109. The ib interfaces are
>> pingable over the appropriate ib0.8109 interface:
>>
>> d2:/opt/openmpi-ib # ifconfig ib0.8109
>> ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
>>           inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
>>           inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>>           RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
>>           collisions:0 txqueuelen:256
>>           RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)
>>
>>
>> I have tried the following:
>>
>> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile \
>>     -mca btl openib,self -mca btl_openib_max_btls 1 \
>>     -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1 \
>>     /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> but I just get a RETRY EXCEEDED ERROR. Is there an MCA parameter I
>> am missing?
>>
>> I was successful using tcp only:
>>
>> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile \
>>     -mca btl tcp,self -mca btl_openib_max_btls 1 \
>>     -mca btl_openib_ib_pkey_val 0x8109 \
>>     /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>>
>>
>> Thanks,
>> Matt Burgess
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>> -- Jeff Squyres
>> Cisco Systems
>>
>>
>>
>>
>>
>> -- Jeff Squyres
>> Cisco Systems
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>>
>> -- --
>> Pavel Shamis (Pasha)
>> Mellanox Technologies LTD.
>>
>>
>> Index: ompi/mca/btl/openib/btl_openib_component.c
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_component.c (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_component.c (working copy)
>> @@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
>> goto dealloc_pd;
>> }
>>
>> - ret = OMPI_SUCCESS;
>> + ret = OMPI_SUCCESS;
>> /* Note ports are 1 based hence j = 1 */
>> for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
>> struct ibv_port_attr ib_port_attr;
>> @@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
>> uint16_t pkey,j;
>> for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
>> ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
>> - pkey=ntohs(pkey);
>> + pkey=ntohs(pkey) & 0x7fff;
>> if(pkey == mca_btl_openib_component.ib_pkey_val){
>> ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
>> break;
>> Index: ompi/mca/btl/openib/btl_openib_ini.c
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_ini.c (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_ini.c (working copy)
>> @@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
>> static void reset_section(bool had_previous_value, parsed_section_values_t *s);
>> static void reset_values(ompi_btl_openib_ini_values_t *v);
>> static int save_section(parsed_section_values_t *s);
>> -static int intify(char *string);
>> -static int intify_list(char *str, uint32_t **values, int *len);
>> static inline void show_help(const char *topic);
>>
>>
>> @@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
>> all whitespace at the beginning and ending of the value. */
>>
>> if (0 == strcasecmp(key_buffer, "vendor_id")) {
>> - if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
>> + if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_ids,
>> &sv->vendor_ids_len))) {
>> return ret;
>> }
>> }
>>
>> else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
>> - if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
>> + if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_part_ids,
>> &sv->vendor_part_ids_len))) {
>> return ret;
>> }
>> @@ -379,13 +377,13 @@ static int parse_line(parsed_section_val
>>
>> else if (0 == strcasecmp(key_buffer, "mtu")) {
>> /* Single value */
>> - sv->values.mtu = (uint32_t) intify(value);
>> + sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
>> sv->values.mtu_set = true;
>> }
>>
>> else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
>> /* Single value */
>> - sv->values.use_eager_rdma = (uint32_t) intify(value);
>> + sv->values.use_eager_rdma = (uint32_t) ompi_btl_openib_ini_intify(value);
>> sv->values.use_eager_rdma_set = true;
>> }
>>
>> @@ -547,7 +545,7 @@ static int save_section(parsed_section_v
>> /*
>> * Do string-to-integer conversion, for both hex and decimal numbers
>> */
>> -static int intify(char *str)
>> +int ompi_btl_openib_ini_intify(char *str)
>> {
>> while (isspace(*str)) {
>> ++str;
>> @@ -568,7 +566,7 @@ static int intify(char *str)
>> /*
>> * Take a comma-delimited list and infity them all
>> */
>> -static int intify_list(char *value, uint32_t **values, int *len)
>> +int ompi_btl_openib_ini_intify_list(char *value, uint32_t **values, int *len)
>> {
>> char *comma;
>> char *str = value;
>> @@ -584,7 +582,7 @@ static int intify_list(char *value, uint
>> if (NULL == *values) {
>> return OMPI_ERR_OUT_OF_RESOURCE;
>> }
>> - *values[0] = (uint32_t) intify(str);
>> + *values[0] = (uint32_t) ompi_btl_openib_ini_intify(str);
>> *len = 1;
>> } else {
>> /* If we found a comma, loop over all the values. Be a
>> @@ -594,7 +592,7 @@ static int intify_list(char *value, uint
>> do {
>> *comma = '\0';
>> *values = realloc(*values, sizeof(uint32_t) * (*len + 2));
>> - (*values)[*len] = (int32_t) intify(str);
>> + (*values)[*len] = (int32_t) ompi_btl_openib_ini_intify(str);
>> ++(*len);
>> str = comma + 1;
>> comma = strchr(str, ',');
>> @@ -602,7 +600,7 @@ static int intify_list(char *value, uint
>> /* Get the last value (i.e., the value after the last
>> comma, because it won't have been snarfed in the
>> loop) */
>> - (*values)[*len] = (uint32_t) intify(str);
>> + (*values)[*len] = (uint32_t) ompi_btl_openib_ini_intify(str);
>> ++(*len);
>> }
>>
>> Index: ompi/mca/btl/openib/btl_openib_ini.h
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_ini.h (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_ini.h (working copy)
>> @@ -49,6 +49,9 @@ extern "C" {
>> */
>> int ompi_btl_openib_ini_finalize(void);
>>
>> + int ompi_btl_openib_ini_intify(char *string);
>> + int ompi_btl_openib_ini_intify_list(char *str, uint32_t **values, int *len);
>> +
>> #if defined(c_plusplus) || defined(__cplusplus)
>> }
>> #endif
>> Index: ompi/mca/btl/openib/btl_openib_mca.c
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_mca.c (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_mca.c (working copy)
>> @@ -27,6 +27,7 @@
>> #include "opal/mca/base/mca_base_param.h"
>> #include "btl_openib.h"
>> #include "btl_openib_mca.h"
>> +#include "btl_openib_ini.h"
>>
>> /*
>> * Local flags
>> @@ -97,7 +98,7 @@ static inline int reg_int(const char* pa
>> */
>> int btl_openib_register_mca_params(void)
>> {
>> - char *msg, *str;
>> + char *msg, *str, *pkey;
>> int ival, ival2, ret, tmp;
>>
>> ret = OMPI_SUCCESS;
>> @@ -192,13 +193,15 @@ int btl_openib_register_mca_params(void)
>> 0, &ival, REGINT_GE_ZERO));
>> mca_btl_openib_component.ib_pkey_ix = (uint32_t) ival;
>>
>> - CHECK(reg_int("ib_pkey_val", "InfiniBand pkey value"
>> + CHECK(reg_string("ib_pkey_val", "InfiniBand pkey value"
>> "(must be > 0 and < 0xffff)",
>> - 0, &ival, REGINT_GE_ZERO));
>> - if (ival > 0xffff) {
>> + "0", &pkey, 0));
>> + mca_btl_openib_component.ib_pkey_val = ompi_btl_openib_ini_intify(pkey) & 0x7fff;
>> + if (mca_btl_openib_component.ib_pkey_val > 0xffff ||
>> + mca_btl_openib_component.ib_pkey_val < 0) {
>> ret = OMPI_ERR_BAD_PARAM;
>> }
>> - mca_btl_openib_component.ib_pkey_val = (uint32_t) ival;
>> + free(pkey);
>>
>> CHECK(reg_int("ib_psn", "InfiniBand packet sequence starting number "
>> "(must be >= 0)",
>>
>>
>>
>>
>>
>> -- --
>> Pavel Shamis (Pasha)
>> Mellanox Technologies LTD.
>>
>>
>>
>
> --
> --
> Pavel Shamis (Pasha)
> Mellanox Technologies LTD.
>
>
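
Note on pkey values: the thread gives the same partition key both as
0x8109 and as 33033, and the patch quoted above masks keys with 0x7fff
before comparing them. The following is a minimal C sketch of that
arithmetic, not the Open MPI code itself. The names parse_pkey and
pkey_base_equal are hypothetical helpers introduced only for
illustration, and the sketch assumes the key arrives as a text string
in hex or decimal.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not an Open MPI function): parse a pkey written
 * as hex ("0x8109") or decimal ("33033").
 * Returns 0 on success, -1 on malformed or out-of-range input. */
int parse_pkey(const char *text, uint16_t *out)
{
    char *end = NULL;
    long value;

    errno = 0;
    /* Base 0 accepts 0x-prefixed hex and plain decimal
     * (a leading 0 alone is read as octal). */
    value = strtol(text, &end, 0);
    if (errno != 0 || end == text || *end != '\0') {
        return -1;
    }
    if (value < 0 || value > 0xffff) {
        return -1;              /* a pkey is a 16-bit quantity */
    }
    *out = (uint16_t) value;
    return 0;
}

/* Hypothetical helper: compare two pkeys on their low 15 bits only, so
 * that the top bit (0x8000, the membership bit) does not affect the
 * match. This mirrors the "& 0x7fff" masking in the patch. */
int pkey_base_equal(uint16_t a, uint16_t b)
{
    return (a & 0x7fff) == (b & 0x7fff);
}
```

With these helpers, "0x8109" and "33033" parse to the same 16-bit
value, and 0x8109 compares equal to 0x0109 once the top bit is masked
off, while 0x8109 and 0x8001 still differ.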