Network Locality users Mailing List Archives

Subject: Re: [netloc-users] Trying Netloc on a small IB cluster
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2013-12-06 03:05:11


Hello,
I haven't tested many hwloc releases with netloc, but I think v1.4.2
should be enough (this is where we added IB port information). v1.8
brings new features, but we can work without them. I should clearly
document this at some point.
Brice
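
For readers who want to check whether their hwloc XML exports actually carry the IB port information mentioned above, here is a minimal sketch. It assumes that hwloc records each port GID as an `<info>` element named like `Port1GID0` under the OpenFabrics OS device, and that the subnet is the first 64 bits of that GID. The attribute naming, the `ib_subnets` helper, and the sample XML are illustrative assumptions, not netloc's own detection logic.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: port GIDs appear as <info name="Port<N>GID<M>" value="..."/>.
GID_INFO = re.compile(r"^Port\d+GID\d+$")


def ib_subnets(hwloc_xml):
    """Return the set of subnet prefixes (first 4 GID groups) in one hwloc XML export."""
    root = ET.fromstring(hwloc_xml)
    subnets = set()
    for info in root.iter("info"):
        if not GID_INFO.match(info.get("name", "")):
            continue
        groups = info.get("value", "").split(":")
        # A full GID has 8 colon-separated 16-bit groups; the first 4 are the subnet prefix.
        if len(groups) == 8:
            subnets.add(":".join(groups[:4]))
    return subnets


# Hypothetical sample, hand-written for illustration (not real cluster output).
SAMPLE = """<topology>
  <object type="Machine">
    <object type="OSDev" name="mlx4_0">
      <info name="Port1GID0" value="fe80:0000:0000:0000:0002:c903:000f:4e01"/>
    </object>
  </object>
</topology>"""

print(ib_subnets(SAMPLE))
```

An empty set for every XML file in the directory would be consistent with the "Found 0 subnets in hwloc directory" message quoted below, and would point at the hwloc version or configuration rather than at netloc itself.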

On 05/12/2013 19:17, Raghu wrote:
> Josh,
>
> I was using a much older version indeed (v1.5). I wanted to get netloc
> up and running functionally before updating to the bleeding-edge, but
> this info helps. I will update hwloc and give this another spin.
>
> Thanks!
>
> Raghu
>
>
> On Thu, Dec 5, 2013 at 3:09 PM, Josh Hursey <jjhursey_at_[hidden]> wrote:
>> I wonder if this is a hwloc version/configuration issue then. What version
>> of hwloc are you using? Does it have the PCI access mechanism enabled?
>>
>> Note that the latest hwloc release (1.8) has some other special sauce that
>> works nicely with netloc (i.e., hwloc topology compression).
>>
>> Brice might be able to help with this question a bit more when he gets back
>> from travel if the above does not help.
>>
>>
>>
>> On Thu, Dec 5, 2013 at 4:04 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]>
>> wrote:
>>> (Raghu's mail didn't go through to the list because he's accidentally not
>>> subscribed)
>>>
>>> Thanks for your patience with netloc -- it's still early days with this
>>> stuff.
>>>
>>> It *works* for us, but a) that doesn't necessarily mean it works for
>>> everyone yet (i.e., we haven't shaken out all the bugs / addressed all
>>> corner cases yet), and b) as Josh said, it's not yet terribly user-friendly.
>>>
>>>
>>>
>>> On Dec 5, 2013, at 4:50 PM, Raghu <rajachan_at_[hidden]> wrote:
>>>
>>>> Jeff, Josh,
>>>>
>>>> Thanks for the responses. I did stumble upon the other email on the
>>>> devel list, and tried the steps you listed there. With that route
>>>> though, the gather tool couldn't find any subnets with the hwloc xmls
>>>> I had provided (exited after printing "Found 0 subnets in hwloc
>>>> directory"). I will try to fiddle around a little more with the other
>>>> method to see if I can get things to work.
>>>>
>>>> Raghu
>>>>
>>>>
>>>> On Thu, Dec 5, 2013 at 2:28 PM, Joshua Hursey <jjhursey_at_[hidden]>
>>>> wrote:
>>>>> Raghu,
>>>>>
>>>>> The problem is likely that the subnet has not been specified. The
>>>>> netloc_reader_ib is not terribly user friendly at the moment. We have
>>>>> some supporting tools that help make it easier to use. I highlighted
>>>>> the steps for another user in the mail linked below:
>>>>> http://www.open-mpi.org/community/lists/netloc-devel/2013/11/0005.php
>>>>>
>>>>> Notice that it does not call netloc_reader_ib explicitly; that call is
>>>>> wrapped up as part of the netloc-ib-extract-dats script.
>>>>>
>>>>>
>>>>> You will also need to install Jansson (if you have not already), as that
>>>>> is how netloc currently represents the data. It can be downloaded from:
>>>>> http://www.digip.org/jansson/
>>>>>
>>>>>
>>>>> I am currently working on some FAQs to hopefully help in the future. In
>>>>> the meantime, feel free to email us directly or write to the users/devel
>>>>> mailing lists.
>>>>>
>>>>> Thanks,
>>>>> Josh
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 5, 2013 at 2:01 PM, Raghu <rajachan_at_[hidden]>
>>>>> wrote:
>>>>>> Btw, I have a pretty standard installation -- ./configure
>>>>>> --prefix=/home/rajachan/netloc/install
>>>>>> --with-hwloc=/home/rajachan/hwloc-1.5/install
>>>>>>
>>>>>> Raghu
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 5, 2013 at 2:56 PM, Raghu <rajachan_at_[hidden]>
>>>>>> wrote:
>>>>>>> Hi Josh, Jeff,
>>>>>>>
>>>>>>> I am trying out netloc (the master branch) on a small IB cluster
>>>>>>> (which I have sudo access to). I got stuff built fine, but when I try
>>>>>>> to generate the .ndat files, I am getting this:
>>>>>>>
>>>>>>> Output Directory : /home/rajachan/netloc/install/bin/output/
>>>>>>> Subnet : unknown
>>>>>>> ibnetdiscover File : /home/rajachan/netloc/install/bin/ibnetdata
>>>>>>> ibroutes Directory : None Specified
>>>>>>> Status: Querying the ibnetdiscover data for subnet unknown...
>>>>>>> Error: Invalid network type provided
>>>>>>> Error: Failed to create a new data file
>>>>>>>
>>>>>>> Here's how I am running the reader : ./netloc_reader_ib -o
>>>>>>> /home/rajachan/netloc/install/bin/output/ -f
>>>>>>> /home/rajachan/netloc/install/bin/ibnetdata
>>>>>>>
>>>>>>> Do you guys see any glaring config mistake from my end?
>>>>>>>
>>>>>>> Raghu
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Joshua Hursey
>>>>> Assistant Professor of Computer Science
>>>>> University of Wisconsin-La Crosse
>>>>> http://cs.uwlax.edu/~jjhursey
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> _______________________________________________
>>> netloc-users mailing list
>>> netloc-users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/netloc-users
>>
>>
>>
>> --
>> Joshua Hursey
>> Assistant Professor of Computer Science
>> University of Wisconsin-La Crosse
>> http://cs.uwlax.edu/~jjhursey