Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI users] cartofile
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2009-10-13 10:50:00


I guess my problem with the manpage or any info on carto in general is
that there is no text that describes what happens if you have a
cartofile and how it affects a job when you pass it in.

--td

Sylvain Jeaugey wrote:
> We worked a bit on it and yes, there is some work to do:
>
> * The syntax used to describe the various components is far from being
> consistent from one usage to another ("SOCKET", "NODE", ...). We
> managed to make things work by reading the various out-of-date example
> files, but mainly by reading the code.
>
> * The auto-detect component does not seem to do anything. We
> implemented it, and planned to release it. For now our code relies
> heavily on Linux kernel functionality, but is missing the needed ifdefs.
>
> Also, we wrote a patch that dumps the detected (or read) topology in
> Graphviz format.
>
> Not much time to work on this right now, but if anyone wants to work
> on it, we may help.
>
> Sylvain
>
> On Tue, 13 Oct 2009, Ralph Castain wrote:
>
>> Here is where OMPI uses it:
>>
>> ompi/mca/btl/openib/btl_openib_component.c:1918:static opal_carto_graph_t *host_topo;
>> ompi/mca/btl/openib/btl_openib_component.c:1923: opal_carto_base_node_t *device_node;
>> ompi/mca/btl/openib/btl_openib_component.c:1931: device_node = opal_carto_base_find_node(host_topo, device);
>> ompi/mca/btl/openib/btl_openib_component.c:1941: opal_carto_base_node_t *slot_node;
>> ompi/mca/btl/openib/btl_openib_component.c:1951: slot_node = opal_carto_base_find_node(host_topo, slot);
>> ompi/mca/btl/openib/btl_openib_component.c:1958: distance = opal_carto_base_spf(host_topo, slot_node, device_node);
>> ompi/mca/btl/openib/btl_openib_component.c:1989: opal_carto_base_get_host_graph(&host_topo, "Infiniband");
>> ompi/mca/btl/openib/btl_openib_component.c:1998: opal_carto_base_free_graph(host_topo);
>> ompi/mca/btl/sm/btl_sm.c:118: opal_carto_graph_t *topo;
>> ompi/mca/btl/sm/btl_sm.c:123: opal_carto_node_distance_t *dist;
>> ompi/mca/btl/sm/btl_sm.c:124: opal_carto_base_node_t *slot_node;
>> ompi/mca/btl/sm/btl_sm.c:129: if (OMPI_SUCCESS != opal_carto_base_get_host_graph(&topo, "Memory")) {
>> ompi/mca/btl/sm/btl_sm.c:134: opal_value_array_init(&dists, sizeof(opal_carto_node_distance_t));
>> ompi/mca/btl/sm/btl_sm.c:157: slot_node = opal_carto_base_find_node(topo, myslot);
>> ompi/mca/btl/sm/btl_sm.c:163: opal_carto_base_get_nodes_distance(topo, slot_node, "Memory", &dists);
>> ompi/mca/btl/sm/btl_sm.c:168: dist = (opal_carto_node_distance_t *) opal_value_array_get_item(&dists, 0);
>> ompi/mca/btl/sm/btl_sm.c:175: opal_carto_base_free_graph(topo);
>>
>> No idea if it is of any value or not. I don't know of anyone who has
>> ever written a carto file for a system, has any idea how to do so, or
>> why they should. Looking at the code, it wouldn't appear to have any
>> value on any of the machines at LANL, but I may be missing something
>> - not a lot of help around to understand it.
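
The call sites listed above appear to follow one pattern: look up the graph node for the slot a process runs on, then either compute a shortest-path distance to a device node (openib) or ask for the closest node of a given type such as "Memory" (sm). A minimal sketch of that pattern on a toy graph follows. It does not use the opal_carto API; every name in it (toy_graph_t, toy_spf, toy_nearest_of_type, the five-node topology and its edge weights) is an assumption made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical toy model of a node topology. This is NOT the opal_carto
 * API; every name below is invented for illustration. */
#define TOY_MAX_NODES 8

typedef struct {
    const char *type;   /* e.g. "slot", "Memory", "Infiniband" */
    const char *name;   /* e.g. "slot0", "mem0", "hca0" */
} toy_node_t;

typedef struct {
    int        num_nodes;
    toy_node_t nodes[TOY_MAX_NODES];
    int        weight[TOY_MAX_NODES][TOY_MAX_NODES]; /* 0 = no edge */
} toy_graph_t;

static int toy_add_node(toy_graph_t *g, const char *type, const char *name)
{
    assert(g->num_nodes < TOY_MAX_NODES);
    g->nodes[g->num_nodes].type = type;
    g->nodes[g->num_nodes].name = name;
    return g->num_nodes++;
}

/* Add an undirected edge with a positive weight. */
static void toy_add_edge(toy_graph_t *g, int a, int b, int w)
{
    assert(a >= 0 && a < g->num_nodes && b >= 0 && b < g->num_nodes && w > 0);
    g->weight[a][b] = w;
    g->weight[b][a] = w;
}

/* Return the index of the node called `name`, or -1 if there is none. */
static int toy_find_node(const toy_graph_t *g, const char *name)
{
    for (int i = 0; i < g->num_nodes; i++)
        if (strcmp(g->nodes[i].name, name) == 0)
            return i;
    return -1;
}

/* Shortest-path distance from src to dst (O(n^2) Dijkstra), -1 if unreachable. */
static int toy_spf(const toy_graph_t *g, int src, int dst)
{
    int dist[TOY_MAX_NODES];
    int done[TOY_MAX_NODES] = {0};

    for (int i = 0; i < g->num_nodes; i++)
        dist[i] = INT_MAX;
    dist[src] = 0;

    for (int iter = 0; iter < g->num_nodes; iter++) {
        int u = -1;
        /* Pick the closest reachable node that is not finalized yet. */
        for (int i = 0; i < g->num_nodes; i++)
            if (!done[i] && dist[i] != INT_MAX && (u < 0 || dist[i] < dist[u]))
                u = i;
        if (u < 0)
            break;
        done[u] = 1;
        for (int v = 0; v < g->num_nodes; v++) {
            int w = g->weight[u][v];
            if (w != 0 && dist[u] + w < dist[v])
                dist[v] = dist[u] + w;
        }
    }
    return dist[dst] == INT_MAX ? -1 : dist[dst];
}

/* Index of the node of the given type closest to src, or -1 if none. */
static int toy_nearest_of_type(const toy_graph_t *g, int src, const char *type)
{
    int best = -1, best_dist = INT_MAX;
    for (int i = 0; i < g->num_nodes; i++) {
        if (strcmp(g->nodes[i].type, type) != 0)
            continue;
        int d = toy_spf(g, src, i);
        if (d >= 0 && d < best_dist) {
            best = i;
            best_dist = d;
        }
    }
    return best;
}

/* Made-up two-socket node: each slot has local memory, the HCA hangs off slot0. */
static toy_graph_t toy_example(void)
{
    toy_graph_t g;
    memset(&g, 0, sizeof(g));
    int slot0 = toy_add_node(&g, "slot", "slot0");
    int slot1 = toy_add_node(&g, "slot", "slot1");
    int mem0  = toy_add_node(&g, "Memory", "mem0");
    int mem1  = toy_add_node(&g, "Memory", "mem1");
    int hca0  = toy_add_node(&g, "Infiniband", "hca0");
    toy_add_edge(&g, slot0, mem0, 1);
    toy_add_edge(&g, slot1, mem1, 1);
    toy_add_edge(&g, slot0, slot1, 2);
    toy_add_edge(&g, slot0, hca0, 1);
    return g;
}
```

In this toy topology, toy_spf gives slot0 a distance of 1 to hca0 and slot1 a distance of 3, and toy_nearest_of_type picks mem1 for slot1. A consumer could use numbers like these to prefer the nearer device or memory bank; whether and how the real BTLs act on the distances is exactly what the thread says is undocumented.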
>>
>> On Oct 13, 2009, at 7:08 AM, Terry Dontje wrote:
>>
>>> After rereading the manpage for the umpteenth time I agree with
>>> Eugene that the information provided on cartofile is next to
>>> useless. Ok, so you describe what your node looks like but what
>>> does mpirun or libmpi do with that information? Other than the
>>> option to provide the cartofile it isn't obvious how a user or
>>> libmpi uses this information.
>>>
>>> I've looked on the FAQ and wiki and have not yet found anything on
>>> how one currently uses a cartofile.
>>>
>>> --td
>>>
>>> Eugene Loh wrote:
>>>> This e-mail was on the users alias... see
>>>> http://www.open-mpi.org/community/lists/users/2009/09/10710.php
>>>>
>>>> There wasn't much response, so let me ask another question. How
>>>> about if we remove the cartofile section from the DESCRIPTION
>>>> section of the OMPI mpirun man page? It's a lot of text that
>>>> illustrates how to create a cartofile without saying anything about
>>>> why one would want to go to the trouble. What does this impact?
>>>> What does it change? What's the motivation for doing this stuff?
>>>> What's this stuff good for?
>>>>
>>>> Another alternative could be to move the cartofile description to a
>>>> FAQ page.
>>>>
>>>> The mpirun man page is rather long and I was thinking that if we
>>>> could take some "low impact" stuff out, we could improve the
>>>> overall signal-to-noise ratio of the page.
>>>>
>>>> In any case, I personally would like to know what cartofiles are
>>>> good for.
>>>>
>>>> Eugene Loh wrote:
>>>>> Thank you, but I don't understand who is consuming this
>>>>> information for what. E.g., the mpirun man page describes the
>>>>> carto file, but doesn't give users any indication whether they
>>>>> should be worrying about this.
>>>>>
>>>>> Lenny Verkhovsky wrote:
>>>>>> Hi Eugene,
>>>>>> A carto file is a file with a static graph topology of your node.
>>>>>> You can see an example in opal/mca/carto/file/carto_file.h.
>>>>>> (Yes, I know, it should be in the help/man pages :) )
>>>>>> Basically it describes a map of your node and its internal
>>>>>> interconnections.
>>>>>> Hopefully it will be discovered automatically someday,
>>>>>> but for now you can describe your node manually.
>>>>>> Best regards Lenny.
>>>>>>
>>>>>> On Thu, Sep 17, 2009 at 12:38 AM, Eugene Loh
>>>>>> <Eugene.Loh_at_[hidden]> wrote:
>>>>>>
>>>>>> I feel like I should know, but what's a cartofile? I guess you
>>>>>> supply "topological" information about a host, but I can't tell
>>>>>> how this information is used by, say, mpirun.
>>>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel