Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI users] cartofile
From: Sylvain Jeaugey (sylvain.jeaugey_at_[hidden])
Date: 2009-10-13 10:15:42


We worked a bit on it and yes, there is some work to do:

* The syntax used to describe the various components is far from
consistent from one usage to another ("SOCKET", "NODE", ...). We managed to
get things working by reading the various out-of-date example files, but
mainly by reading the code.

* The auto-detect component does not seem to do anything. We implemented
it, and planned to release it. For now the code relies heavily on Linux
kernel functionality, but is missing the needed ifdefs.

Also, we wrote a patch that dumps the detected (or read) topology in
graphviz format.
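
For anyone unfamiliar with the format, here is a minimal sketch of what
dumping a topology in graphviz DOT syntax can look like. This is not the
patch mentioned above and it does not use the real opal_carto structures:
the types and the function name (topo_node_t, topo_edge_t, topo_dump_dot)
are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Hypothetical, simplified topology types. These are NOT the real
 * opal_carto structures; they only illustrate the idea. */
typedef struct {
    const char *type;   /* e.g. "socket", "Memory", "Infiniband" */
    const char *name;   /* unique node name, e.g. "slot0" */
} topo_node_t;

typedef struct {
    int from;           /* index into the node array */
    int to;             /* index into the node array */
    unsigned weight;    /* link cost between the two nodes */
} topo_edge_t;

/* Write the topology as an undirected graph in Graphviz DOT syntax.
 * Returns the number of edges written, or -1 if an edge refers to a
 * node index outside the node array (nothing is written in that case). */
int topo_dump_dot(FILE *out, const topo_node_t *nodes, int n_nodes,
                  const topo_edge_t *edges, int n_edges)
{
    int i;

    /* Validate every edge before producing any output. */
    for (i = 0; i < n_edges; i++) {
        if (edges[i].from < 0 || edges[i].from >= n_nodes ||
            edges[i].to < 0 || edges[i].to >= n_nodes) {
            return -1;
        }
    }

    fprintf(out, "graph topo {\n");

    /* One DOT node per topology node, labelled with type and name. */
    for (i = 0; i < n_nodes; i++) {
        fprintf(out, "  \"%s\" [label=\"%s\\n%s\"];\n",
                nodes[i].name, nodes[i].type, nodes[i].name);
    }

    /* One undirected DOT edge per link, labelled with its weight. */
    for (i = 0; i < n_edges; i++) {
        fprintf(out, "  \"%s\" -- \"%s\" [label=\"%u\"];\n",
                nodes[edges[i].from].name, nodes[edges[i].to].name,
                edges[i].weight);
    }

    fprintf(out, "}\n");
    return n_edges;
}
```

The output can be written to a file and rendered with the graphviz "dot"
tool, which makes it easy to check by eye whether a hand-written cartofile
describes the machine you think it does.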

We don't have much time to work on this right now, but if anyone wants to
work on it, we can help.

Sylvain

On Tue, 13 Oct 2009, Ralph Castain wrote:

> Here is where OMPI uses it:
>
> ompi/mca/btl/openib/btl_openib_component.c:1918:static opal_carto_graph_t
> *host_topo;
> ompi/mca/btl/openib/btl_openib_component.c:1923: opal_carto_base_node_t
> *device_node;
> ompi/mca/btl/openib/btl_openib_component.c:1931: device_node =
> opal_carto_base_find_node(host_topo, device);
> ompi/mca/btl/openib/btl_openib_component.c:1941:
> opal_carto_base_node_t *slot_node;
> ompi/mca/btl/openib/btl_openib_component.c:1951: slot_node =
> opal_carto_base_find_node(host_topo, slot);
> ompi/mca/btl/openib/btl_openib_component.c:1958: distance =
> opal_carto_base_spf(host_topo, slot_node, device_node);
> ompi/mca/btl/openib/btl_openib_component.c:1989:
> opal_carto_base_get_host_graph(&host_topo, "Infiniband");
> ompi/mca/btl/openib/btl_openib_component.c:1998:
> opal_carto_base_free_graph(host_topo);
> ompi/mca/btl/sm/btl_sm.c:118: opal_carto_graph_t *topo;
> ompi/mca/btl/sm/btl_sm.c:123: opal_carto_node_distance_t *dist;
> ompi/mca/btl/sm/btl_sm.c:124: opal_carto_base_node_t *slot_node;
> ompi/mca/btl/sm/btl_sm.c:129: if (OMPI_SUCCESS !=
> opal_carto_base_get_host_graph(&topo, "Memory")) {
> ompi/mca/btl/sm/btl_sm.c:134: opal_value_array_init(&dists,
> sizeof(opal_carto_node_distance_t));
> ompi/mca/btl/sm/btl_sm.c:157: slot_node = opal_carto_base_find_node(topo,
> myslot);
> ompi/mca/btl/sm/btl_sm.c:163: opal_carto_base_get_nodes_distance(topo,
> slot_node, "Memory", &dists);
> ompi/mca/btl/sm/btl_sm.c:168: dist = (opal_carto_node_distance_t *)
> opal_value_array_get_item(&dists, 0);
> ompi/mca/btl/sm/btl_sm.c:175: opal_carto_base_free_graph(topo);
>
> No idea if it is of any value or not. I don't know of anyone who has ever
> written a carto file for a system, has any idea how to do so, or why they
> should. Looking at the code, it wouldn't appear to have any value on any of
> the machines at LANL, but I may be missing something - not a lot of help
> around to understand it.
>
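The call pattern in the listing above (look up a slot node and a device
node, then ask for the shortest-path distance between them) can be
illustrated with a small self-contained sketch. It does not call the real
opal_carto API, whose exact signatures are not shown in this thread; the
names topo_shortest_distance, TOPO_MAX_NODES and TOPO_NO_PATH are
hypothetical, and a plain adjacency matrix with Dijkstra's algorithm stands
in for whatever the carto framework does internally.

```c
#include <limits.h>

#define TOPO_MAX_NODES 16          /* hypothetical upper bound on graph size */
#define TOPO_NO_PATH   UINT_MAX    /* returned when dst is unreachable */

/* Shortest-path distance between two nodes of a small weighted graph.
 * weights[i][j] is the cost of the direct link between node i and node j,
 * with 0 meaning "no direct link". Uses the O(n^2) form of Dijkstra's
 * algorithm, which is adequate for the handful of nodes in one host.
 * Link weights are assumed small enough that sums do not overflow. */
unsigned topo_shortest_distance(unsigned weights[TOPO_MAX_NODES][TOPO_MAX_NODES],
                                int n, int src, int dst)
{
    unsigned dist[TOPO_MAX_NODES];
    int done[TOPO_MAX_NODES];
    int i, step;

    if (n <= 0 || n > TOPO_MAX_NODES ||
        src < 0 || src >= n || dst < 0 || dst >= n) {
        return TOPO_NO_PATH;
    }

    for (i = 0; i < n; i++) {
        dist[i] = TOPO_NO_PATH;
        done[i] = 0;
    }
    dist[src] = 0;

    for (step = 0; step < n; step++) {
        int u = -1;

        /* Pick the closest reachable node that is not finalized yet. */
        for (i = 0; i < n; i++) {
            if (!done[i] && dist[i] != TOPO_NO_PATH &&
                (u < 0 || dist[i] < dist[u])) {
                u = i;
            }
        }
        if (u < 0 || u == dst) {
            break;              /* nothing left to reach, or dst is final */
        }
        done[u] = 1;

        /* Relax every direct link leaving u. */
        for (i = 0; i < n; i++) {
            unsigned w = weights[u][i];
            if (w != 0 && !done[i] && dist[u] + w < dist[i]) {
                dist[i] = dist[u] + w;
            }
        }
    }
    return dist[dst];
}
```

With such a distance in hand, a process bound to a given slot could, for
example, prefer the device or memory node with the smallest distance. That
appears to be the kind of decision the openib and sm code above is set up
to make, though the thread itself does not confirm the details.
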
> On Oct 13, 2009, at 7:08 AM, Terry Dontje wrote:
>
>> After rereading the manpage for the umpteenth time I agree with Eugene that
>> the information provided on cartofile is next to useless. Ok, so you
>> describe what your node looks like but what does mpirun or libmpi do with
>> that information? Other than the option to provide the cartofile it isn't
>> obvious how a user or libmpi uses this information.
>>
>> I've looked on the FAQ and wiki and have not found anything yet on how one
>> currently uses a cartofile.
>>
>> --td
>>
>> Eugene Loh wrote:
>>> This e-mail was on the users alias... see
>>> http://www.open-mpi.org/community/lists/users/2009/09/10710.php
>>>
>>> There wasn't much response, so let me ask another question. How about if
>>> we remove the cartofile section from the DESCRIPTION section of the OMPI
>>> mpirun man page? It's a lot of text that illustrates how to create a
>>> cartofile without saying anything about why one would want to go to the
>>> trouble. What does this impact? What does it change? What's the
>>> motivation for doing this stuff? What's this stuff good for?
>>>
>>> Another alternative could be to move the cartofile description to a FAQ
>>> page.
>>>
>>> The mpirun man page is rather long and I was thinking that if we could
>>> move some "low impact" stuff out, we could improve the overall
>>> signal-to-noise ratio of the page.
>>>
>>> In any case, I personally would like to know what cartofiles are good for.
>>>
>>> Eugene Loh wrote:
>>>> Thank you, but I don't understand who is consuming this information for
>>>> what. E.g., the mpirun man page describes the carto file, but doesn't
>>>> give users any indication whether they should be worrying about this.
>>>>
>>>> Lenny Verkhovsky wrote:
>>>>> Hi Eugene,
>>>>> A carto file is a file with a static graph topology of your node.
>>>>> You can see an example in opal/mca/carto/file/carto_file.h.
>>>>> (Yes, I know, it should be in the help/man pages :) )
>>>>> Basically it describes a map of your node and its internal interconnect.
>>>>> Hopefully it will be discovered automatically someday,
>>>>> but for now you can describe your node manually.
>>>>> Best regards Lenny.
>>>>>
>>>>> On Thu, Sep 17, 2009 at 12:38 AM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:
>>>>>
>>>>> I feel like I should know, but what's a cartofile? I guess you
>>>>> supply "topological" information about a host, but I can't tell
>>>>> how this information is used by, say, mpirun.
>>>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>