Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r19653
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2008-09-29 16:09:04


> we need to extend the existing RML function to handle the
> subsequent setting of the route to the proc itself. In the current
> dpm, we automatically assume that the route will be to a different
> job family, and hence send the routing info to the HNP. However,
> this may not be true - e.g., after a comm_spawn, there is no reason
> to route through the HNP since the job family is the same.
>
This is not correct. The current code in the DPM already takes care of
the "usual" case where both ends are in the same job family; in that
case it creates a "direct" route to the remote end (maybe it should
just do nothing, though). This logic is pretty simple and is well
contained in the DPM. Moving this logic to the rml should not
basically change much: the complexity will just move from the dpm to
the routed. The existing single dpm code already do everything we need
for current and future use, while we might have to upgrade all the
routed to take into account this special case. This is why I would
advocate for the lesser effort for the exact same functionality at the
end.

> Haven't thought it all through yet, but wanted to suggest we think
> about it as we may (per the FT July discussions) need to define
> routes for things other than just DPM-related operations. Perhaps we
> should do some design discussion off-list to see what makes sense?
>
I'm always open to discussion. Let me know if you find this useful on
some purpose.

> Thanks
> Ralph
>
Aurelien

>
> On Sep 28, 2008, at 8:33 AM, Aurélien Bouteiller wrote:
>
>> Ralph,
>>
>> I just split the existing static function from inside the dpm and
>> exposed it to the outside world. The idea is that the dpm create
>> the (opaque) port strings and therefore nows how they are supposed
>> to be formated. So he is responsible for parsing them. Second, I
>> split the parsing and routing in two different functions because
>> sometimes you might want to parse without creating a route to the
>> target.
>>
>> I'll check the RML function to see if it offers similar
>> functionality om monday. I have no strongly religious belief on
>> wether this should be a rml or dpm function. So I don't care as
>> long as I have what I need :]
>>
>> Thanks for the feedback,
>> Aurelien
>>
>>
>> Le 27 sept. 08 à 20:53, Ralph Castain a écrit :
>>
>>> Yo Aurelien
>>>
>>> Regarding the dpm including a "route_to_port" API. This actually
>>> is pretty close to being an exact duplicate of an already existing
>>> function in the RML that takes a URI as it's input, parses it to
>>> separate the proc name and the contact info, sets the contact info
>>> into the OOB, sets the route to that proc, and returns the proc
>>> name to the caller. Take a look at orte/mca/rml/base/
>>> rml_base_contact.c.
>>>
>>> All we need to do is add the logic to that function so that, if
>>> the target proc is not in our job family, we update the route and
>>> contact info in the HNP instead of locally.
>>>
>>> This would keep all the "setting_route_to_proc" functionality in
>>> one place, instead of duplicating it in the dpm, thus making
>>> maintenance much easier.
>>>
>>> Make sense?
>>> Ralph
>>>
>>>
>>> On Sep 27, 2008, at 7:22 AM, bouteill_at_[hidden] wrote:
>>>
>>>> Author: bouteill
>>>> Date: 2008-09-27 09:22:32 EDT (Sat, 27 Sep 2008)
>>>> New Revision: 19653
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/19653
>>>>
>>>> Log:
>>>> Add functions to access the opaque port_string and to add routes
>>>> to a remote port. This is usefull for FT, but could also turn
>>>> usefull when considering MPI3 extentions to the MPI2 dynamics.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Text files modified:
>>>> trunk/ompi/mca/dpm/base/base.h | 3 +
>>>> trunk/ompi/mca/dpm/base/dpm_base_null_fns.c | 12 ++++
>>>> trunk/ompi/mca/dpm/base/dpm_base_open.c | 2
>>>> trunk/ompi/mca/dpm/dpm.h | 20 +++++++
>>>> trunk/ompi/mca/dpm/orte/dpm_orte.c | 114 ++++++++++++++
>>>> +++++++------------------
>>>> 5 files changed, 99 insertions(+), 52 deletions(-)
>>>>
>>>> Modified: trunk/ompi/mca/dpm/base/base.h
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> ===================================================================
>>>> --- trunk/ompi/mca/dpm/base/base.h (original)
>>>> +++ trunk/ompi/mca/dpm/base/base.h 2008-09-27 09:22:32 EDT (Sat,
>>>> 27 Sep 2008)
>>>> @@ -92,6 +92,9 @@
>>>> int ompi_dpm_base_null_dyn_finalize (void);
>>>> void ompi_dpm_base_null_mark_dyncomm (ompi_communicator_t *comm);
>>>> int ompi_dpm_base_null_open_port(char *port_name, orte_rml_tag_t
>>>> given_tag);
>>>> +int ompi_dpm_base_null_parse_port(char *port_name,
>>>> + orte_process_name_t *rproc,
>>>> orte_rml_tag_t *tag);
>>>> +int ompi_dpm_base_null_route_to_port(char *rml_uri,
>>>> orte_process_name_t *rproc);
>>>> int ompi_dpm_base_null_close_port(char *port_name);
>>>>
>>>> /* useful globals */
>>>>
>>>> Modified: trunk/ompi/mca/dpm/base/dpm_base_null_fns.c
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> ===================================================================
>>>> --- trunk/ompi/mca/dpm/base/dpm_base_null_fns.c (original)
>>>> +++ trunk/ompi/mca/dpm/base/dpm_base_null_fns.c 2008-09-27
>>>> 09:22:32 EDT (Sat, 27 Sep 2008)
>>>> @@ -36,6 +36,7 @@
>>>> {
>>>> return OMPI_ERR_NOT_SUPPORTED;
>>>> }
>>>> +
>>>> void ompi_dpm_base_null_disconnect(ompi_communicator_t *comm)
>>>> {
>>>> return;
>>>> @@ -70,6 +71,17 @@
>>>> return OMPI_ERR_NOT_SUPPORTED;
>>>> }
>>>>
>>>> +int ompi_dpm_base_null_parse_port(char *port_name,
>>>> + orte_process_name_t *rproc,
>>>> orte_rml_tag_t *tag)
>>>> +{
>>>> + return OMPI_ERR_NOT_SUPPORTED;
>>>> +}
>>>> +
>>>> +int ompi_dpm_base_null_route_to_port(char *rml_uri,
>>>> orte_process_name_t *rproc)
>>>> +{
>>>> + return OMPI_ERR_NOT_SUPPORTED;
>>>> +}
>>>> +
>>>> int ompi_dpm_base_null_close_port(char *port_name)
>>>> {
>>>> return OMPI_ERR_NOT_SUPPORTED;
>>>>
>>>> Modified: trunk/ompi/mca/dpm/base/dpm_base_open.c
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> ===================================================================
>>>> --- trunk/ompi/mca/dpm/base/dpm_base_open.c (original)
>>>> +++ trunk/ompi/mca/dpm/base/dpm_base_open.c 2008-09-27 09:22:32
>>>> EDT (Sat, 27 Sep 2008)
>>>> @@ -42,6 +42,8 @@
>>>> ompi_dpm_base_null_dyn_finalize,
>>>> ompi_dpm_base_null_mark_dyncomm,
>>>> ompi_dpm_base_null_open_port,
>>>> + ompi_dpm_base_null_parse_port,
>>>> + ompi_dpm_base_null_route_to_port,
>>>> ompi_dpm_base_null_close_port,
>>>> NULL
>>>> };
>>>>
>>>> Modified: trunk/ompi/mca/dpm/dpm.h
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> ===================================================================
>>>> --- trunk/ompi/mca/dpm/dpm.h (original)
>>>> +++ trunk/ompi/mca/dpm/dpm.h 2008-09-27 09:22:32 EDT (Sat, 27 Sep
>>>> 2008)
>>>> @@ -58,6 +58,8 @@
>>>> #define OMPI_RML_TAG_DYNAMIC
>>>> OMPI_RML_TAG_BASE+200
>>>>
>>>>
>>>> +
>>>> +
>>>> /*
>>>> * Initialize a module
>>>> */
>>>> @@ -116,6 +118,20 @@
>>>> typedef int (*ompi_dpm_base_module_open_port_fn_t)(char
>>>> *port_name, orte_rml_tag_t tag);
>>>>
>>>> /*
>>>> + * Converts an opaque port string to a RML process nane and tag.
>>>> + */
>>>> +typedef int (*ompi_dpm_base_module_parse_port_name_t)(char
>>>> *port_name,
>>>> +
>>>> orte_process_name_t *rproc,
>>>> +
>>>> orte_rml_tag_t *tag);
>>>> +
>>>> +/*
>>>> + * Update the routed component to make sure that the RML can
>>>> send messages to
>>>> + * the remote port
>>>> + */
>>>> +typedef int (*ompi_dpm_base_module_route_to_port_t)(char
>>>> *rml_uri, orte_process_name_t *rproc);
>>>> +
>>>> +
>>>> +/*
>>>> * Close a port
>>>> */
>>>> typedef int (*ompi_dpm_base_module_close_port_fn_t)(char
>>>> *port_name);
>>>> @@ -145,6 +161,10 @@
>>>> ompi_dpm_base_module_mark_dyncomm_fn_t mark_dyncomm;
>>>> /* open port */
>>>> ompi_dpm_base_module_open_port_fn_t open_port;
>>>> + /* parse port string */
>>>> + ompi_dpm_base_module_parse_port_name_t parse_port;
>>>> + /* update route to a port */
>>>> + ompi_dpm_base_module_route_to_port_t route_to_port;
>>>> /* close port */
>>>> ompi_dpm_base_module_close_port_fn_t close_port;
>>>> /* finalize */
>>>>
>>>> Modified: trunk/ompi/mca/dpm/orte/dpm_orte.c
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> =
>>>> ===================================================================
>>>> --- trunk/ompi/mca/dpm/orte/dpm_orte.c (original)
>>>> +++ trunk/ompi/mca/dpm/orte/dpm_orte.c 2008-09-27 09:22:32 EDT
>>>> (Sat, 27 Sep 2008)
>>>> @@ -61,9 +61,6 @@
>>>> opal_buffer_t *buffer,
>>>> orte_rml_tag_t tag, void *cbdata);
>>>> static void process_cb(int fd, short event, void *data);
>>>> -static int parse_port_name(char *port_name,
>>>> - orte_process_name_t *rproc,
>>>> - orte_rml_tag_t *tag);
>>>>
>>>> /* API functions */
>>>> static int init(void);
>>>> @@ -78,6 +75,9 @@
>>>> char *port_name);
>>>> static int dyn_init(void);
>>>> static int open_port(char *port_name, orte_rml_tag_t given_tag);
>>>> +static int parse_port_name(char *port_name, orte_process_name_t
>>>> *rproc,
>>>> + orte_rml_tag_t *tag);
>>>> +static int route_to_port(char *rml_uri, orte_process_name_t
>>>> *rproc);
>>>> static int close_port(char *port_name);
>>>> static int finalize(void);
>>>>
>>>> @@ -93,6 +93,8 @@
>>>> ompi_dpm_base_dyn_finalize,
>>>> ompi_dpm_base_mark_dyncomm,
>>>> open_port,
>>>> + parse_port_name,
>>>> + route_to_port,
>>>> close_port,
>>>> finalize
>>>> };
>>>> @@ -145,14 +147,17 @@
>>>> * set us up to communicate with it
>>>> */
>>>> if (NULL != port_string && 0 < strlen(port_string)) {
>>>> - /* separate the string into the RML URI and tag - this
>>>> function performs
>>>> - * whatever route initialization is required by the
>>>> selected routed module
>>>> - */
>>>> + /* separate the string into the RML URI and tag */
>>>> if (ORTE_SUCCESS != (rc = parse_port_name(port_string,
>>>> &port, &tag))) {
>>>> ORTE_ERROR_LOG(rc);
>>>> return rc;
>>>> }
>>>> }
>>>> + /* make sure we can route rml messages to the destination */
>>>> + if (ORTE_SUCCESS != (rc = route_to_port(port_string,
>>>> &port))) {
>>>> + ORTE_ERROR_LOG(rc);
>>>> + return rc;
>>>> + }
>>>>
>>>> /* tell the progress engine to tick the event library more
>>>> often, to make sure that the OOB messages get sent */
>>>> @@ -783,48 +788,11 @@
>>>> ORTE_MESSAGE_EVENT(sender, buffer, tag, release_ack);
>>>> }
>>>>
>>>> -
>>>> -static int parse_port_name(char *port_name,
>>>> - orte_process_name_t *rproc,
>>>> - orte_rml_tag_t *tag)
>>>> +static int route_to_port(char *uri, orte_process_name_t *rproc)
>>>> {
>>>> - char *tmpstring=NULL, *ptr, *rml_uri=NULL;
>>>> - orte_rml_cmd_flag_t cmd = ORTE_RML_UPDATE_CMD;
>>>> - int rc;
>>>> opal_buffer_t route;
>>>> -
>>>> - /* don't mangle the port name */
>>>> - tmpstring = strdup(port_name);
>>>> -
>>>> - /* find the ':' demarking the RML tag we added to the end */
>>>> - if (NULL == (ptr = strrchr(tmpstring, ':'))) {
>>>> - rc = ORTE_ERR_NOT_FOUND;
>>>> - goto cleanup;
>>>> - }
>>>> -
>>>> - /* terminate the port_name at that location */
>>>> - *ptr = '\0';
>>>> - ptr++;
>>>> -
>>>> - /* convert the RML tag */
>>>> - sscanf(ptr,"%d", (int*)tag);
>>>> -
>>>> - /* now split out the second field - the uri of the remote
>>>> proc */
>>>> - if (NULL == (ptr = strchr(tmpstring, '+'))) {
>>>> - rc = ORTE_ERR_NOT_FOUND;
>>>> - goto cleanup;
>>>> - }
>>>> - *ptr = '\0';
>>>> - ptr++;
>>>> -
>>>> - /* save that info */
>>>> - rml_uri = strdup(ptr);
>>>> -
>>>> - /* extract the originating proc's name */
>>>> - if (ORTE_SUCCESS != (rc = orte_rml_base_parse_uris(ptr,
>>>> rproc, NULL))) {
>>>> - ORTE_ERROR_LOG(rc);
>>>> - goto cleanup;
>>>> - }
>>>> + orte_rml_cmd_flag_t cmd = ORTE_RML_UPDATE_CMD;
>>>> + int rc;
>>>>
>>>> /* if this proc is part of my job family, then I need to
>>>> * update my RML contact hash table and my routes
>>>> @@ -835,14 +803,14 @@
>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>
>>>> /* set the contact info into the hash table */
>>>> - if (ORTE_SUCCESS != (rc =
>>>> orte_rml.set_contact_info(rml_uri))) {
>>>> + if (ORTE_SUCCESS != (rc =
>>>> orte_rml.set_contact_info(uri))) {
>>>> ORTE_ERROR_LOG(rc);
>>>> - goto cleanup;
>>>> + return rc;
>>>> }
>>>> if (ORTE_SUCCESS != (rc = orte_routed.update_route(rproc,
>>>> rproc))) {
>>>> ORTE_ERROR_LOG(rc);
>>>> }
>>>> - goto cleanup;
>>>> + return ORTE_SUCCESS;
>>>> }
>>>>
>>>> /* the proc must be part of another job family. In this case, we
>>>> @@ -855,7 +823,7 @@
>>>> opal_dss.pack(&route, &cmd, 1, ORTE_RML_CMD);
>>>>
>>>> /* pack the HNP uri */
>>>> - opal_dss.pack(&route, &tmpstring, 1, OPAL_STRING);
>>>> + opal_dss.pack(&route, &uri, 1, OPAL_STRING);
>>>>
>>>> OPAL_OUTPUT_VERBOSE((3, ompi_dpm_base_output,
>>>> "%s dpm_parse_port: %s in diff job family -
>>>> sending update to %s",
>>>> @@ -867,7 +835,7 @@
>>>> ORTE_RML_TAG_RML_INFO_UPDATE,
>>>> 0))) {
>>>> ORTE_ERROR_LOG(rc);
>>>> OBJ_DESTRUCT(&route);
>>>> - goto cleanup;
>>>> + return rc;
>>>> }
>>>>
>>>> /* wait right here until the HNP acks the update to ensure that
>>>> @@ -886,7 +854,49 @@
>>>>
>>>> /* our get_route function automatically routes all messages for
>>>> * other job families via the HNP, so nothing more to do here
>>>> - */
>>>> + */
>>>> + return rc;
>>>> +}
>>>> +
>>>> +static int parse_port_name(char *port_name,
>>>> + orte_process_name_t *rproc,
>>>> + orte_rml_tag_t *tag)
>>>> +{
>>>> + char *tmpstring=NULL, *ptr, *rml_uri=NULL;
>>>> + int rc;
>>>> +
>>>> + /* don't mangle the port name */
>>>> + tmpstring = strdup(port_name);
>>>> +
>>>> + /* find the ':' demarking the RML tag we added to the end */
>>>> + if (NULL == (ptr = strrchr(tmpstring, ':'))) {
>>>> + rc = ORTE_ERR_NOT_FOUND;
>>>> + goto cleanup;
>>>> + }
>>>> +
>>>> + /* terminate the port_name at that location */
>>>> + *ptr = '\0';
>>>> + ptr++;
>>>> +
>>>> + /* convert the RML tag */
>>>> + sscanf(ptr,"%d", (int*)tag);
>>>> +
>>>> + /* now split out the second field - the uri of the remote
>>>> proc */
>>>> + if (NULL == (ptr = strchr(tmpstring, '+'))) {
>>>> + rc = ORTE_ERR_NOT_FOUND;
>>>> + goto cleanup;
>>>> + }
>>>> + *ptr = '\0';
>>>> + ptr++;
>>>> +
>>>> + /* save that info */
>>>> + rml_uri = strdup(ptr);
>>>> +
>>>> + /* extract the originating proc's name */
>>>> + if (ORTE_SUCCESS != (rc = orte_rml_base_parse_uris(ptr,
>>>> rproc, NULL))) {
>>>> + ORTE_ERROR_LOG(rc);
>>>> + goto cleanup;
>>>> + }
>>>> rc = ORTE_SUCCESS;
>>>>
>>>> cleanup:
>>>> _______________________________________________
>>>> svn mailing list
>>>> svn_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel