Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r31577 - trunk/ompi/mca/rte/base
From: George Bosilca (bosilca_at_[hidden])
Date: 2014-04-30 22:06:04


Another solution will be to get rid of ompi_process_name_t and use everywhere the opal_identifier_t. We can shift the responsibility to convert between this unique identifier (and we have 64 bits to ensure uniqueness) and the orte_process_name_t to the RTE. Similar to the opal_help we can implement:
- a naming function converting an opal_identifier_t into a const string
- a comparaison function comparing two opal_identifier_t for identical jobid, and so on
- maybe an accessor for the jobid/vpid.

These function will default to some trivial local information, but will be changed by the RTE ASAP, allowing simple and flexible access to the RTE naming scheme. Moreover, this information will only be used sporadically (during connection setup and error reporting), so using functions will not have a significant overhead. Btw, this is the approach I implemented into the OPAL BTL branch, and seems to work pretty well so far.

  George.

On Apr 30, 2014, at 22:01 , George Bosilca <bosilca_at_[hidden]> wrote:

> On Apr 30, 2014, at 20:04 , Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
>
>> All we need in usnic is the ompi_process_name_t of the local process leader. However that happens is fine with me. :-)
>
> Why do you need the ompi_process_name_t? Isn’t the opal_identifier_t enough to dig for the info of the peer into the opal_db?
>
> George.
>
>
>
>>
>>
>> On Apr 30, 2014, at 6:46 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> George makes a fair point - I was unaware they were hashing down to the opal_identifier_t. Only real requirement is that there be some way to reduce to 64-bits when accessing the common data.
>>>
>>> I think the usnic BTL may have an issue with that approach, so maybe some way of "unhashing" will be required?
>>>
>>>
>>> On Apr 30, 2014, at 3:42 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
>>>
>>>> On Apr 30, 2014, at 6:35 PM, George Bosilca <bosilca_at_[hidden]> wrote:
>>>>
>>>>> Puzzling. We survived so far without such a requirement.
>>>>
>>>> Ralph tells me that this is a requirement. So I figured we should check for it.
>>>>
>>>>> In the BTLs where we needed a 64 bits entity corresponding to the ompi_process_name_t we took advantage of the ompi_rte_hash_name function. This function is supposed to convert from an ompi_process_name_t to a uint64_t (in fact an opal_identifier_t) which can be then used to handle hash tables.
>>>>
>>>> ...I don't really care, either way. I'll do whatever you guys tell me to do here. :-)
>>>>
>>>> I put that assert there because Ralph told me it was a requirement, and I now have code in the usnic BTL that is doing a memcpy from a union uint64_t member to an ompi_process_name_t, and it assumes that the sizes are guaranteed to be the same.
>>>>
>>>> If we want to do it some other way, that's fine with me, too.
>>>>
>>>>
>>>>> George.
>>>>>
>>>>>
>>>>> On Apr 30, 2014, at 18:12 , svn-commit-mailer_at_[hidden] wrote:
>>>>>
>>>>>> Author: jsquyres (Jeff Squyres)
>>>>>> Date: 2014-04-30 18:12:54 EDT (Wed, 30 Apr 2014)
>>>>>> New Revision: 31577
>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/31577
>>>>>>
>>>>>> Log:
>>>>>> rte_base_frame.c: add sanity check to ensure proper sizes
>>>>>>
>>>>>> There's a requirement in several places (e.g., opal dstore) that
>>>>>> sizeof(ompi_process_name_t) -- which comes from the compile-time
>>>>>> selected ompi/mca/rte component -- is equal to sizeof(uint64_t). If
>>>>>> it's not, Bad Things will happen.
>>>>>>
>>>>>> So put an assert here to catch that case.
>>>>>>
>>>>>> Text files modified:
>>>>>> trunk/ompi/mca/rte/base/rte_base_frame.c | 10 +++++++++-
>>>>>> 1 files changed, 9 insertions(+), 1 deletions(-)
>>>>>>
>>>>>> Modified: trunk/ompi/mca/rte/base/rte_base_frame.c
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/mca/rte/base/rte_base_frame.c Wed Apr 30 18:10:30 2014 (r31576)
>>>>>> +++ trunk/ompi/mca/rte/base/rte_base_frame.c 2014-04-30 18:12:54 EDT (Wed, 30 Apr 2014) (r31577)
>>>>>> @@ -1,6 +1,7 @@
>>>>>> /*
>>>>>> * Copyright (c) 2012-2013 Los Alamos National Security, LLC.
>>>>>> * All rights reserved.
>>>>>> + * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
>>>>>> * $COPYRIGHT$
>>>>>> *
>>>>>> * Additional copyrights may follow
>>>>>> @@ -12,6 +13,7 @@
>>>>>> #include "ompi_config.h"
>>>>>> #include "ompi/constants.h"
>>>>>>
>>>>>> +#include "opal_stdint.h"
>>>>>> #include "opal/util/output.h"
>>>>>> #include "opal/mca/mca.h"
>>>>>> #include "opal/mca/base/base.h"
>>>>>> @@ -36,7 +38,13 @@
>>>>>> static int ompi_rte_base_open(mca_base_open_flag_t flags)
>>>>>> {
>>>>>> /* Open up all available components */
>>>>>> - return mca_base_framework_components_open(&ompi_rte_base_framework, flags);
>>>>>> + int ret = mca_base_framework_components_open(&ompi_rte_base_framework, flags);
>>>>>> +
>>>>>> + /* Sanity check. Many things will break if this is not true
>>>>>> + (e.g., opal dstore needs this to be true). */
>>>>>> + assert(sizeof(ompi_process_name_t) == sizeof(uint64_t));
>>>>>> +
>>>>>> + return ret;
>>>>>> }
>>>>>>
>>>>>> MCA_BASE_FRAMEWORK_DECLARE(ompi, rte, "OMPI Run-Time Environment Interface", NULL,
>>>>>> _______________________________________________
>>>>>> svn mailing list
>>>>>> svn_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14663.php
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquyres_at_[hidden]
>>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14664.php
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14667.php
>>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14668.php
>