Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r31577 - trunk/ompi/mca/rte/base
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-01 00:31:35


A somewhat simpler solution (since we do use the jobid and vpid in various places) might be to just define an OMPI_PROCESS_NAME_T data type in the rte.h files for each rte, and then let opal_dstore simply store and retrieve that type. Problem solved - except you still need to have a jobid and vpid defined in a struct format in the result, or else we are back to providing accessors - which I remind everyone was rejected by this community years ago when it was implemented in the 1.2 series (you all yelled at me for it).

Even simpler solution: revert back to the initial agreement made when we created the RTE abstraction that the process name is 64-bits and the problem goes away. If/when someone comes along who truly needs something else, and can make a compelling case for the change, then we can worry about complicating things. Why make life harder in the meantime? I think we all have higher priority work on our plate.

Jeff and others are using opal_dstore to retrieve the ompi_process_name_t of another process - in Jeff's case, it is local_rank=0 for this node. So he asks for the local "leader" for his proc, and gets the other proc's name back.

On Apr 30, 2014, at 7:06 PM, George Bosilca <bosilca_at_[hidden]> wrote:

> Another solution will be to get rid of ompi_process_name_t and use everywhere the opal_identifier_t. We can shift the responsibility to convert between this unique identifier (and we have 64 bits to ensure uniqueness) and the orte_process_name_t to the RTE. Similar to the opal_help we can implement:
> - a naming function converting an opal_identifier_t into a const string
> - a comparaison function comparing two opal_identifier_t for identical jobid, and so on
> - maybe an accessor for the jobid/vpid.
>
> These function will default to some trivial local information, but will be changed by the RTE ASAP, allowing simple and flexible access to the RTE naming scheme. Moreover, this information will only be used sporadically (during connection setup and error reporting), so using functions will not have a significant overhead. Btw, this is the approach I implemented into the OPAL BTL branch, and seems to work pretty well so far.
>
> George.
>
>
>
> On Apr 30, 2014, at 22:01 , George Bosilca <bosilca_at_[hidden]> wrote:
>
>> On Apr 30, 2014, at 20:04 , Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
>>
>>> All we need in usnic is the ompi_process_name_t of the local process leader. However that happens is fine with me. :-)
>>
>> Why do you need the ompi_process_name_t? Isn’t the opal_identifier_t enough to dig for the info of the peer into the opal_db?
>>
>> George.
>>
>>
>>
>>>
>>>
>>> On Apr 30, 2014, at 6:46 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> George makes a fair point - I was unaware they were hashing down to the opal_identifier_t. Only real requirement is that there be some way to reduce to 64-bits when accessing the common data.
>>>>
>>>> I think the usnic BTL may have an issue with that approach, so maybe some way of "unhashing" will be required?
>>>>
>>>>
>>>> On Apr 30, 2014, at 3:42 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
>>>>
>>>>> On Apr 30, 2014, at 6:35 PM, George Bosilca <bosilca_at_[hidden]> wrote:
>>>>>
>>>>>> Puzzling. We survived so far without such a requirement.
>>>>>
>>>>> Ralph tells me that this is a requirement. So I figured we should check for it.
>>>>>
>>>>>> In the BTLs where we needed a 64 bits entity corresponding to the ompi_process_name_t we took advantage of the ompi_rte_hash_name function. This function is supposed to convert from an ompi_process_name_t to a uint64_t (in fact an opal_identifier_t) which can be then used to handle hash tables.
>>>>>
>>>>> ...I don't really care, either way. I'll do whatever you guys tell me to do here. :-)
>>>>>
>>>>> I put that assert there because Ralph told me it was a requirement, and I now have code in the usnic BTL that is doing a memcpy from a union uint64_t member to an ompi_process_name_t, and it assumes that the sizes are guaranteed to be the same.
>>>>>
>>>>> If we want to do it some other way, that's fine with me, too.
>>>>>
>>>>>
>>>>>> George.
>>>>>>
>>>>>>
>>>>>> On Apr 30, 2014, at 18:12 , svn-commit-mailer_at_[hidden] wrote:
>>>>>>
>>>>>>> Author: jsquyres (Jeff Squyres)
>>>>>>> Date: 2014-04-30 18:12:54 EDT (Wed, 30 Apr 2014)
>>>>>>> New Revision: 31577
>>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/31577
>>>>>>>
>>>>>>> Log:
>>>>>>> rte_base_frame.c: add sanity check to ensure proper sizes
>>>>>>>
>>>>>>> There's a requirement in several places (e.g., opal dstore) that
>>>>>>> sizeof(ompi_process_name_t) -- which comes from the compile-time
>>>>>>> selected ompi/mca/rte component -- is equal to sizeof(uint64_t). If
>>>>>>> it's not, Bad Things will happen.
>>>>>>>
>>>>>>> So put an assert here to catch that case.
>>>>>>>
>>>>>>> Text files modified:
>>>>>>> trunk/ompi/mca/rte/base/rte_base_frame.c | 10 +++++++++-
>>>>>>> 1 files changed, 9 insertions(+), 1 deletions(-)
>>>>>>>
>>>>>>> Modified: trunk/ompi/mca/rte/base/rte_base_frame.c
>>>>>>> ==============================================================================
>>>>>>> --- trunk/ompi/mca/rte/base/rte_base_frame.c Wed Apr 30 18:10:30 2014 (r31576)
>>>>>>> +++ trunk/ompi/mca/rte/base/rte_base_frame.c 2014-04-30 18:12:54 EDT (Wed, 30 Apr 2014) (r31577)
>>>>>>> @@ -1,6 +1,7 @@
>>>>>>> /*
>>>>>>> * Copyright (c) 2012-2013 Los Alamos National Security, LLC.
>>>>>>> * All rights reserved.
>>>>>>> + * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
>>>>>>> * $COPYRIGHT$
>>>>>>> *
>>>>>>> * Additional copyrights may follow
>>>>>>> @@ -12,6 +13,7 @@
>>>>>>> #include "ompi_config.h"
>>>>>>> #include "ompi/constants.h"
>>>>>>>
>>>>>>> +#include "opal_stdint.h"
>>>>>>> #include "opal/util/output.h"
>>>>>>> #include "opal/mca/mca.h"
>>>>>>> #include "opal/mca/base/base.h"
>>>>>>> @@ -36,7 +38,13 @@
>>>>>>> static int ompi_rte_base_open(mca_base_open_flag_t flags)
>>>>>>> {
>>>>>>> /* Open up all available components */
>>>>>>> - return mca_base_framework_components_open(&ompi_rte_base_framework, flags);
>>>>>>> + int ret = mca_base_framework_components_open(&ompi_rte_base_framework, flags);
>>>>>>> +
>>>>>>> + /* Sanity check. Many things will break if this is not true
>>>>>>> + (e.g., opal dstore needs this to be true). */
>>>>>>> + assert(sizeof(ompi_process_name_t) == sizeof(uint64_t));
>>>>>>> +
>>>>>>> + return ret;
>>>>>>> }
>>>>>>>
>>>>>>> MCA_BASE_FRAMEWORK_DECLARE(ompi, rte, "OMPI Run-Time Environment Interface", NULL,
>>>>>>> _______________________________________________
>>>>>>> svn mailing list
>>>>>>> svn_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14663.php
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquyres_at_[hidden]
>>>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14664.php
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14667.php
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14668.php
>>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14670.php