Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r21548
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-07-01 18:00:55


On Wed, 1 Jul 2009, Ralph Castain wrote:

>
> Okay, let me know. I'll test some more here.
>

Problem fixed.

   Thanks,
     george.

> Thanks again for catching it.
> Ralph
>
>>
>> Thanks,
>> george.
>>
>> On Wed, 1 Jul 2009, Ralph Castain wrote:
>>
>>> Believe this is now fixed with r21582 - let me know if it now works for
>>> you.
>>> Sorry for the problem. It was indeed miscounting the number of daemons in
>>> the system, though apparently this wasn't causing problems for slurm and
>>> torque (still investigating why since it should have).
>>> Unfortunately, just changing the index caused shared memory to think
>>> everyone was remote, so the fix was a tad more involved - though not
>>> particularly difficult.
>>> Ralph
>>> On Wed, Jul 1, 2009 at 2:06 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> Hmmm...I'll take a look. It seems to be working for me under Torque
>>> and SLURM, though I cannot vouch for the tree launch. The problem with
>>> letting the index start at 0 is it breaks other things, so I'll have to
>>> see about fixing the routing schemes, or find some compromise.
>>>
>>> Thanks for the heads up.
>>> Ralph
>>> On Wed, Jul 1, 2009 at 1:49 PM, George Bosilca <bosilca_at_[hidden]>
>>> wrote:
>>> Ralph,
>>>
>>> This commit breaks several components in OMPI, mainly the routing
>>> schemes and the tree launch. The part with the problem is the reduction
>>> of the number of declared daemons in the second part of the commit,
>>> where you change the boundary for the for loop from 0 to 1. As a result
>>> the number of daemons was decreased by one (I guess in order to exclude
>>> the HNP), which is not something that the routing implementations
>>> tolerate.
>>>
>>> Setting the loop boundary back to 0 seems to fix all problems. Please
>>> reconsider your patch.
>>>
>>> george.
>>>
>>> On Fri, 26 Jun 2009, rhc_at_[hidden] wrote:
>>>
>>> Author: rhc
>>> Date: 2009-06-26 18:07:25 EDT (Fri, 26 Jun 2009)
>>> New Revision: 21548
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/21548
>>>
>>> Log:
>>> Cleanup some indexing bugs so that shared memory can function
>>>
>>> Text files modified:
>>> trunk/orte/util/nidmap.c | 12 +++++++-----
>>> 1 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> Modified: trunk/orte/util/nidmap.c
>>> ==============================================================================
>>> --- trunk/orte/util/nidmap.c (original)
>>> +++ trunk/orte/util/nidmap.c 2009-06-26 18:07:25 EDT (Fri, 26 Jun 2009)
>>> @@ -341,10 +341,10 @@
>>>
>>> /* pack every nodename individually */
>>> for (i=1; i < orte_node_pool->size; i++) {
>>> + if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, i))) {
>>> + continue;
>>> + }
>>> if (!orte_keep_fqdn_hostnames) {
>>> - if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, i))) {
>>> - continue;
>>> - }
>>> nodename = strdup(node->name);
>>> if (NULL != (ptr = strchr(nodename, '.'))) {
>>> *ptr = '\0';
>>> @@ -553,6 +553,8 @@
>>> ORTE_ERROR_LOG(rc);
>>> return rc;
>>> }
>>> + /* set the daemon to 0 */
>>> + node->daemon = 0;
>>>
>>> /* loop over nodes and unpack the raw nodename */
>>> for (i=1; i < num_nodes; i++) {
>>> @@ -570,7 +572,7 @@
>>> }
>>> }
>>>
>>> - /* unpack the daemon names */
>>> + /* unpack the daemon vpids */
>>> vpids = (orte_vpid_t*)malloc(num_nodes * sizeof(orte_vpid_t));
>>> n=num_nodes;
>>> if (ORTE_SUCCESS != (rc = opal_dss.unpack(&buf, vpids, &n, ORTE_VPID))) {
>>> @@ -581,7 +583,7 @@
>>> * daemons in the system
>>> */
>>> num_daemons = 0;
>>> - for (i=0; i < num_nodes; i++) {
>>> + for (i=1; i < num_nodes; i++) {
>>> if (NULL != (ndptr = (orte_nid_t*)opal_pointer_array_get_item(&orte_nidmap, i))) {
>>> ndptr->daemon = vpids[i];
>>> if (ORTE_VPID_INVALID != vpids[i]) {
>>> _______________________________________________
>>> svn mailing list
>>> svn_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>
>>> "We must accept finite disappointment, but we must never lose infinite
>>> hope."
>>> Martin Luther King
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>

"We must accept finite disappointment, but we must never lose infinite
hope."
                                   Martin Luther King
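
For readers following the thread, the off-by-one discussed above can be reproduced in isolation. The snippet below is only a sketch and is not taken from nidmap.c: `count_daemons`, its `first` parameter, and `VPID_INVALID` (a stand-in for ORTE_VPID_INVALID, whose real value is not shown in the thread) are hypothetical names. It assumes, as the thread suggests, that index 0 holds the HNP's entry.

```c
#include <stdint.h>

/* Hypothetical stand-in for ORTE_VPID_INVALID; the value is an assumption. */
#define VPID_INVALID UINT32_MAX

/*
 * Count the entries in vpids[first .. num_nodes-1] that hold a valid
 * daemon vpid. Passing first = 0 mirrors the original loop bound;
 * first = 1 mirrors the bound introduced in r21548.
 */
static int count_daemons(const uint32_t *vpids, int num_nodes, int first)
{
    int num_daemons = 0;

    for (int i = first; i < num_nodes; i++) {
        if (VPID_INVALID != vpids[i]) {
            num_daemons++;
        }
    }
    return num_daemons;
}
```

If the entry at index 0 carries a valid vpid, starting the loop at 1 reports one daemon fewer than starting at 0, which matches the symptom George describes for the routing schemes.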