
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27035 - trunk/orte/util
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-08-14 16:23:38


Yes Tim, I'm aware of it. This just needed to be fixed quickly so LANL could operate.

On Aug 14, 2012, at 1:00 PM, Tim Mattox <timattox_at_[hidden]> wrote:

> Is a linear search actually necessary? Is there some order to the
> vpids in the array?
> I would hope you could do a binary search, or, if the vpids are unordered,
> that this is a rarely invoked code path. Just thinking of scalability.
>
> On Tue, Aug 14, 2012 at 2:18 PM, <svn-commit-mailer_at_[hidden]> wrote:
>> Author: rhc (Ralph Castain)
>> Date: 2012-08-14 14:17:59 EDT (Tue, 14 Aug 2012)
>> New Revision: 27035
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27035
>>
>> Log:
>> We can't just look up the node in the node pool by daemon vpid, as the daemons aren't stored that way. This was done because, when holes exist in daemon vpids, we can generate huge orte_node_pool arrays even when only a few daemons actually exist. So we have to search for the vpid in the array.
>>
>> Text files modified:
>> trunk/orte/util/nidmap.c | 42 +++++++++++++++++++++++++++++++++++++--
>> 1 files changed, 39 insertions(+), 3 deletions(-)
>>
>> Modified: trunk/orte/util/nidmap.c
>> ==============================================================================
>> --- trunk/orte/util/nidmap.c Tue Aug 14 14:11:09 2012 (r27034)
>> +++ trunk/orte/util/nidmap.c 2012-08-14 14:17:59 EDT (Tue, 14 Aug 2012) (r27035)
>> @@ -1045,7 +1045,7 @@
>> orte_std_cntr_t n;
>> opal_buffer_t buf;
>> int rc, j, k;
>> - orte_job_t *jdata;
>> + orte_job_t *jdata, *daemons;
>> orte_proc_t *proc, *pptr;
>> orte_node_t *node, *nptr;
>> orte_proc_state_t *states=NULL;
>> @@ -1061,6 +1061,8 @@
>> goto cleanup;
>> }
>>
>> + daemons = orte_get_job_data_object(ORTE_PROC_MY_NAME->jobid);
>> +
>> n = 1;
>> /* cycle through the buffer */
>> while (ORTE_SUCCESS == (rc = opal_dss.unpack(&buf, &jobid, &n, ORTE_JOBID))) {
>> @@ -1167,10 +1169,44 @@
>> proc->name.vpid = i;
>> opal_pointer_array_set_item(jdata->procs, i, proc);
>> }
>> - if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, nodes[i]))) {
>> + /* we can't just lookup the node in the node pool by daemon vpid
>> + * as the daemons aren't stored that way - this was done because
>> + * when holes exist in daemon vpids, we can generate huge orte_node_pool
>> + * arrays even when only a few daemons actually exist. So we have to
>> + * search for the vpid in the array
>> + */
>> + node = NULL;
>> + for (j=0; j < orte_node_pool->size; j++) {
>> + if (NULL == (nptr = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, j))) {
>> + continue;
>> + }
>> + if (nptr->daemon->name.vpid == nodes[i]) {
>> + node = nptr;
>> + break;
>> + }
>> + }
>> + if (NULL == node) {
>> /* this should never happen, but protect ourselves anyway */
>> node = OBJ_NEW(orte_node_t);
>> - opal_pointer_array_set_item(orte_node_pool, nodes[i], node);
>> + /* find the daemon */
>> + found = false;
>> + for (j=0; j < daemons->procs->size; j++) {
>> + if (NULL == (pptr = (orte_proc_t*)opal_pointer_array_get_item(daemons->procs, j))) {
>> + continue;
>> + }
>> + if (pptr->name.vpid == nodes[i]) {
>> + found = true;
>> + break;
>> + }
>> + }
>> + if (!found) {
>> + pptr = OBJ_NEW(orte_proc_t);
>> + pptr->name.jobid = ORTE_PROC_MY_NAME->jobid;
>> + pptr->name.vpid = nodes[i];
>> + opal_pointer_array_set_item(daemons->procs, nodes[i], pptr);
>> + }
>> + node->daemon = pptr;
>> + opal_pointer_array_add(orte_node_pool, node);
>> }
>> if (NULL != proc->node) {
>> if (node != proc->node) {
>> _______________________________________________
>> svn-full mailing list
>> svn-full_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>
>
>
> --
> Tim Mattox, Ph.D. - I'm a bright... http://www.the-brights.net/
> timattox_at_[hidden] || tmattox_at_[hidden]
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
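
Illustration of the trade-off discussed in this thread. This is a minimal sketch, not the ORTE code: node_t, find_node_linear, build_sorted_index and find_node_sorted are hypothetical stand-ins, and the pool is modeled as a plain array of pointers rather than an opal_pointer_array_t. It shows a linear scan by daemon vpid over a densely packed pool, as in the commit, next to one possible alternative along the lines Tim suggests: keep a separate index sorted by vpid and binary-search it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a node record; NOT the real orte_node_t. */
typedef struct {
    uint32_t daemon_vpid;   /* vpid of the daemon that hosts this node */
    const char *name;       /* illustrative label only */
} node_t;

/* Linear scan over a packed pool that may contain NULL slots.
 * Same shape as the loop added in the commit: O(n) per lookup. */
static node_t *find_node_linear(node_t **pool, size_t size, uint32_t vpid)
{
    for (size_t j = 0; j < size; j++) {
        if (NULL == pool[j]) {
            continue;
        }
        if (pool[j]->daemon_vpid == vpid) {
            return pool[j];
        }
    }
    return NULL;
}

/* Order two node pointers by daemon vpid (used by qsort and bsearch). */
static int cmp_node_ptr(const void *a, const void *b)
{
    const node_t *na = *(node_t *const *)a;
    const node_t *nb = *(node_t *const *)b;
    return (na->daemon_vpid > nb->daemon_vpid) -
           (na->daemon_vpid < nb->daemon_vpid);
}

/* Copy the non-NULL entries of pool into index and sort them by vpid.
 * index must have room for size entries; returns the number stored. */
static size_t build_sorted_index(node_t **pool, size_t size, node_t **index)
{
    size_t n = 0;
    assert(NULL != pool && NULL != index);
    for (size_t j = 0; j < size; j++) {
        if (NULL != pool[j]) {
            index[n++] = pool[j];
        }
    }
    qsort(index, n, sizeof(index[0]), cmp_node_ptr);
    return n;
}

/* Binary search of the sorted index: O(log n) per lookup. */
static node_t *find_node_sorted(node_t **index, size_t n, uint32_t vpid)
{
    node_t key = { vpid, NULL };
    node_t *keyp = &key;
    node_t **hit = bsearch(&keyp, index, n, sizeof(index[0]), cmp_node_ptr);
    return (NULL != hit) ? *hit : NULL;
}
```

Both lookups cost memory proportional to the number of nodes that actually exist, not to the largest vpid, so holes in the vpid space do not inflate the pool. The sorted index only pays off if lookups are frequent relative to insertions, since it has to be rebuilt or kept ordered whenever a node is added.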