Open MPI Development Mailing List Archives

From: Nathan DeBardeleben (ndebard_at_[hidden])
Date: 2006-07-05 18:31:50


I'm running this on my Mac, where I expected to get back only the
localhost. I upgraded to 1.0.2 a little while back; until then I had been
using one of the alphas (I think it was alpha 9, but I can't be sure), and
with that version this function returned '1' on my Mac.

-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard_at_[hidden]
---------------------------------------------------------------------

Ralph H Castain wrote:
> Rc=0 indicates that the "get" function was successful, so this means that
> there were no nodes on the NODE_SEGMENT. Were you running this in an
> environment where nodes had been allocated to you? Or were you expecting to
> find only "localhost" on the segment?
>
> I'm not entirely sure, but I don't believe there have been significant
> changes in 1.0.2 for some time. My guess is that something has changed on
> your system as opposed to in the OpenMPI code you're using. Did you do an
> update recently and then begin seeing this behavior? Your revision level is
> 1000+ behind the current repository, so my guess is that you haven't updated
> for a while - since 1.0.2 is under maintenance for bugs only, that shouldn't
> be a problem. I'm just trying to understand why your function is doing
> something different if the OpenMPI code you're using hasn't changed.
>
> Ralph
>
>
>
> On 7/5/06 2:40 PM, "Nathan DeBardeleben" <ndebard_at_[hidden]> wrote:
>
>
>>> Open MPI: 1.0.2
>>> Open MPI SVN revision: r9571
>>>
>> The rc value returned by the 'get' call is '0'.
>> All I'm doing is calling init with my own daemon name; it comes up
>> fine, and then I immediately call this to figure out how many nodes are
>> associated with this machine.
>>
>> -- Nathan
>> Correspondence
>> ---------------------------------------------------------------------
>> Nathan DeBardeleben, Ph.D.
>> Los Alamos National Laboratory
>> Parallel Tools Team
>> High Performance Computing Environments
>> phone: 505-667-3428
>> email: ndebard_at_[hidden]
>> ---------------------------------------------------------------------
>>
>>
>>
>> Ralph H Castain wrote:
>>
>>> Hi Nathan
>>>
>>> Could you tell us which version of the code you are using, and print out the
>>> rc value that was returned by the "get" call? I see nothing obviously wrong
>>> with the code, but much depends on what happened prior to this call too.
>>>
>>> BTW: you might want to release the memory stored in the returned values - it
>>> could represent a substantial memory leak.
>>>
>>> Ralph
>>>
>>>
>>>
>>> On 7/5/06 9:28 AM, "Nathan DeBardeleben" <ndebard_at_[hidden]> wrote:
>>>
>>>
>>>
>>>> I used to use this code to get the number of nodes in a cluster /
>>>> machine / whatever:
>>>>
>>>>
>>>>> int
>>>>> get_num_nodes(void)
>>>>> {
>>>>>     int rc;
>>>>>     size_t cnt;
>>>>>     orte_gpr_value_t **values;
>>>>>
>>>>>     rc = orte_gpr.get(ORTE_GPR_KEYS_OR|ORTE_GPR_TOKENS_OR,
>>>>>                       ORTE_NODE_SEGMENT, NULL, NULL, &cnt, &values);
>>>>>
>>>>>     if(rc != ORTE_SUCCESS) {
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>>     return cnt;
>>>>> }
>>>>>
>>>>>
>>>> This now returns '0' on my Mac when it used to return 1. Is this not an
>>>> acceptable way of doing this? Is there a cleaner / better way these days?
>>>>
>>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>>
>>>
>>
>
>
>
>
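
Two points from the thread can be illustrated in standalone form: a return
code of 0 only says that the "get" call succeeded (the count may still be
zero), and the values handed back through the out-parameters should be
released by the caller. The following is a minimal sketch in C that does not
use the ORTE API. The names demo_get, demo_release, demo_get_num_nodes,
node_value_t, and the DEMO_* codes are hypothetical stand-ins invented for
this illustration, and the real registry may well require a different
release mechanism than plain free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_SUCCESS 0
#define DEMO_ERROR   (-1)

/* Hypothetical stand-in for a registry value (not orte_gpr_value_t). */
typedef struct {
    int id;
} node_value_t;

/* Hypothetical stand-in for a "get" call. On success it returns
 * DEMO_SUCCESS and hands back a heap-allocated array of heap-allocated
 * values; an empty segment is still a success, with *cnt == 0. */
static int demo_get(size_t nodes_on_segment, size_t *cnt,
                    node_value_t ***values)
{
    size_t i, j;

    *cnt = 0;
    *values = NULL;
    if (nodes_on_segment == 0) {
        return DEMO_SUCCESS;
    }

    *values = malloc(nodes_on_segment * sizeof(**values));
    if (*values == NULL) {
        return DEMO_ERROR;
    }
    for (i = 0; i < nodes_on_segment; i++) {
        (*values)[i] = calloc(1, sizeof(node_value_t));
        if ((*values)[i] == NULL) {
            /* Undo the partial allocation before reporting failure. */
            for (j = 0; j < i; j++) {
                free((*values)[j]);
            }
            free(*values);
            *values = NULL;
            return DEMO_ERROR;
        }
        (*values)[i]->id = (int)i;
    }
    *cnt = nodes_on_segment;
    return DEMO_SUCCESS;
}

/* Release every returned value, then the array that held them. */
static void demo_release(size_t cnt, node_value_t **values)
{
    size_t i;

    assert(values != NULL || cnt == 0);
    for (i = 0; i < cnt; i++) {
        free(values[i]);
    }
    free(values);
}

/* Returns the node count, or -1 if the lookup itself failed, so the
 * caller can tell "call failed" apart from "call succeeded, no nodes". */
int demo_get_num_nodes(size_t nodes_on_segment)
{
    size_t cnt = 0;
    node_value_t **values = NULL;
    int result;

    if (demo_get(nodes_on_segment, &cnt, &values) != DEMO_SUCCESS) {
        return -1;
    }
    result = (int)cnt;
    demo_release(cnt, values);   /* avoid leaking the returned values */
    return result;
}
```

With this shape, a result of 0 means the lookup worked but found nothing,
which matches the rc=0 case discussed above, while -1 is reserved for an
actual failure of the call.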