Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Error after ompi-restart
From: Leonardo Fialho (lfialho_at_[hidden])
Date: 2008-11-04 11:10:49


Josh,

That change works fine for me. I think that was the error.

Leonardo

Josh Hursey wrote:
> Leonardo,
>
> Sorry I have been really slow in replying; I have been pretty swamped
> lately.
>
> What version of the trunk are you using? I've been seeing C/R failures
> starting around r19872, but I haven't had time to focus on trying to
> find out what is going wrong.
>
> You may be right in your assessment below; I'll try to look into it
> this week. If you find that making this change fixes your problem,
> let me know and I'll apply the patch.
>
> Thanks,
> Josh
>
> On Nov 4, 2008, at 10:16 AM, Leonardo Fialho wrote:
>
>> I'm not sure, but I think that line 659 of file
>> orte/mca/ess/env/ess_env_module.c should contain
>>
>> if (ORTE_SUCCESS != (ret =
>> orte_ess_base_build_nidmap(orte_process_info.sync_buf, &nidmap,
>> *jmap*))) {
>>
>> But it actually contains
>>
>> if (ORTE_SUCCESS != (ret =
>> orte_ess_base_build_nidmap(orte_process_info.sync_buf, &nidmap,
>> *&jmap->pmap*))) {
>>
>> No?
>>
>> Leonardo
>>
>>
>> Leonardo Fialho wrote:
>>> Hi All,
>>>
>>> I think there is an error in the trunk version when trying to
>>> restore a checkpoint.
>>>
>>> The function orte_util_decode_pidmap, when it attempts to execute the
>>> following code,
>>>
>>> /* store the data */
>>> for (i=0; i < num_procs; i++) {
>>>     pmap.node = nodes[i];
>>>     pmap.local_rank = local_rank[i];
>>>     pmap.node_rank = node_rank[i];
>>>     opal_value_array_set_item(procs, i, &pmap);
>>> }
>>>
>>> produces a segmentation fault:
>>>
>>> [nodo2:18027] *** Process received signal ***
>>> [nodo2:18027] Signal: Segmentation fault (11)
>>> [nodo2:18027] Signal code: Address not mapped (1)
>>> [nodo2:18027] Failing at address: (nil)
>>>
>>> I tried to trace the problem, and I think it occurs at the line
>>> opal_value_array_set_item(procs, i, &pmap);
>>>
>>> Thanks,
>>>
>>
>>
>> --
>> Leonardo Fialho
>> Computer Architecture and Operating Systems Department - CAOS
>> Universidad Autonoma de Barcelona - UAB
>> ETSE, Edifcio Q, QC/3088
>> http://www.caos.uab.es
>> Phone: +34-93-581-2888
>> Fax: +34-93-581-2478
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
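
The failure mode discussed in this thread can be illustrated with a small,
self-contained sketch. It is not the OPAL or ORTE code: toy_pmap_t,
toy_value_array_t and every toy_* function are hypothetical stand-ins written
for this example. The sketch assumes that the array pointer handed to the
store loop can be NULL or otherwise not the object the callee expects. That
would be consistent with "Failing at address: (nil)", but the thread does not
confirm it. With a guard in the setter, the loop returns an error instead of
dereferencing a bad pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-process map entry (not the ORTE type). */
typedef struct {
    int node;
    int local_rank;
    int node_rank;
} toy_pmap_t;

/* Hypothetical growable array of fixed-size items (not opal_value_array_t). */
typedef struct {
    unsigned char *items;
    size_t item_size;
    size_t size;            /* number of item slots currently allocated */
} toy_value_array_t;

static int toy_array_init(toy_value_array_t *a, size_t item_size)
{
    if (NULL == a || 0 == item_size) {
        return -1;
    }
    a->items = NULL;
    a->item_size = item_size;
    a->size = 0;
    return 0;
}

/* Copy one item into slot `index`, growing the storage when needed.
 * The NULL checks are the point of the sketch: without them, a NULL
 * array pointer would be dereferenced here and the process would crash. */
static int toy_array_set_item(toy_value_array_t *a, size_t index,
                              const void *item)
{
    if (NULL == a || NULL == item) {
        return -1;
    }
    assert(a->item_size > 0);
    if (index >= a->size) {
        size_t new_size = index + 1;
        unsigned char *p = realloc(a->items, new_size * a->item_size);
        if (NULL == p) {
            return -1;
        }
        /* Zero the newly added slots so skipped indices are defined. */
        memset(p + a->size * a->item_size, 0,
               (new_size - a->size) * a->item_size);
        a->items = p;
        a->size = new_size;
    }
    memcpy(a->items + index * a->item_size, item, a->item_size);
    return 0;
}

static void toy_array_free(toy_value_array_t *a)
{
    if (NULL != a) {
        free(a->items);
        a->items = NULL;
        a->size = 0;
    }
}

/* Same shape as the store loop quoted in the thread, but it checks the
 * setter's return value instead of ignoring it. */
static int toy_store_pmaps(toy_value_array_t *procs, const int *nodes,
                           const int *local_rank, const int *node_rank,
                           size_t num_procs)
{
    toy_pmap_t pmap;
    for (size_t i = 0; i < num_procs; i++) {
        pmap.node = nodes[i];
        pmap.local_rank = local_rank[i];
        pmap.node_rank = node_rank[i];
        if (0 != toy_array_set_item(procs, i, &pmap)) {
            return -1;
        }
    }
    return 0;
}
```

Calling toy_store_pmaps() with an initialized array fills one slot per
process; calling it with a NULL array returns -1 rather than faulting. In the
real code the equivalent check would only turn the crash into an error
return. The fix proposed above, passing the correct argument to
orte_ess_base_build_nidmap, is still what addresses the cause.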