Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Is trunk broken ?
From: Ralph H Castain (rhc_at_[hidden])
Date: 2008-06-19 15:43:35


Hmmm...something isn't right, Pasha. There is simply no way you should be
encountering this error. You are picking up the wrong grpcomm module.

I went ahead and fixed the grpcomm/basic module, but as I note in the commit
message, that is now an experimental area. The grpcomm/bad module is the
default for that reason.

Check to ensure you have the orte/mca/grpcomm/bad directory, and that it is
getting built. My guess is that you have a corrupted checkout or build and
that the component is either missing or not getting built.
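
For a quick check, something along these lines should show whether the component
is present and built (a rough sketch; adjust paths for your checkout and install
prefix, and the forced-selection run is only a diagnostic workaround, not a fix):

  # does the component source exist in the checkout?
  ls orte/mca/grpcomm/bad

  # was it actually built and installed?
  ompi_info | grep grpcomm

  # as a temporary test, force selection of the bad component
  ./bin/mpirun -mca grpcomm bad -np 2 -H sw214,sw214 -mca btl openib,sm,self \
      ./osu_benchmarks-3.0/osu_latency

If ompi_info doesn't list the bad component, that would point to the corrupted
checkout or build.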

On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)" <pasha_at_[hidden]> wrote:

> Ralph H Castain wrote:
>> I can't find anything wrong so far. I'm waiting in a queue on Odin to try
>> there since Jeff indicated you are using rsh as a launcher, and that's the
>> only access I have to such an environment. Guess Odin is being pounded
>> because the queue isn't going anywhere.
>>
> I use ssh. Here is the command line:
> ./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self
> ./osu_benchmarks-3.0/osu_latency
>> In the meantime, I'm building on RoadRunner and will test there (TM environment).
>>
>>
>> On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pasha_at_[hidden]> wrote:
>>
>>
>>>> You'll have to tell us something more than that, Pasha. What kind of
>>>> environment, what rev level were you at, etc.
>>>>
>>>>
>>> Ahh, sorry :) I run on Linux x86_64, SLES 10 SP1, Open MPI 1.3a1r18682M,
>>> OFED 1.3.1.
>>> Pasha.
>>>
>>>> So far as I know, the trunk is fine.
>>>>
>>>>
>>>> On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pasha_at_[hidden]>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>> I tried to run the trunk on my machines and got the following error:
>>>>>
>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>>>> end of buffer in file base/grpcomm_base_modex.c at line 451
>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>>>> end of buffer in file grpcomm_basic_module.c at line 560
>>>>> [sw214:04365]
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment
>>>>> problems. This failure appears to be an internal failure; here's some
>>>>> additional information (which may only be relevant to an Open MPI
>>>>> developer):
>>>>>
>>>>> orte_grpcomm_modex failed
>>>>> --> Returned "Data unpack would read past end of buffer" (-26) instead
>>>>> of "Success" (0)