
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Is trunk broken ?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-06-19 17:31:27


Yo Ralph --

Is the "bad" grpcomm component both new and the default? Further, is
the old "basic" grpcomm component now the non-default / testing
component?

If so, I wonder if what happened is this: Pasha did an "svn up" but
didn't re-run autogen/configure, so he never saw the new "bad"
component and instead fell back on the old "basic" component that is
now the non-default / testing one...?
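
If that's the case, a full re-bootstrap after the update should fix it.
Something like the following ought to both rebuild and confirm that the
"bad" component is really there (substitute your usual configure
options, of course):

    ./autogen.sh
    ./configure <your usual options>
    make all install

    # "bad" should now appear in the grpcomm component list
    ompi_info | grep grpcomm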

On Jun 19, 2008, at 4:21 PM, Pavel Shamis (Pasha) wrote:

> I did a fresh checkout and everything works well.
> So it looks like some "svn up" screwed up my checkout.
> Ralph, thanks for the help!
>
> Ralph H Castain wrote:
>> Hmmm...something isn't right, Pasha. There is simply no way you
>> should be
>> encountering this error. You are picking up the wrong grpcomm module.
>>
>> I went ahead and fixed the grpcomm/basic module, but as I note in
>> the commit
>> message, that is now an experimental area. The grpcomm/bad module
>> is the
>> default for that reason.
>>
>> Check to ensure you have the orte/mca/grpcomm/bad directory, and
>> that it is
>> getting built. My guess is that you have a corrupted checkout or
>> build and
>> that the component is either missing or not getting built.
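>>
>> For example, something along these lines (run from the top of the
>> checkout) would show whether the component is present and built:
>>
>>    ls orte/mca/grpcomm/bad
>>    ompi_info | grep grpcomm
>>
>> If "bad" doesn't appear in the ompi_info output, it isn't being
>> built.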
>>
>>
>> On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)"
>> <pasha_at_[hidden]> wrote:
>>
>>
>>> Ralph H Castain wrote:
>>>
>>>> I can't find anything wrong so far. I'm waiting in a queue on
>>>> Odin to try
>>>> there since Jeff indicated you are using rsh as a launcher, and
>>>> that's the
>>>> only access I have to such an environment. Guess Odin is being
>>>> pounded
>>>> because the queue isn't going anywhere.
>>>>
>>> I use ssh; here is the command line:
>>> ./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self
>>> ./osu_benchmarks-3.0/osu_latency
>>>
>>>> Meantime, I'm building on RoadRunner and will test there (TM
>>>> environment).
>>>>
>>>>
>>>> On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pasha_at_[hidden]> wrote:
>>>>
>>>>
>>>>>> You'll have to tell us something more than that, Pasha. What
>>>>>> kind of
>>>>>> environment, what rev level were you at, etc.
>>>>>>
>>>>> Ahh, sorry :) I run on Linux x86_64, SLES 10 SP1, Open MPI
>>>>> 1.3a1r18682M, OFED 1.3.1.
>>>>> Pasha.
>>>>>
>>>>>> So far as I know, the trunk is fine.
>>>>>>
>>>>>>
>>>>>> On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pasha_at_[hidden]> wrote:
>>>>>>
>>>>>>
>>>>>>> I tried to run the trunk on my machines and got the following error:
>>>>>>>
>>>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read
>>>>>>> past end of buffer in file base/grpcomm_base_modex.c at line 451
>>>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read
>>>>>>> past end of buffer in file grpcomm_basic_module.c at line 560
>>>>>>> [sw214:04365]
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>>
>>>>>>>   orte_grpcomm_modex failed
>>>>>>>   --> Returned "Data unpack would read past end of buffer" (-26)
>>>>>>>       instead of "Success" (0)

-- 
Jeff Squyres
Cisco Systems