Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Is trunk broken ?
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-06-19 17:38:08


On 6/19/08 3:31 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:

> Yo Ralph --
>
> Is the "bad" grpcomm component both new and the default? Further, is
> the old "basic" grpcomm component now the non-default / testing
> component?

Yes to both
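
FWIW, a quick way to confirm which grpcomm components are actually in a
build, and to force one for testing, is something like the lines below
(using Pasha's command line as the example). I'm assuming the verbose
parameter is named grpcomm_base_verbose per the usual
<framework>_base_verbose convention - double-check that with ompi_info.

   # list the grpcomm components this install knows about
   ompi_info | grep grpcomm

   # force a specific grpcomm component (e.g. the new default "bad") and
   # turn up the framework's verbosity to watch the selection happen
   # (grpcomm_base_verbose is assumed per the <framework>_base_verbose convention)
   mpirun -np 2 -H sw214,sw214 -mca grpcomm bad \
          -mca grpcomm_base_verbose 10 ./osu_benchmarks-3.0/osu_latency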

>
> If so, I wonder if what happened was that Pasha did an "svn up", but
> without re-running autogen/configure, he wouldn't have seen the new
> "bad" component and therefore was falling back on the old "basic"
> component that is now the non-default / testing component...?
>

Could be - though I thought that if you do a "make" in that situation, it
would force a re-autogen/configure when it saw a new component?

Of course, if he didn't do a "make" at the top level, and he is in a dynamic
build, then maybe OMPI wouldn't figure out that something was different...

Don't know - but we have had problems with svn missing things in the past
too, so it could be a number of things.

<shrug>
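
In any case, the safe recipe whenever an svn up brings in a brand new
component is to re-bootstrap from the top of the tree rather than trusting
make to notice it. Roughly (the configure options are whatever you normally
use):

   cd <top of the trunk checkout>
   ./autogen.sh          # regenerate the build system so the new component is found
   ./configure <your usual options>
   make all install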

>
> On Jun 19, 2008, at 4:21 PM, Pavel Shamis (Pasha) wrote:
>
>> I did a fresh checkout and everything works well.
>> So it looks like some "svn up" messed up my checkout.
>> Ralph, thanks for the help!
>>
>> Ralph H Castain wrote:
>>> Hmmm...something isn't right, Pasha. There is simply no way you
>>> should be
>>> encountering this error. You are picking up the wrong grpcomm module.
>>>
>>> I went ahead and fixed the grpcomm/basic module, but as I note in
>>> the commit
>>> message, that is now an experimental area. The grpcomm/bad module
>>> is the
>>> default for that reason.
>>>
>>> Check to ensure you have the orte/mca/grpcomm/bad directory, and
>>> that it is
>>> getting built. My guess is that you have a corrupted checkout or
>>> build and
>>> that the component is either missing or not getting built.
>>>
>>>
>>> On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)"
>>> <pasha_at_[hidden]> wrote:
>>>
>>>
>>>> Ralph H Castain wrote:
>>>>
>>>>> I can't find anything wrong so far. I'm waiting in a queue on
>>>>> Odin to try
>>>>> there since Jeff indicated you are using rsh as a launcher, and
>>>>> that's the
>>>>> only access I have to such an environment. Guess Odin is being
>>>>> pounded
>>>>> because the queue isn't going anywhere.
>>>>>
>>>> I use ssh; here is the command line:
>>>> ./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self ./osu_benchmarks-3.0/osu_latency
>>>>
>>>>> Meantime, I'm building on RoadRunner and will test there (TM environment).
>>>>>
>>>>>
>>>>> On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pasha_at_[hidden]> wrote:
>>>>>
>>>>>>> You'll have to tell us something more than that, Pasha. What
>>>>>>> kind of
>>>>>>> environment, what rev level were you at, etc.
>>>>>>>
>>>>>> Ahh, sorry :) I run on Linux x86_64, SLES 10 SP1, Open MPI 1.3a1r18682M, OFED 1.3.1.
>>>>>> Pasha.
>>>>>>
>>>>>>> So far as I know, the trunk is fine.
>>>>>>>
>>>>>>>
>>>>>>> On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pasha_at_[hidden]> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> I tried to run the trunk on my machines and I got the following error:
>>>>>>>>
>>>>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/grpcomm_base_modex.c at line 451
>>>>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file grpcomm_basic_module.c at line 560
>>>>>>>> [sw214:04365] --------------------------------------------------------------------------
>>>>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>>> Open MPI developer):
>>>>>>>>
>>>>>>>> orte_grpcomm_modex failed
>>>>>>>> --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
>>>>>>>>