Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Is trunk broken ?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-06-19 17:31:27


Yo Ralph --

Is the "bad" grpcomm component both new and the default? Further, is
the old "basic" grpcomm component now the non-default / testing
component?

If so, I wonder if what happened was that Pasha did an "svn up" but did not
re-run autogen/configure; he then wouldn't have seen the new "bad" component
and therefore fell back on the old "basic" component that is now the
non-default / testing component...?
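
In case it helps, the usual way to pick up a brand-new component after an
"svn up" is to regenerate and rebuild from the top of the tree; a minimal
sketch (the checkout path and install prefix below are just placeholders):

   cd /path/to/ompi-trunk                  # your checkout
   ./autogen.sh                            # regenerate configure so new MCA components are discovered
   ./configure --prefix=$HOME/ompi-install
   make -j4 && make install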

On Jun 19, 2008, at 4:21 PM, Pavel Shamis (Pasha) wrote:

> I did a fresh checkout and everything works well.
> So it looks like some "svn up" screwed up my checkout.
> Ralph, thanks for the help!
>
> Ralph H Castain wrote:
>> Hmmm...something isn't right, Pasha. There is simply no way you should be
>> encountering this error. You are picking up the wrong grpcomm module.
>>
>> I went ahead and fixed the grpcomm/basic module, but as I note in the commit
>> message, that is now an experimental area. The grpcomm/bad module is the
>> default for that reason.
>>
>> Check to ensure you have the orte/mca/grpcomm/bad directory, and that it is
>> getting built. My guess is that you have a corrupted checkout or build and
>> that the component is either missing or not getting built.
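
A quick way to double-check that on your own tree (the commands and paths
are only illustrative; the lib/openmpi location assumes the components were
built as DSOs under your configure prefix):

   ls orte/mca/grpcomm/bad                  # the component's source directory should exist
   ls $PREFIX/lib/openmpi | grep grpcomm    # installed DSO components show up as mca_grpcomm_<name>.so
   ompi_info | grep grpcomm                 # lists the grpcomm components visible at run time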
>>
>>
>> On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)"
>> <pasha_at_[hidden]> wrote:
>>
>>
>>> Ralph H Castain wrote:
>>>
>>>> I can't find anything wrong so far. I'm waiting in a queue on Odin to try
>>>> there since Jeff indicated you are using rsh as a launcher, and that's the
>>>> only access I have to such an environment. Guess Odin is being pounded
>>>> because the queue isn't going anywhere.
>>>>
>>> I use ssh; here is the command line:
>>> ./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self
>>> ./osu_benchmarks-3.0/osu_latency
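
If it ever comes back, forcing the default component and raising the
framework's verbosity should show which grpcomm module actually gets
selected; for example (the verbosity parameter name follows the usual
"<framework>_base_verbose" MCA convention and may differ on trunk):

   ./bin/mpirun -np 2 -H sw214,sw214 \
       -mca grpcomm bad -mca grpcomm_base_verbose 5 \
       -mca btl openib,sm,self ./osu_benchmarks-3.0/osu_latency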
>>>
>>>> Meantime, I'm building on RoadRunner and will test there (TM enviro).
>>>>
>>>>
>>>> On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pasha_at_[hidden]> wrote:
>>>>
>>>>
>>>>>> You'll have to tell us something more than that, Pasha. What kind of
>>>>>> environment, what rev level were you at, etc.
>>>>>>
>>>>> Ahh, sorry :) I run on Linux x86_64, SLES 10 SP1, Open MPI 1.3a1r18682M,
>>>>> OFED 1.3.1.
>>>>> Pasha.
>>>>>
>>>>>> So far as I know, the trunk is fine.
>>>>>>
>>>>>>
>>>>>> On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pasha_at_[hidden]> wrote:
>>>>>>
>>>>>>
>>>>>>> I tried to run the trunk on my machines and I got the following error:
>>>>>>>
>>>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/grpcomm_base_modex.c at line 451
>>>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file grpcomm_basic_module.c at line 560
>>>>>>> [sw214:04365]
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>>
>>>>>>> orte_grpcomm_modex failed
>>>>>>> --> Returned "Data unpack would read past end of buffer" (-26) instead
>>>>>>> of "Success" (0)
>>>>>>>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems