
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] trunk hangs since r19010
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-07-28 11:05:15


On Jul 28, 2008, at 8:52 AM, Lenny Verkhovsky wrote:

> Only openib works for me too,
>
> but Glebs told me once that it's illegal and that I always need to use
> the self btl.
>

Don't know - could be true. But if that is true, then we should check
whether that condition is met and, if it is not, error out with an
appropriate message. Otherwise, how is a user supposed to know about
this requirement?
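
A minimal sketch of the kind of check meant here, assuming the requirement Lenny mentions is real (that the self btl must always be selected). It is written in Python purely for illustration and is not Open MPI code; the function names are hypothetical. It relies only on the usual form of the btl MCA parameter: a comma-separated list of components, where a leading ^ means "everything except the listed components".

```python
def btl_selection_includes_self(btl_param):
    """Return True if this value of the btl MCA parameter selects 'self'.

    Hypothetical helper for illustration only, not an Open MPI function.
    """
    value = btl_param.strip()
    if not value:
        # Assumption: with no explicit list, every component is eligible.
        return True
    exclusive = value.startswith("^")
    names = {n.strip() for n in value.lstrip("^").split(",") if n.strip()}
    if exclusive:
        # "^a,b" means all components except a and b.
        return "self" not in names
    # Otherwise only the listed components are used.
    return "self" in names


def check_btl_selection(btl_param):
    """Error out with an explanatory message if 'self' is not selected."""
    if not btl_selection_includes_self(btl_param):
        raise ValueError(
            "btl selection '%s' does not include the self btl; "
            "add it, e.g. --mca btl %s,self" % (btl_param, btl_param)
        )
```

With this sketch, check_btl_selection("openib") raises an error that tells the user what to add, while "openib,self" and "^sm" pass.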

>
>
> On 7/28/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> FWIW, all my MTT runs are hanging as well.
>
>
>
> On Jul 28, 2008, at 10:37 AM, Brad Benton wrote:
>
> My experience is the same as Lenny's. I've tested on x86_64 and
> ppc64 systems, and tests using --mca btl openib,self hang in all
> cases.
>
> --brad
>
>
> 2008/7/28 Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
> I could not run on different nodes, or on the same node, via self,openib
>
>
>
>
> On 7/28/08, Ralph Castain <rhc_at_[hidden]> wrote:
> I checked this out some more and I believe it is related to ticket
> #1378. We lock up if SM is included in the BTLs, which is what I
> had done in my test. If I specify ^sm, I can run fine.
>
>
> On Jul 28, 2008, at 6:41 AM, Ralph Castain wrote:
>
> It could also be something new. Brad and I noted on Fri that IB was
> locking up as soon as we tried any cross-node communications. Hadn't
> seen that before, and at least I haven't explored it further -
> planned to do so today.
>
>
> On Jul 28, 2008, at 6:01 AM, Lenny Verkhovsky wrote:
>
> I believe it is.
>
> On 7/28/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>
> On Jul 28, 2008, at 7:51 AM, Jeff Squyres wrote:
>
> Is this related to r1378?
>
> Gah -- I meant #1378, meaning the "PML ob1 deadlock" ticket.
>
>
>
> On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote:
>
> Hi,
>
> I have been seeing tests (latency) hang since r19010.
>
>
> Best Regards
>
> Lenny.
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> Cisco Systems