Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] regression with derived datatypes
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-07 13:15:59


I wonder if that might also explain the issue reported by Gilles regarding the scif BTL? In his example, the problem only occurred if the message was split across scif and vader. If so, then it might be that splitting messages in general is broken.

On May 7, 2014, at 10:11 AM, Rolf vandeVaart <rvandevaart_at_[hidden]> wrote:

> OK. So, I investigated a little more. I only see the issue when I am running with multiple ports enabled such that I have two openib BTLs instantiated. In addition, large message RDMA has to be enabled. If those conditions are not met, then I do not see the problem. For example:
> FAILS:
> mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1,mlx5_0:2 -mca btl_openib_flags 3 MPI_Isend_ator_c
> PASS:
> mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1 -mca btl_openib_flags 3 MPI_Isend_ator_c
> mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1,mlx5_0:2 -mca btl_openib_flags 1 MPI_Isend_ator_c
>
> So we must have some type of issue when we break up the message between the two openib BTLs. Maybe someone else can confirm my observations?
> I was testing against the latest trunk.
>
> Rolf
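The three configurations in Rolf's message differ only in the port list and the `btl_openib_flags` value, so they can be generated from a small table when re-running the comparison. The following is a minimal sketch, not part of the original thread. The helper names (`mpirun_command`, `render_cases`) are hypothetical. The third case is read as "both ports, flags 1", which is an assumption based on the surrounding description. Only the options already shown in the message are used.

```python
import shlex

def mpirun_command(ports, flags, hosts=("host1", "host2"), test="MPI_Isend_ator_c"):
    """Build one mpirun command line (as an argument list) for a given
    openib port list and btl_openib_flags value."""
    return [
        "mpirun", "-np", "2",
        "-host", ",".join(hosts),
        "-mca", "btl_openib_if_include", ",".join(ports),
        "-mca", "btl_openib_flags", str(flags),
        test,
    ]

# (ports, flags, result as reported in the message above)
CASES = [
    (("mlx5_0:1", "mlx5_0:2"), 3, "FAILS"),  # two ports, large-message RDMA enabled
    (("mlx5_0:1",), 3, "PASS"),              # one port, same flags
    (("mlx5_0:1", "mlx5_0:2"), 1, "PASS"),   # two ports, flags 1 (assumed reading)
]

def render_cases():
    """Return (reported result, shell-ready command string) pairs."""
    return [
        (reported, shlex.join(mpirun_command(ports, flags)))
        for ports, flags, reported in CASES
    ]

if __name__ == "__main__":
    for reported, command in render_cases():
        print(f"{reported}: {command}")
```

Keeping the cases in one table makes it easier to add a further combination (for example a different pair of ports) without retyping the full command.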
>
> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Joshua Ladd
> Sent: Wednesday, May 07, 2014 10:48 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] regression with derived datatypes
>
> Rolf,
>
> This was run on a Sandy Bridge system with ConnectX-3 cards.
>
> Josh
>
>
> On Wed, May 7, 2014 at 10:46 AM, Joshua Ladd <jladd.mlnx_at_[hidden]> wrote:
> Elena, can you run your reproducer on the trunk, please, and see if the problem persists?
>
> Josh
>
>
> On Wed, May 7, 2014 at 10:26 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> On May 7, 2014, at 10:03 AM, Elena Elkina <elena.elkina_at_[hidden]> wrote:
>
> > Yes, this commit is also in the trunk.
>
> Yes, I understand that -- my question is: is this same *behavior* happening on the trunk? I.e., is there some other effect on the trunk that is causing the bad behavior to not occur?
>
> > Best,
> > Elena
> >
> >
> > On Wed, May 7, 2014 at 5:45 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> > Is this also happening on the trunk?
> >
> >
> > Sent from my phone. No type good.
> >
> > On May 7, 2014, at 9:44 AM, "Elena Elkina" <elena.elkina_at_[hidden]> wrote:
> >
> >> Sorry,
> >>
> >> Fixes #4501: Datatype unpack code produces incorrect results in some cases
> >>
> >> ---svn-pre-commit-ignore-below---
> >>
> >> r31370 [[BR]]
> >> Reshape all the packing/unpacking functions to use the same skeleton. Rewrite the
> >> generic_unpacking to take advantage of the same capabilities.
> >>
> >> r31380 [[BR]]
> >> Remove a non-necessary label.
> >>
> >> r31387 [[BR]]
> >> Correctly save the displacement for the case where the convertor is not
> >> completed. As we need to have the right displacement at the beginning
> >> of the next call, we should save the position relative to the beginning
> >> of the buffer and not to the last loop.
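The r31387 description above turns on what state an unpacker keeps when a message arrives in several pieces. The toy model below illustrates that idea only. It is not the Open MPI convertor code, and every name in it (`StridedUnpacker`, `unpack_in_fragments`, the `absolute` switch) is hypothetical. It unpacks a packed byte stream into equally spaced blocks. It can save the resume position relative to the start of the stream, or, for contrast, relative to the current block only.

```python
class StridedUnpacker:
    """Toy unpacker: scatter a packed stream into `count` blocks of
    `blocklen` bytes placed `stride` bytes apart (assumes stride >= blocklen)."""

    def __init__(self, count, blocklen, stride, absolute=True):
        self.blocklen = blocklen
        self.stride = stride
        self.absolute = absolute
        self.dest = bytearray(count * stride)
        self.saved = 0  # resume position carried between unpack() calls

    def unpack(self, fragment):
        pos = self.saved
        for byte in fragment:
            # Map the stream position to a block index and an offset inside it.
            block, offset = divmod(pos, self.blocklen)
            self.dest[block * self.stride + offset] = byte
            pos += 1
        if self.absolute:
            # Position measured from the beginning of the packed stream.
            self.saved = pos
        else:
            # Illustrative faulty variant: keeps only the offset inside the
            # current block, so the block index is lost on the next call.
            self.saved = pos % self.blocklen


def unpack_in_fragments(packed, sizes, **layout):
    """Feed `packed` to a fresh unpacker in consecutive pieces of `sizes` bytes."""
    unpacker = StridedUnpacker(**layout)
    start = 0
    for size in sizes:
        unpacker.unpack(packed[start:start + size])
        start += size
    return bytes(unpacker.dest)


if __name__ == "__main__":
    packed = bytes(range(1, 13))
    layout = dict(count=4, blocklen=3, stride=5)
    whole = unpack_in_fragments(packed, [12], **layout)
    split = unpack_in_fragments(packed, [5, 7], **layout)
    faulty = unpack_in_fragments(packed, [5, 7], absolute=False, **layout)
    print(split == whole, faulty == whole)
```

With the absolute position saved, the destination buffer is the same no matter how the stream is fragmented. The contrast variant only goes wrong when a fragment boundary falls after the first block, which is consistent with a problem that shows up only when a message is split.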
> >>
> >> Best regards,
> >> Elena
> >>
> >>
> >> On Wed, May 7, 2014 at 5:43 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> >> Can you cite the branch and SVN r number?
> >>
> >> Sent from my phone. No type good.
> >>
> >> > On May 7, 2014, at 9:24 AM, "Elena Elkina" <elena.elkina_at_[hidden]> wrote:
> >> >
> >> > b531973419a056696e6f88d813769aa4f1f1aee6
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14701.php
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]