
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] regression with derived datatypes
From: George Bosilca (bosilca_at_[hidden])
Date: 2014-05-08 03:03:21


Nathan, or anybody with access to the target hardware,

If you can provide minimal output from the application, with and
without the above-mentioned patch, and with mpi_ddt_unpack_debug,
mpi_ddt_pack_debug, and mpi_ddt_position_debug all set to 1, I will try
to help.

  George.
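
For reference, the requested runs could be launched along the lines of the sketch below. It only prints the command lines; it does not run them. Assumptions not stated in the thread: the helper name ddt_debug_cmd is hypothetical, ./test_scif is the reproducer from the quoted message, and the three datatype debug parameters may only take effect in an Open MPI build configured with debugging enabled.

```shell
#!/bin/sh
# Hypothetical helper: print the mpirun command line for one BTL selection
# with the three datatype debug parameters named above set to 1.
# Assumption: ./test_scif is the reproducer from the quoted message.
ddt_debug_cmd() {
    btl_list="$1"
    printf 'mpirun -host localhost -np 2 --mca btl %s' "$btl_list"
    for param in mpi_ddt_unpack_debug mpi_ddt_pack_debug mpi_ddt_position_debug; do
        printf ' --mca %s 1' "$param"
    done
    printf ' ./test_scif\n'
}

# Print the commands for the two BTL selections discussed in the thread.
# Nothing is executed here; pipe a line to sh to actually run it.
ddt_debug_cmd scif,self
ddt_debug_cmd scif,vader,self
```

Running each printed command once on a tree with r31496 applied and once with it reverted, and saving the output of each run to its own file, would give the with/without comparison asked for above.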

On Thu, May 8, 2014 at 2:50 AM, Hjelm, Nathan T <hjelmn_at_[hidden]> wrote:
> Since I have a system that has the scif libraries installed I will try to reproduce and see if I can come up with a fix. It will probably be sometime next week at the earliest.
>
> -Nathan
> ________________________________________
> From: devel [devel-bounces_at_[hidden]] on behalf of Gilles Gouaillardet [gilles.gouaillardet_at_[hidden]]
> Sent: Wednesday, May 07, 2014 9:03 PM
> To: devel_at_[hidden]
> Subject: Re: [OMPI devel] regression with derived datatypes
>
> On 2014/05/08 2:15, Ralph Castain wrote:
>> I wonder if that might also explain the issue reported by Gilles regarding the scif BTL? In his example, the problem only occurred if the message was split across scif and vader. If so, then it might be that splitting messages in general is broken.
>>
> i am afraid there is a misunderstanding :
> the problem always occurs with scif,vader,self (regardless of the ompi v1.8
> version)
> the problem occurs with scif,self only if r31496 is applied to ompi v1.8
>
>
> In my previous email
> http://www.open-mpi.org/community/lists/devel/2014/05/14699.php
> i reported the following interesting fact :
>
> with ompi v1.8 (latest r31678), the following command produces incorrect
> results :
> mpirun -host localhost -np 2 --mca btl scif,self ./test_scif
>
> but with ompi v1.8 r31309, the very same command produces correct results
>
> Elena pointed out that r31496 is a suspect, so i took the latest v1.8
> (r31678), reverted r31496, and ...
>
>
> mpirun -host localhost -np 2 --mca btl scif,self ./test_scif
>
> works again !
>
> note that the "default"
> mpirun -host localhost -np 2 --mca btl scif,vader,self ./test_scif
> still produces incorrect results
>
> in order to reproduce the issue, a MIC is *not* needed :
> you only need to install the software stack, load the mic kernel module,
> and make sure you can read/write /dev/mic/*
>
> bottom line, there are two issues here :
> 1) r31496 broke something : mpirun -np 2 -host localhost --mca btl
> scif,self ./test_scif
> 2) something else never worked : mpirun -np 2 -host localhost --mca btl
> scif,vader,self ./test_scif
>
> Gilles
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14739.php