Open MPI User's Mailing List Archives

From: George Bosilca (bosilca_at_[hidden])
Date: 2006-02-12 15:29:12


Yvan,

It's now corrected. Please use the trunk (nightly builds) starting from
revision 8997, or wait until Monday, when we will update the next stable
release candidate. If you are in a hurry and feel like playing around
with the MPI code, you can apply the attached patch to the latest stable
release.

   Thanks,
     george.

On Fri, 10 Feb 2006, George Bosilca wrote:

> Yvan,
>
> I'm looking into this one. So far I cannot reproduce it with the
> current version from the trunk. I will look into the stable versions.
> Until I figure out what's wrong, can you please use the nightly
> builds to run your test? Once the problem gets fixed, it will be
> included in the 1.0.2 release.
>
> BTW, which interconnect are you using? Ethernet?
>
> Thanks,
> george.
>
> On Feb 10, 2006, at 5:06 PM, Yvan Fournier wrote:
>
>> Hello,
>>
>> I seem to have encountered a bug in Open MPI 1.0 using indexed
>> datatypes with MPI_Recv (which seems to be of the "off by one"
>> sort). I have attached a test case, which is briefly explained below
>> (as well as in the source file). This case should run on two
>> processes. I observed the bug on two different Linux systems
>> (single-processor Centrino under Suse 10.0 with gcc 4.0.2,
>> dual-processor Xeon under Debian Sarge with gcc 3.4) with Open MPI
>> 1.0.1, and do not observe it using LAM 7.1.1 or MPICH2.
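>>
>> A minimal standalone sketch of the pattern in question (this is not
>> the attached test case; the counts, blocklengths and displacements
>> here are made up for illustration). Run it on two processes:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int rank, i;
>>     double buf[12] = {0.0};
>>     int blocklens[3] = {3, 3, 3};    /* three {x, y, z} triplets */
>>     int displs[3] = {0, 3, 9};       /* non-contiguous placement */
>>     MPI_Datatype idx_type;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     MPI_Type_indexed(3, blocklens, displs, MPI_DOUBLE, &idx_type);
>>     MPI_Type_commit(&idx_type);
>>
>>     if (rank == 1) {
>>         double send[9] = {1, 1, 1, 2, 2, 2, 3, 3, 3};
>>         MPI_Send(send, 9, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
>>     }
>>     else if (rank == 0) {
>>         /* A single receive scatters the 9 doubles into buf
>>            according to the indexed map; with the bug, values
>>            land one slot off. */
>>         MPI_Recv(buf, 1, idx_type, 1, 0, MPI_COMM_WORLD,
>>                  MPI_STATUS_IGNORE);
>>         for (i = 0; i < 12; i++)
>>             printf("%g ", buf[i]);
>>         printf("\n");
>>     }
>>
>>     MPI_Type_free(&idx_type);
>>     MPI_Finalize();
>>     return 0;
>> }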
>>
>> Here is a summary of the case:
>>
>> ------------------
>>
>> Each processor reads a file ("data_p0" or "data_p1") giving a list
>> of global element ids. Some elements (vertices from a partitioned
>> mesh) may belong to both processors, so their ids may appear on both
>> processors: we have 7178 global vertices, 3654 and 3688 of them
>> being known by ranks 0 and 1 respectively.
>>
>> In this simplified version, we assign coordinates {x, y, z} to each
>> vertex equal to its global id number for rank 1, and the negative of
>> that for rank 0 (assigning the same values to x, y, and z). After
>> finishing the "ordered gather", rank 0 prints the global id and
>> coordinates of each vertex.
>>
>> Lines should print (for example) as:
>> 6456 ; 6456.00000 6456.00000 6456.00000
>> 6457 ; -6457.00000 -6457.00000 -6457.00000
>> depending on whether a vertex belongs only to rank 0 (negative
>> coordinates) or belongs to rank 1 (positive coordinates).
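>>
>> A sketch of the coordinate setup just described (the function and
>> variable names are illustrative, not those of the actual source):
>>
>> /* Assign each vertex coordinates equal to its global id,
>>    negated on rank 0. */
>> void assign_coords(int rank, int n_vertices,
>>                    const int *global_id, double *coords)
>> {
>>     int i;
>>     double sign = (rank == 0) ? -1.0 : 1.0;
>>     for (i = 0; i < n_vertices; i++) {
>>         coords[3*i]     = sign * global_id[i];  /* x */
>>         coords[3*i + 1] = sign * global_id[i];  /* y */
>>         coords[3*i + 2] = sign * global_id[i];  /* z */
>>     }
>> }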
>>
>> With the Open MPI 1.0.1 bug (observed on Suse Linux 10.0 with gcc
>> 4.0 and on Debian Sarge with gcc 3.4), we have, for example, for the
>> last vertices:
>> 7176 ; 7175.00000 7175.00000 7176.00000
>> 7177 ; 7176.00000 7176.00000 7177.00000
>> which seems to indicate an "off by one" type bug in datatype
>> handling.
>>
>> Not using an indexed datatype (i.e., not defining
>> USE_INDEXED_DATATYPE in the gather_test.c file), the bug disappears.
>> Using the indexed datatype with LAM MPI 7.1.1 or MPICH2, we do not
>> reproduce the bug either, so it does seem to be an Open MPI issue.
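>>
>> The switch is just a compile-time toggle around the receive; roughly
>> (the exact calls in gather_test.c may differ):
>>
>> #ifdef USE_INDEXED_DATATYPE
>>     /* one receive into non-contiguous memory: triggers the bug */
>>     MPI_Recv(coords, 1, idx_type, src, tag, MPI_COMM_WORLD, &status);
>> #else
>>     /* contiguous receive, then manual scatter: works correctly */
>>     MPI_Recv(tmp, 3 * n_recv, MPI_DOUBLE, src, tag, MPI_COMM_WORLD,
>>              &status);
>> #endif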
>>
>> ------------------
>>
>> Best regards,
>>
>> Yvan Fournier
>> <ompi_datatype_bug.tar.gz>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> "Half of what I say is meaningless; but I say it so that the other
> half may reach you"
> Kahlil Gibran
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

"We must accept finite disappointment, but we must never lose infinite
hope."
                                   Martin Luther King