
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] FlowChecker: Detecting Bugs in MPI Libraries via Message Flow Checking
From: George Bosilca (bosilca_at_[hidden])
Date: 2010-11-22 16:45:48


On Nov 20, 2010, at 12:08 , Sébastien Boisvert wrote:

> Sounds interesting !
>
> Regarding my bug report, I don't think it is very important.
>
> Here's why:
>
> According to the standard MPI 2.2, Open-MPI is correct when blocking on
> any MPI_Send. So, even if Open-MPI __should__ (according to its
> documentation) send messages of 4096 bytes or less eagerly (with shared
> memory), the opposite is compliant too.

__should__ is not correct; __might__ is a better verb to describe the most "common" behavior for small messages. The problem comes from the fact that within each communicator FIFO ordering is required by the MPI standard. As soon as there is any congestion, MPI_Send will block even for small messages (and this is independent of the underlying network) until all the pending packets have been delivered.

> What I have learned is that my program should be designed and implemented
> according to MPI 2.2, not Open-MPI 1.4.3 or any other implementation.

Totally agree with this one ;)

  george.

>
>
>
> On Sat, 2010-11-20 at 11:14 -0500, Christopher Samuel wrote:
>>
>> Hi folks,
>>
>> At SC10 this year there was an interesting tool presented
>> as a student paper called "FlowChecker: Detecting Bugs in
>> MPI Libraries via Message Flow Checking".
>>
>> http://sc10.supercomputing.org/schedule/event_detail.php?evid=pap352
>>
>> Basically they instrument a program and derive "intentions"
>> from your MPI calls and the MPI standard and also trace the
>> data flow (including things like memcpy) and messages. Then
>> offline you run a correlator which compares what was meant
>> to happen and what did and tries to root cause the fault.
>>
>> They claim to have taken 5 random closed bugs from 3 different
>> MPI implementations (including 3 from Open-MPI) and been able
>> to detect all 5 and root-cause 4 of them (the one they missed
>> was a data type issue).
>>
>> The PDF of their paper is here:
>>
>> http://www.cse.ohio-state.edu/~chenzhe/sc10-flowchecker.pdf
>>
>> I've emailed them to see if the code is going to be available
>> as it could be quite a handy tool to have when trying to track
>> down issues like the one Sébastien posted about.
>>
>> cheers,
>> Chris
>> - --
>> Christopher Samuel - Senior Systems Administrator
>> VLSCI - Victorian Life Sciences Computational Initiative
>> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
>> http://www.vlsci.unimelb.edu.au/
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>