MPI guarantees message ordering per communicator and per pair of peers. In other words, messages going from peer A to peer B in the same communicator will be __matched__ on the receiver in exactly the order in which they were sent (this remains true even for multi-threaded libraries). MPI does not mandate any other type of ordering, such as between communicators or between different pairs of processes.
Now, what I said above holds only for the matching logic. Completion of message reception is a totally different thing: the ordering guarantee says nothing about the order in which receives complete.
I understand that if a large message is sent and then a short one, the short message can arrive first. But what if the messages have the same size and are small enough that no fragmentation occurs: is the order of delivery guaranteed then?