Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_ERR_IN_STATUS from MPI_Bcast?
From: Jeremiah Willcock (jewillco_at_[hidden])
Date: 2011-02-10 15:17:51


Here is a small test case that hits the bug on 1.4.1:

#include <mpi.h>

int arr[1142];

int main(int argc, char** argv) {
   int rank, my_size;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
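   /* Deliberately inconsistent: rank 1 passes a different count than the
      other ranks, so this needs at least 2 processes to trigger. */
   my_size = (rank == 1) ? 1142 : 1088;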
   MPI_Bcast(arr, my_size, MPI_INT, 0, MPI_COMM_WORLD);
   MPI_Finalize();
   return 0;
}

I tried it on 1.5.1, and I get MPI_ERR_TRUNCATE instead, so this might
have already been fixed.

-- Jeremiah Willcock

On Thu, 10 Feb 2011, Jeremiah Willcock wrote:

> FYI, I am having trouble finding a small test case that will trigger this on
> 1.5; I'm either getting deadlocks or MPI_ERR_TRUNCATE, so it could have been
> fixed. What are the triggering rules for different broadcast algorithms? It
> could be that only certain sizes or only certain BTLs trigger it.
>
> -- Jeremiah Willcock
>
> On Thu, 10 Feb 2011, Jeff Squyres wrote:
>
>> Nifty! Yes, I agree that that's a poor error message. It's probably
>> (unfortunately) being propagated up from the underlying point-to-point
>> system, where an ERR_IN_STATUS would actually make sense.
>>
>> I'll file a ticket about this. Thanks for the heads up.
>>
>>
>> On Feb 9, 2011, at 4:49 PM, Jeremiah Willcock wrote:
>>
>>> On Wed, 9 Feb 2011, Jeremiah Willcock wrote:
>>>
>>>> I get the following Open MPI error from 1.4.1:
>>>>
>>>> *** An error occurred in MPI_Bcast
>>>> *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
>>>> *** MPI_ERR_IN_STATUS: error code in status
>>>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>>
>>>> (hostname and port removed from each line). There is no MPI_Status
>>>> returned by MPI_Bcast, so I don't know what the error is. Is this
>>>> something that people have seen before?
>>>
>>> For the record, this appears to be caused by specifying inconsistent data
>>> sizes on the different ranks in the broadcast operation. The error
>>> message could still be improved, though.
>>>
>>> -- Jeremiah Willcock
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/