Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_ERR_IN_STATUS from MPI_Bcast?
From: Jeremiah Willcock (jewillco_at_[hidden])
Date: 2011-02-10 15:24:21


I forgot to mention that this was tested with 3 or 4 ranks, connected via
TCP.
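
A typical way to reproduce that setup, forcing the TCP BTL (the file name
here is arbitrary):

  mpicc bcast_test.c -o bcast_test
  mpirun -np 4 --mca btl tcp,self ./bcast_test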

-- Jeremiah Willcock

On Thu, 10 Feb 2011, Jeremiah Willcock wrote:

> Here is a small test case that hits the bug on 1.4.1:
>
> #include <mpi.h>
>
> int arr[1142];
>
> int main(int argc, char** argv) {
>   int rank, my_size;
>   MPI_Init(&argc, &argv);
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   /* Deliberate mismatch: rank 1 posts 1142 elements while everyone
>      else, including root 0, uses 1088. */
>   my_size = (rank == 1) ? 1142 : 1088;
>   MPI_Bcast(arr, my_size, MPI_INT, 0, MPI_COMM_WORLD);
>   MPI_Finalize();
>   return 0;
> }
>
> I tried it on 1.5.1 and got MPI_ERR_TRUNCATE instead, so this may already
> have been fixed.
>
> -- Jeremiah Willcock
>
>
> On Thu, 10 Feb 2011, Jeremiah Willcock wrote:
>
>> FYI, I am having trouble finding a small test case that triggers this on
>> 1.5; I get either deadlocks or MPI_ERR_TRUNCATE, so it may already have
>> been fixed. What are the rules for selecting among the different broadcast
>> algorithms? It could be that only certain message sizes or only certain
>> BTLs trigger the bug (one way to pin a specific algorithm while testing is
>> sketched below).
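>>
>> For testing, the tuned collective component exposes MCA parameters that
>> pin a specific broadcast algorithm; the exact parameter names are worth
>> double-checking with ompi_info:
>>
>>   mpirun -np 4 --mca coll_tuned_use_dynamic_rules 1 \
>>       --mca coll_tuned_bcast_algorithm 2 ./bcast_test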
>>
>> -- Jeremiah Willcock
>>
>> On Thu, 10 Feb 2011, Jeff Squyres wrote:
>>
>>> Nifty! Yes, I agree that that's a poor error message. It's probably
>>> (unfortunately) being propagated up from the underlying point-to-point
>>> system, where an ERR_IN_STATUS would actually make sense.
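>>>
>>> For context, ERR_IN_STATUS is meaningful for completion calls that fill
>>> an array of statuses; a rough sketch, assuming MPI_ERRORS_RETURN is in
>>> effect:
>>>
>>>   MPI_Request reqs[2];
>>>   MPI_Status stats[2];
>>>   /* ... start two nonblocking operations that fill reqs ... */
>>>   if (MPI_Waitall(2, reqs, stats) == MPI_ERR_IN_STATUS) {
>>>       /* stats[i].MPI_ERROR holds the error code for request i */
>>>   }
>>>
>>> MPI_Bcast returns no status at all, so the caller has nowhere to look
>>> for the real code.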
>>>
>>> I'll file a ticket about this. Thanks for the heads up.
>>>
>>>
>>> On Feb 9, 2011, at 4:49 PM, Jeremiah Willcock wrote:
>>>
>>>> On Wed, 9 Feb 2011, Jeremiah Willcock wrote:
>>>>
>>>>> I get the following Open MPI error from 1.4.1:
>>>>>
>>>>> *** An error occurred in MPI_Bcast
>>>>> *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
>>>>> *** MPI_ERR_IN_STATUS: error code in status
>>>>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>>>
>>>>> (hostname and port removed from each line). MPI_Bcast returns no
>>>>> MPI_Status, so I don't know what the underlying error is. Is this
>>>>> something that people have seen before?
>>>>
>>>> For the record, this appears to be caused by specifying inconsistent data
>>>> sizes on the different ranks in the broadcast operation. The error
>>>> message could still be improved, though.
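>>>>
>>>> One way to surface the real error code is to make errors returnable and
>>>> ask for the message text; a sketch (arr and my_size as in the failing
>>>> call, and fprintf needs <stdio.h>):
>>>>
>>>>   MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>>   int err = MPI_Bcast(arr, my_size, MPI_INT, 0, MPI_COMM_WORLD);
>>>>   if (err != MPI_SUCCESS) {
>>>>       char msg[MPI_MAX_ERROR_STRING];
>>>>       int len;
>>>>       MPI_Error_string(err, msg, &len);
>>>>       fprintf(stderr, "MPI_Bcast failed: %s\n", msg);
>>>>   }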
>>>>
>>>> -- Jeremiah Willcock
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/