
Subject: Re: [OMPI users] Open MPI 1.7.4 with --enable-mpi-thread-multiple gives MPI_Recv error
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-03-18 08:50:00


Technically, no -- your code was pretty much ok.

Yes, you're right that MPI_CHARACTER is for Fortran types. But in your case, a C char is probably equivalent to a Fortran CHARACTER, so using MPI_CHARACTER instead of MPI_CHAR should still have been ok.

More specifically: it is ok to use MPI_CHARACTER when calling MPI functions from C, because you may have an opaque buffer that contains Fortran data.

So this is a bug in OMPI -- we need to fix this. As you noted, it only happens when OMPI is configured/built with --enable-mpi-thread-multiple, which is a bit suspicious.

I'll file a bug for this; thanks for identifying the issue.
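
For illustration, here is a minimal sketch of the C-side usage described above: MPI_CHAR (or MPI_SIGNED_CHAR / MPI_UNSIGNED_CHAR) for C char buffers, with MPI_CHARACTER reserved for data that actually came from Fortran CHARACTER variables. This program is an illustrative sketch, not code from the thread:
===========================================
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  char msg = 'x';
  if (rank == 1) {
    /* MPI_CHAR matches a C char buffer; MPI_CHARACTER is the Fortran CHARACTER type */
    MPI_Send(&msg, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
  } else if (rank == 0) {
    MPI_Recv(&msg, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Rank 0 received '%c'\n", msg);
  }
  MPI_Finalize();
  return 0;
}
===========================================
Run with "mpirun -np 2 ./a.out". Elias's actual fix used MPI_UNSIGNED_CHAR, which works the same way; the key point is to use one of the C character datatypes rather than the Fortran MPI_CHARACTER.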

On Mar 17, 2014, at 10:33 PM, Elias Rudberg <elias.rudberg_at_[hidden]> wrote:

> Hello,
>
> Gustavo Correa wrote:
>> I guess you need to provide buffers of char type to
>> MPI_Send and MPI_Recv, not NULL.
>
> That was not the problem; I was using message size 0 anyway, so it should be OK to give NULL as the buffer pointer.
>
> I did find the problem now; it turns out that this was not due to any bug in Open MPI at all -- my program had a bug: I used the wrong constant for the datatype. I used MPI_CHARACTER, which I thought would correspond to a char or unsigned char in C/C++. But when I checked the MPI standard, it says that MPI_CHARACTER is for the Fortran CHARACTER type. Since I am using C, not Fortran, I should use MPI_CHAR, MPI_SIGNED_CHAR, or MPI_UNSIGNED_CHAR. I have now corrected my program by changing MPI_CHARACTER to MPI_UNSIGNED_CHAR, and it works.
>
> Sorry for reporting this as a bug in Open MPI; it was really a bug in my own code.
>
> / Elias
>
>
> Quoting Gustavo Correa <gus_at_[hidden]>:
>
>> I guess you need to provide buffers of char type to
>> MPI_Send and MPI_Recv, not NULL.
>>
>> On Mar 16, 2014, at 8:04 PM, Elias Rudberg wrote:
>>
>>> Hi Ralph,
>>>
>>> Thanks for the quick answer!
>>>
>>>> Try running the "ring" program in our example directory and see if that works
>>>
>>> I just did this, and it works. (I ran ring_c.c)
>>>
>>> Looking at your ring_c.c code, I see that it is quite similar to my test program, but one thing that differs is the datatype: the ring program uses MPI_INT while my test uses MPI_CHARACTER.
>>> I tried changing MPI_INT to MPI_CHARACTER in ring_c.c (and the type of the variable "message" from int to char), and then ring_c.c fails in the same way as my test code. Conversely, my code works if I change MPI_CHARACTER to MPI_INT.
>>>
>>> So, it looks like there is a bug that is triggered when using MPI_CHARACTER, while everything works with MPI_INT.
>>>
>>> / Elias
>>>
>>>
>>> Quoting Ralph Castain <rhc_at_[hidden]>:
>>>
>>>> Try running the "ring" program in our example directory and see if that works
>>>>
>>>> On Mar 16, 2014, at 4:26 PM, Elias Rudberg <elias.rudberg_at_[hidden]> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> I would like to report a bug in Open MPI 1.7.4 when compiled with --enable-mpi-thread-multiple.
>>>>>
>>>>> The bug can be reproduced with the following test program (mpi-send-recv.c):
>>>>> ===========================================
>>>>> #include <mpi.h>
>>>>> #include <stdio.h>
>>>>> int main() {
>>>>>   MPI_Init(NULL, NULL);
>>>>>   int rank;
>>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>   printf("Rank %d at start\n", rank);
>>>>>   if (rank)
>>>>>     MPI_Send(NULL, 0, MPI_CHARACTER, 0, 0, MPI_COMM_WORLD);
>>>>>   else
>>>>>     MPI_Recv(NULL, 0, MPI_CHARACTER, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>>>   printf("Rank %d at end\n", rank);
>>>>>   MPI_Finalize();
>>>>>   return 0;
>>>>> }
>>>>> ===========================================
>>>>>
>>>>> With Open MPI 1.7.4 compiled with --enable-mpi-thread-multiple, the test program above fails like this:
>>>>> $ mpirun -np 2 ./a.out
>>>>> Rank 0 at start
>>>>> Rank 1 at start
>>>>> [elias-p6-2022scm:2743] *** An error occurred in MPI_Recv
>>>>> [elias-p6-2022scm:2743] *** reported by process [140733606985729,140256452018176]
>>>>> [elias-p6-2022scm:2743] *** on communicator MPI_COMM_WORLD
>>>>> [elias-p6-2022scm:2743] *** MPI_ERR_TYPE: invalid datatype
>>>>> [elias-p6-2022scm:2743] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> [elias-p6-2022scm:2743] *** and potentially your MPI job)
>>>>>
>>>>> Steps I use to reproduce this in Ubuntu:
>>>>>
>>>>> (1) Download openmpi-1.7.4.tar.gz
>>>>>
>>>>> (2) Configure like this:
>>>>> ./configure --enable-mpi-thread-multiple
>>>>>
>>>>> (3) make
>>>>>
>>>>> (4) Compile test program like this:
>>>>> mpicc mpi-send-recv.c
>>>>>
>>>>> (5) Run like this:
>>>>> mpirun -np 2 ./a.out
>>>>> This gives the error above.
>>>>>
>>>>> Of course, in my actual application I will want to call MPI_Init_thread with MPI_THREAD_MULTIPLE instead of just MPI_Init, but that does not seem to matter for this error; the same error occurs regardless of how I call MPI_Init/MPI_Init_thread. So I just used MPI_Init in the test code above to keep it as short as possible.
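
A minimal sketch of that initialization pattern -- requesting MPI_THREAD_MULTIPLE via MPI_Init_thread and checking the support level actually provided (illustrative only, not code from the quoted mail):
===========================================
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
  int provided;
  /* Request full thread support; the library reports the level it can actually give. */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE)
    printf("Warning: MPI_THREAD_MULTIPLE not available (provided level = %d)\n", provided);
  /* ... threaded application code would go here ... */
  MPI_Finalize();
  return 0;
}
===========================================
If "provided" comes back lower than MPI_THREAD_MULTIPLE, the library was most likely built without --enable-mpi-thread-multiple.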
>>>>>
>>>>> Do you agree that this is a bug, or am I doing something wrong?
>>>>>
>>>>> Any ideas for workarounds to make things work with --enable-mpi-thread-multiple? (I do need threads, so skipping --enable-mpi-thread-multiple is probably not an option for me.)
>>>>>
>>>>> Best regards,
>>>>> Elias
>>>>>
>>>>>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/