Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI 1.7.4 with --enable-mpi-thread-multiple gives MPI_Recv error
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-03-18 08:50:00


Technically, no -- your code was pretty much ok.

Yes, you're right that MPI_CHARACTER is for Fortran types. But in your case, a char is probably equivalent to a CHARACTER, and therefore using MPI_CHAR vs. MPI_CHARACTER should have been ok.

More specifically: it is ok to use MPI_CHARACTER when calling MPI functions from C, because you may have an opaque buffer that contains Fortran data.
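For example (an illustrative sketch only, not code from this thread; the function name is made up): a C routine that merely relays a buffer filled by Fortran code could legitimately describe it with MPI_CHARACTER, even though the call itself is made from C:
===========================================
#include <mpi.h>

/* Illustrative only: forward a buffer whose contents were written by
   Fortran code (i.e., it holds CHARACTER data) to rank "dest".
   Because the payload is Fortran CHARACTERs, MPI_CHARACTER is the
   matching datatype even though this is a C translation unit. */
void forward_fortran_chars(void *buf, int count, int dest, MPI_Comm comm)
{
    MPI_Send(buf, count, MPI_CHARACTER, dest, 0 /* tag */, comm);
}
===========================================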

So this is a bug in OMPI -- we need to fix this. As you noted, it only happens when OMPI is configured/built with --enable-mpi-thread-multiple, which is a bit suspicious.

I'll file a bug for this; thanks for identifying the issue.

On Mar 17, 2014, at 10:33 PM, Elias Rudberg <elias.rudberg_at_[hidden]> wrote:

> Hello,
>
> Gustavo Correa wrote:
>> I guess you need to provide buffers of char type to
>> MPI_Send and MPI_Recv, not NULL.
>
> That was not the problem; I was using message size 0 anyway, so it should be OK to give NULL as the buffer pointer.
>
> I did find the problem now; it turns out that this was not due to any bug in Open MPI at all, but a bug in my own program: I used the wrong constant to specify the datatype. I used MPI_CHARACTER, which I thought would correspond to a char or unsigned char in C/C++. But when I checked the MPI standard, I saw that MPI_CHARACTER is for the Fortran CHARACTER type. Since I am using C, not Fortran, I should use MPI_CHAR, MPI_SIGNED_CHAR, or MPI_UNSIGNED_CHAR. I have now corrected my program by changing MPI_CHARACTER to MPI_UNSIGNED_CHAR, and it works.
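>
> Concretely, the fix was just the datatype argument in the send/receive calls of the test program below, i.e. something like:
>   if (rank)
>     MPI_Send(NULL, 0, MPI_UNSIGNED_CHAR, 0, 0, MPI_COMM_WORLD);
>   else
>     MPI_Recv(NULL, 0, MPI_UNSIGNED_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);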
>
> Sorry for reporting this as a bug in Open MPI; it was really a bug in my own code.
>
> / Elias
>
>
> Quoting Gustavo Correa <gus_at_[hidden]>:
>
>> I guess you need to provide buffers of char type to
>> MPI_Send and MPI_Recv, not NULL.
>>
>> On Mar 16, 2014, at 8:04 PM, Elias Rudberg wrote:
>>
>>> Hi Ralph,
>>>
>>> Thanks for the quick answer!
>>>
>>>> Try running the "ring" program in our example directory and see if that works
>>>
>>> I just did this, and it works. (I ran ring_c.c)
>>>
>>> Looking at your ring_c.c code, I see that it is quite similar to my test program, but one thing that differs is the datatype: the ring program uses MPI_INT while my test uses MPI_CHARACTER.
>>> I tried changing MPI_INT to MPI_CHARACTER in ring_c.c (and the type of the variable "message" from int to char), and then ring_c.c fails in the same way as my test code. Conversely, my code works if I change MPI_CHARACTER to MPI_INT.
>>>
>>> So, it looks like there is a bug that is triggered when using MPI_CHARACTER but not when using MPI_INT.
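>>>
>>> Roughly, the change was along these lines (paraphrased, not the literal ring_c.c source; "next" and "tag" stand for whatever ring_c.c actually calls them):
>>>   char message;  /* was: int message; */
>>>   MPI_Send(&message, 1, MPI_CHARACTER, next, tag, MPI_COMM_WORLD);  /* was MPI_INT */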
>>>
>>> / Elias
>>>
>>>
>>> Quoting Ralph Castain <rhc_at_[hidden]>:
>>>
>>>> Try running the "ring" program in our example directory and see if that works
>>>>
>>>> On Mar 16, 2014, at 4:26 PM, Elias Rudberg <elias.rudberg_at_[hidden]> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> I would like to report a bug in Open MPI 1.7.4 when compiled with --enable-mpi-thread-multiple.
>>>>>
>>>>> The bug can be reproduced with the following test program (mpi-send-recv.c):
>>>>> ===========================================
>>>>> #include <mpi.h>
>>>>> #include <stdio.h>
>>>>> int main() {
>>>>>   MPI_Init(NULL, NULL);
>>>>>   int rank;
>>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>   printf("Rank %d at start\n", rank);
>>>>>   if (rank)
>>>>>     MPI_Send(NULL, 0, MPI_CHARACTER, 0, 0, MPI_COMM_WORLD);
>>>>>   else
>>>>>     MPI_Recv(NULL, 0, MPI_CHARACTER, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>>>   printf("Rank %d at end\n", rank);
>>>>>   MPI_Finalize();
>>>>>   return 0;
>>>>> }
>>>>> ===========================================
>>>>>
>>>>> With Open MPI 1.7.4 compiled with --enable-mpi-thread-multiple, the test program above fails like this:
>>>>> $ mpirun -np 2 ./a.out
>>>>> Rank 0 at start
>>>>> Rank 1 at start
>>>>> [elias-p6-2022scm:2743] *** An error occurred in MPI_Recv
>>>>> [elias-p6-2022scm:2743] *** reported by process [140733606985729,140256452018176]
>>>>> [elias-p6-2022scm:2743] *** on communicator MPI_COMM_WORLD
>>>>> [elias-p6-2022scm:2743] *** MPI_ERR_TYPE: invalid datatype
>>>>> [elias-p6-2022scm:2743] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> [elias-p6-2022scm:2743] *** and potentially your MPI job)
>>>>>
>>>>> Steps I use to reproduce this in Ubuntu:
>>>>>
>>>>> (1) Download openmpi-1.7.4.tar.gz
>>>>>
>>>>> (2) Configure like this:
>>>>> ./configure --enable-mpi-thread-multiple
>>>>>
>>>>> (3) make
>>>>>
>>>>> (4) Compile test program like this:
>>>>> mpicc mpi-send-recv.c
>>>>>
>>>>> (5) Run like this:
>>>>> mpirun -np 2 ./a.out
>>>>> This gives the error above.
>>>>>
>>>>> Of course, in my actual application I will want to call MPI_Init_thread with MPI_THREAD_MULTIPLE instead of just MPI_Init, but that does not seem to matter for this error; the same error occurs regardless of how I call MPI_Init/MPI_Init_thread. So I just used MPI_Init in the test code above to keep it as short as possible.
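>>>>>
>>>>> For reference, the initialization in the real application looks roughly like this (the error handling here is just illustrative):
>>>>> ===========================================
>>>>> int provided;
>>>>> MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
>>>>> if (provided < MPI_THREAD_MULTIPLE) {
>>>>>   fprintf(stderr, "MPI_THREAD_MULTIPLE not provided (got %d)\n", provided);
>>>>>   MPI_Abort(MPI_COMM_WORLD, 1);
>>>>> }
>>>>> ===========================================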
>>>>>
>>>>> Do you agree that this is a bug, or am I doing something wrong?
>>>>>
>>>>> Any ideas for workarounds to make things work with --enable-mpi-thread-multiple? (I do need threads, so skipping --enable-mpi-thread-multiple is probably not an option for me.)
>>>>>
>>>>> Best regards,
>>>>> Elias

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/