Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI 1.7.4 with --enable-mpi-thread-multiple gives MPI_Recv error
From: Elias Rudberg (elias.rudberg_at_[hidden])
Date: 2014-03-17 22:33:12


Hello,

Gustavo Correa wrote:
> I guess you need to provide buffers of char type to
> MPI_Send and MPI_Recv, not NULL.

That was not the problem. I was using a message size of 0 anyway, so
it should be OK to pass NULL as the buffer pointer.

I have found the problem now. It turns out that this was not due to
any bug in Open MPI; my program had a bug: I used the wrong constant
to specify the datatype. I used MPI_CHARACTER, which I thought would
correspond to a char or unsigned char in C/C++. But when I checked the
MPI standard, it says that MPI_CHARACTER is for the Fortran CHARACTER
type. Since I am using C, not Fortran, I should use MPI_CHAR,
MPI_SIGNED_CHAR, or MPI_UNSIGNED_CHAR. I have now corrected my program
by changing MPI_CHARACTER to MPI_UNSIGNED_CHAR, and it works.

Sorry for reporting this as a bug in Open MPI; it was really a bug in
my own code.

/ Elias

Quoting Gustavo Correa <gus_at_[hidden]>:

> I guess you need to provide buffers of char type to
> MPI_Send and MPI_Recv, not NULL.
>
> On Mar 16, 2014, at 8:04 PM, Elias Rudberg wrote:
>
>> Hi Ralph,
>>
>> Thanks for the quick answer!
>>
>>> Try running the "ring" program in our example directory and see if
>>> that works
>>
>> I just did this, and it works. (I ran ring_c.c)
>>
>> Looking in your ring_c.c code, I see that it is quite similar to my
>> test program but one thing that differs is the datatype: the ring
>> program uses MPI_INT but my test uses MPI_CHARACTER.
>> I tried changing from MPI_INT to MPI_CHARACTER in ring_c.c (and the
>> type of the variable "message" from int to char), and then ring_c.c
>> fails in the same way as my test code. And my code works if I
>> change MPI_CHARACTER to MPI_INT.
>>
>> So, it looks like there is a bug that is triggered when using
>> MPI_CHARACTER, but it works with MPI_INT.
>>
>> / Elias
>>
>>
>> Quoting Ralph Castain <rhc_at_[hidden]>:
>>
>>> Try running the "ring" program in our example directory and see if
>>> that works
>>>
>>> On Mar 16, 2014, at 4:26 PM, Elias Rudberg <elias.rudberg_at_[hidden]> wrote:
>>>
>>>> Hello!
>>>>
>>>> I would like to report a bug in Open MPI 1.7.4 when compiled with
>>>> --enable-mpi-thread-multiple.
>>>>
>>>> The bug can be reproduced with the following test program
>>>> (mpi-send-recv.c):
>>>> ===========================================
>>>> #include <mpi.h>
>>>> #include <stdio.h>
>>>> int main() {
>>>>   MPI_Init(NULL, NULL);
>>>>   int rank;
>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>   printf("Rank %d at start\n", rank);
>>>>   if (rank)
>>>>     MPI_Send(NULL, 0, MPI_CHARACTER, 0, 0, MPI_COMM_WORLD);
>>>>   else
>>>>     MPI_Recv(NULL, 0, MPI_CHARACTER, 1, 0, MPI_COMM_WORLD,
>>>>              MPI_STATUS_IGNORE);
>>>>   printf("Rank %d at end\n", rank);
>>>>   MPI_Finalize();
>>>>   return 0;
>>>> }
>>>> ===========================================
>>>>
>>>> With Open MPI 1.7.4 compiled with --enable-mpi-thread-multiple,
>>>> the test program above fails like this:
>>>> $ mpirun -np 2 ./a.out
>>>> Rank 0 at start
>>>> Rank 1 at start
>>>> [elias-p6-2022scm:2743] *** An error occurred in MPI_Recv
>>>> [elias-p6-2022scm:2743] *** reported by process
>>>> [140733606985729,140256452018176]
>>>> [elias-p6-2022scm:2743] *** on communicator MPI_COMM_WORLD
>>>> [elias-p6-2022scm:2743] *** MPI_ERR_TYPE: invalid datatype
>>>> [elias-p6-2022scm:2743] *** MPI_ERRORS_ARE_FATAL (processes in
>>>> this communicator will now abort,
>>>> [elias-p6-2022scm:2743] *** and potentially your MPI job)
>>>>
>>>> Steps I use to reproduce this on Ubuntu:
>>>>
>>>> (1) Download openmpi-1.7.4.tar.gz
>>>>
>>>> (2) Configure like this:
>>>> ./configure --enable-mpi-thread-multiple
>>>>
>>>> (3) make
>>>>
>>>> (4) Compile test program like this:
>>>> mpicc mpi-send-recv.c
>>>>
>>>> (5) Run like this:
>>>> mpirun -np 2 ./a.out
>>>> This gives the error above.
>>>>
>>>> Of course, in my actual application I will want to call
>>>> MPI_Init_thread with MPI_THREAD_MULTIPLE instead of just
>>>> MPI_Init, but that does not seem to matter for this error; the
>>>> same error occurs regardless of whether I call MPI_Init or
>>>> MPI_Init_thread. So I just used MPI_Init in the test code
>>>> above to make it as short as possible.
>>>>
>>>> Do you agree that this is a bug, or am I doing something wrong?
>>>>
>>>> Any ideas for workarounds to make things work with
>>>> --enable-mpi-thread-multiple? (I do need threads, so skipping
>>>> --enable-mpi-thread-multiple is probably not an option for me.)
>>>>
>>>> Best regards,
>>>> Elias
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users