Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Windows: MPI_Allreduce() crashes when using MPI_DOUBLE_PRECISION
From: hi (hiralsmaillist_at_[hidden])
Date: 2011-05-17 02:54:40


Did you try these test programs?
Or do you have any suggestions for working around this bug?

Thank you.
-Hiral

On Fri, May 13, 2011 at 11:20 AM, hi <hiralsmaillist_at_[hidden]> wrote:
> Hi Rainer,
>
>> Does REAL work for You?
> No.
> I am observing the same errors (see below) even with INTEGER; please find
> the attached test programs with INTEGER and REAL.
>
> C:\test> mpirun mar_f_i.exe
>  size=           1 , rank=           0
>  start --, rcvbuf=           0           0           0           0           0
>  end --, rcvbuf=           2           2           2           2           2
>
> C:\test> mpirun -np 2 mar_f_i.exe
>  size=           2 , rank=           0
>  start --, rcvbuf=           0           0           0           0           0
>  size=           2 , rank=           1
>  start --, rcvbuf=           0           0           0           0           0
> forrtl: severe (157): Program Exception - access violation
> Image              PC                Routine            Line        Source
> [vibgyor:12628] [[31763,0],0]-[[31763,1],0] mca_oob_tcp_msg_recv:
> readv failed: Unknown error (108)
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
>
> Host: vibgyor
> PID:  488
>
> This process may still be running and/or consuming resources.
>
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 452 on node vibgyor
> exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in the
> job did. This can cause a job to hang indefinitely while it waits for
> all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
>
> Thank you.
> -Hiral
>
>
> On Thu, May 12, 2011 at 9:03 PM, Rainer Keller <keller_at_[hidden]> wrote:
>> Hello Hiral,
>> in the ompi_info output You attached, the Fortran size detection did not
>> work correctly (on viscluster, which shows that You used the standard
>> installation package):
>> ...
>>      Fort dbl prec size: 4
>> ...
>>
>> This most probably does not match Your compiler's setting for DOUBLE
>> PRECISION, which probably considers this to be 8.
>>
>> Does REAL work for You?
>>
>> Shiqing is currently away; I will ask him when he returns.
>>
>> With best regards,
>> Rainer
>>
>>
>> On Wednesday 11 May 2011 09:29:03 hi wrote:
>>> Hi Jeff,
>>>
>>> > Can you send the info listed on the help page?
>>> >
>>> From the HELP page...
>>>
>>> ***For run-time problems:
>>> 1) Check the FAQ first. Really. This can save you a lot of time; many
>>> common problems and solutions are listed there.
>>> I couldn't find a reference to this problem in the FAQ.
>>>
>>> 2) The version of Open MPI that you're using.
>>> I am using the pre-built openmpi-1.5.3 64-bit and 32-bit binaries on
>>> Windows 7.
>>> I also tried a locally built openmpi-1.5.2 using the Visual Studio 2008
>>> 32-bit compilers.
>>> I tried various compilers: VS-9 32-bit and VS-10 64-bit, each with the
>>> corresponding Intel ifort compiler.
>>>
>>> 3) The config.log file from the top-level Open MPI directory, if
>>> available (please compress!).
>>> I don't have one.
>>>
>>> 4) The output of the "ompi_info --all" command from the node where
>>> you're invoking mpirun.
>>> See the output of the pre-built "openmpi-1.5.3_x64/bin/ompi_info --all"
>>> in the attachments.
>>>
>>> 5) If running on more than one node --
>>> I am running the test program on a single node.
>>>
>>> 6) A detailed description of what is failing.
>>> Already described in this post.
>>>
>>> 7) Please include information about your network:
>>> As I am running the test program on a single local machine, this might
>>> not be required.
>>>
>>> > You forgot ierr in the call to MPI_Finalize.  You also paired
>>> > DOUBLE_PRECISION data with MPI_INTEGER in the call to allreduce.  And
>>> > you mixed sndbuf and rcvbuf in the call to allreduce, meaning that when
>>> > you print rcvbuf afterwards, it'll always still be 0.
>>>
>>> As I am not a Fortran programmer, this was my mistake!
>>>
>>> >        program Test_MPI
>>> >            use mpi
>>> >            implicit none
>>> >
>>> >            DOUBLE PRECISION rcvbuf(5), sndbuf(5)
>>> >            INTEGER nproc, rank, ierr, n, i, ret
>>> >
>>> >            n = 5
>>> >            do i = 1, n
>>> >                sndbuf(i) = 2.0
>>> >                rcvbuf(i) = 0.0
>>> >            end do
>>> >
>>> >            call MPI_INIT(ierr)
>>> >            call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>>> >            call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
>>> >            write(*,*) "size=", nproc, ", rank=", rank
>>> >            write(*,*) "start --, rcvbuf=", rcvbuf
>>> >            CALL MPI_ALLREDUCE(sndbuf, rcvbuf, n,
>>> >     &              MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)
>>> >            write(*,*) "end --, rcvbuf=", rcvbuf
>>> >
>>> >            CALL MPI_Finalize(ierr)
>>> >        end
>>> >
>>> > (you could use "include 'mpif.h'", too -- I tried both)
>>> >
>>> > This program works fine for me.
>>>
>>> I am observing the same crash described in this thread (when executing
>>> "mpirun -np 2 mar_f_dp.exe"), even with the corrected, simple test
>>> program above. I commented out 'use mpi' because it gave me an "Error in
>>> compiled module file" error, and used the 'include "mpif.h"' statement
>>> instead (see attachment).
>>>
>>> This seems to be a Windows-specific issue (I could run this test program
>>> on Linux with openmpi-1.5.1).
>>>
>>> Can anybody try this test program on Windows?
>>>
>>> Thank you in advance.
>>> -Hiral
>>
>> --
>> ----------------------------------------------------------------
>>  Dr.-Ing. Rainer Keller  http://www.hlrs.de/people/keller
>>  HLRS                         Tel: ++49 (0)711-685 6 5858
>>  Nobelstrasse 19                 Fax: ++49 (0)711-685 6 5832
>>  70550 Stuttgart                    email: keller_at_[hidden]
>>  Germany                             AIM/Skype:rusraink
>>
>
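
Rainer's observation that ompi_info reports "Fort dbl prec size: 4" can be
checked from inside a program. The following is only a minimal sketch, not
something posted in this thread: it asks the library for the size it
associates with MPI_DOUBLE_PRECISION via MPI_TYPE_SIZE and compares that
with the size the compiler uses for DOUBLE PRECISION. The program name
check_dp_size and the variables probe, lib_bytes and compiler_bytes are
hypothetical, and storage_size assumes a compiler with Fortran 2008 support.
Because the crash was also reported with INTEGER and REAL, a mismatch here
may not be the whole explanation.

```fortran
      program check_dp_size
      implicit none
      include 'mpif.h'

      double precision :: probe
      integer :: ierr, lib_bytes, compiler_bytes

      call MPI_INIT(ierr)

!     Size (in bytes) the MPI library uses for MPI_DOUBLE_PRECISION.
      call MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION, lib_bytes, ierr)

!     Size the compiler uses for DOUBLE PRECISION.
!     storage_size is Fortran 2008 and returns the size in bits.
      compiler_bytes = storage_size(probe) / 8

      write(*,*) 'library  DOUBLE PRECISION bytes =', lib_bytes
      write(*,*) 'compiler DOUBLE PRECISION bytes =', compiler_bytes
      if (lib_bytes /= compiler_bytes) then
          write(*,*) 'MISMATCH: library and compiler disagree'
      end if

      call MPI_FINALIZE(ierr)
      end program check_dp_size
```

Build it with the same compiler and options as the failing test program and
run it under mpirun. The same comparison can be repeated for MPI_INTEGER and
MPI_REAL by swapping the datatype and the type of the probe variable.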