Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Windows: MPI_Allreduce() crashes when using MPI_DOUBLE_PRECISION
From: hi (hiralsmaillist_at_[hidden])
Date: 2011-05-13 01:50:40


Hi Rainer,

> Does REAL work for You?
No.
I am observing the same errors (see below) even with INTEGER; please find
attached the test programs with INTEGER and REAL.
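
For reference, the INTEGER variant uses the same allreduce pattern, just with
INTEGER buffers and MPI_INTEGER; the sketch below is only illustrative -- the
attached mar_f_i.f is the actual program and may differ in details:

      program test_mpi_int
      include 'mpif.h'
      INTEGER rcvbuf(5), sndbuf(5)
      INTEGER nproc, rank, ierr, n, i
      n = 5
      do i = 1, n
         sndbuf(i) = 2
         rcvbuf(i) = 0
      end do
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
      write(*,*) "size=", nproc, ", rank=", rank
      write(*,*) "start --, rcvbuf=", rcvbuf
C     sum the INTEGER buffers across all ranks
      call MPI_ALLREDUCE(sndbuf, rcvbuf, n, MPI_INTEGER, MPI_SUM,
     &                   MPI_COMM_WORLD, ierr)
      write(*,*) "end --, rcvbuf=", rcvbuf
      call MPI_FINALIZE(ierr)
      end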

C:\test> mpirun mar_f_i.exe
 size= 1 , rank= 0
 start --, rcvbuf= 0 0 0 0 0
 end --, rcvbuf= 2 2 2 2 2

C:\test> mpirun -np 2 mar_f_i.exe
 size= 2 , rank= 0
 start --, rcvbuf= 0 0 0 0 0
 size= 2 , rank= 1
 start --, rcvbuf= 0 0 0 0 0
forrtl: severe (157): Program Exception - access violation
Image PC Routine Line Source
[vibgyor:12628] [[31763,0],0]-[[31763,1],0] mca_oob_tcp_msg_recv:
readv failed: Unknown error (108)
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: vibgyor
PID: 488

This process may still be running and/or consuming resources.

--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 452 on node vibgyor
exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in the
job did. This can cause a job to hang indefinitely while it waits for
all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
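
Also, regarding the "Fort dbl prec size: 4" line in the ompi_info output: a
small program along the lines of the sketch below could compare the size Open
MPI reports for MPI_DOUBLE_PRECISION with the compiler's own size (just an
idea, not tested here; sizeof() is an ifort/gfortran extension, not standard
Fortran):

      program check_sizes
      include 'mpif.h'
      DOUBLE PRECISION d
      INTEGER tsize, ierr
      call MPI_INIT(ierr)
C     size Open MPI associates with MPI_DOUBLE_PRECISION
      call MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION, tsize, ierr)
      write(*,*) "MPI_DOUBLE_PRECISION size =", tsize
C     size the compiler uses for DOUBLE PRECISION
C     (sizeof() is an ifort/gfortran extension)
      write(*,*) "compiler DOUBLE PRECISION size =", sizeof(d)
      call MPI_FINALIZE(ierr)
      end

If the two numbers disagree (e.g. 4 vs. 8), that would confirm the mismatch
described below.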

Thank you.
-Hiral

On Thu, May 12, 2011 at 9:03 PM, Rainer Keller <keller_at_[hidden]> wrote:
> Hello Hiral,
> in the ompi_info output You attached, the Fortran size detection did not work
> correctly (on viscluster -- i.e. this shows that you used the standard
> installation package):
> ...
>      Fort dbl prec size: 4
> ...
>
> This most probably does not match Your compiler's setting for DOUBLE
> PRECISION, which probably considers this to be 8.
>
> Does REAL work for You?
>
> Shiqing is currently away, will ask when he returns.
>
> With best regards,
> Rainer
>
>
> On Wednesday 11 May 2011 09:29:03 hi wrote:
>> Hi Jeff,
>>
>> > Can you send the info listed on the help page?
>> >
>> From the HELP page...
>>
>> ***For run-time problems:
>> 1) Check the FAQ first. Really. This can save you a lot of time; many
>> common problems and solutions are listed there.
>> I couldn't find a relevant reference in the FAQ.
>>
>> 2) The version of Open MPI that you're using.
>> I am using the pre-built openmpi-1.5.3 64-bit and 32-bit binaries on Windows 7.
>> I also tried a locally built openmpi-1.5.2 using the Visual Studio 2008
>> 32-bit compilers.
>> I tried various compilers: VS-9 32-bit and VS-10 64-bit, with the
>> corresponding Intel ifort compilers.
>>
>> 3) The config.log file from the top-level Open MPI directory, if
>> available (please compress!).
>> Don't have.
>>
>> 4) The output of the "ompi_info --all" command from the node where
>> you're invoking mpirun.
>> See the output of the pre-built "openmpi-1.5.3_x64/bin/ompi_info --all"
>> in the attachments.
>>
>> 5) If running on more than one node --
>> I am running the test program on a single node.
>>
>> 6) A detailed description of what is failing.
>> Already described in this post.
>>
>> 7) Please include information about your network:
>> As I am running the test program locally on a single machine, this should
>> not be required.
>>
>> > You forgot ierr in the call to MPI_Finalize.  You also paired
>> > DOUBLE_PRECISION data with MPI_INTEGER in the call to allreduce.  And
>> > you mixed sndbuf and rcvbuf in the call to allreduce, meaning that when
>> > you print rcvbuf afterwards, it'll always still be 0.
>>
>> As I am not a Fortran programmer, these were my mistakes!
>>
>> >        program Test_MPI
>> >            use mpi
>> >            implicit none
>> >
>> >            DOUBLE PRECISION rcvbuf(5), sndbuf(5)
>> >            INTEGER nproc, rank, ierr, n, i, ret
>> >
>> >            n = 5
>> >            do i = 1, n
>> >                sndbuf(i) = 2.0
>> >                rcvbuf(i) = 0.0
>> >            end do
>> >
>> >            call MPI_INIT(ierr)
>> >            call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>> >            call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
>> >            write(*,*) "size=", nproc, ", rank=", rank
>> >            write(*,*) "start --, rcvbuf=", rcvbuf
>> >            CALL MPI_ALLREDUCE(sndbuf, rcvbuf, n,
>> >     &              MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)
>> >            write(*,*) "end --, rcvbuf=", rcvbuf
>> >
>> >            CALL MPI_Finalize(ierr)
>> >        end
>> >
>> > (you could use "include 'mpif.h'", too -- I tried both)
>> >
>> > This program works fine for me.
>>
>> I am observing the same crash as described in this thread (when executing
>> "mpirun -np 2 mar_f_dp.exe"), even with the above correct and simple
>> test program. I commented out 'use mpi' as it gave me an "Error in
>> compiled module file" error, so I used the 'include "mpif.h"' statement
>> instead (see attachment).
>>
>> It seems to be a Windows-specific issue (I could run this test program
>> on Linux with openmpi-1.5.1).
>>
>> Can anybody try this test program on Windows?
>>
>> Thank you in advance.
>> -Hiral
>
> --
> ----------------------------------------------------------------
>  Dr.-Ing. Rainer Keller  http://www.hlrs.de/people/keller
>  HLRS                         Tel: ++49 (0)711-685 6 5858
>  Nobelstrasse 19                 Fax: ++49 (0)711-685 6 5832
>  70550 Stuttgart                    email: keller_at_[hidden]
>  Germany                             AIM/Skype:rusraink
>



  • application/octet-stream attachment: mar_f_r.f

  • application/octet-stream attachment: mar_f_i.f