Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Where is the error? (MPI program in fortran)
From: Gus Correa (gus_at_[hidden])
Date: 2014-04-15 17:20:13


Or just compile with -g or -traceback (depending on the compiler); the
error message will then give you more information about the point of
failure.
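
A minimal sketch of such a rebuild, assuming mpif90 wraps gfortran (with
Intel's ifort, the flag is -traceback rather than -fbacktrace). The source
file name and process count are taken from the original post; -O0 and
-fcheck=bounds are extra flags not mentioned in this thread:

```shell
# Assumption: mpif90 wraps gfortran. For Intel ifort, replace
# -fbacktrace with -traceback and -fcheck=bounds with -check bounds.
FFLAGS="-g -O0 -fbacktrace -fcheck=bounds"

# Rebuild and rerun only if the compiler wrapper and the source are present.
if command -v mpif90 >/dev/null 2>&1 && [ -f mpivfsa_version2.f ]; then
    # $FFLAGS is left unquoted on purpose so each flag is passed separately.
    mpif90 $FFLAGS -o exe mpivfsa_version2.f
    mpirun -np 4 ./exe
else
    echo "mpif90 or mpivfsa_version2.f not found; skipping build"
fi
```

With bounds checking on, an out-of-range index on an explicitly
dimensioned array should stop the run with a message naming the array and
the source line, instead of a later crash inside free(). Assumed-size
arrays (declared with *) may not be checked.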

On 04/15/2014 04:25 PM, Ralph Castain wrote:
> Have you tried using a debugger to look at the resulting core file? It
> will probably point you right at the problem. Most likely a case of
> overrunning some array when #temps > 5
>
>
>
>
> On Tue, Apr 15, 2014 at 10:46 AM, Oscar Mojica <o_mojical_at_[hidden]>
> wrote:
>
> Hello everybody
>
> I implemented a parallel simulated annealing algorithm in fortran.
> The algorithm is described as follows:
>
> 1. The MPI program initially generates P processes that have ranks
> 0,1,...,P-1.
> 2. The MPI program generates a starting point, sends it to all
> processes, and sets T=T0.
> 3. At the current temperature T, each process begins to execute
> iterative operations.
> 4. At the end of the iterations, the process with rank 0 is responsible
> for collecting the solutions obtained by each process at the current
> temperature and broadcasting the best of them to all participating
> processes.
> 5. Each process cools the temperature and goes back to step 3, until
> the maximum number of temperatures is reached.
>
> I compiled with: mpif90 -o exe mpivfsa_version2.f
> and run with: mpirun -np 4 ./exe in a single machine
>
> So I have 4 processes, 1 iteration per temperature and for example
> 15 temperatures. When I run the program
> with just 5 temperatures it works well, but when the number of
> temperatures is higher than 5 it doesn't write the
> output files and I get the following error message:
>
>
> [oscar-Vostro-3550:06740] *** Process received signal ***
> [oscar-Vostro-3550:06741] *** Process received signal ***
> [oscar-Vostro-3550:06741] Signal: Segmentation fault (11)
> [oscar-Vostro-3550:06741] Signal code: Address not mapped (1)
> [oscar-Vostro-3550:06741] Failing at address: 0xad6af
> [oscar-Vostro-3550:06742] *** Process received signal ***
> [oscar-Vostro-3550:06740] Signal: Segmentation fault (11)
> [oscar-Vostro-3550:06740] Signal code: Address not mapped (1)
> [oscar-Vostro-3550:06740] Failing at address: 0xad6af
> [oscar-Vostro-3550:06742] Signal: Segmentation fault (11)
> [oscar-Vostro-3550:06742] Signal code: Address not mapped (1)
> [oscar-Vostro-3550:06742] Failing at address: 0xad6af
> [oscar-Vostro-3550:06740] [ 0]
> /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f49ee2224a0]
> [oscar-Vostro-3550:06740] [ 1]
> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x1c) [0x7f49ee26f54c]
> [oscar-Vostro-3550:06740] [ 2] ./exe() [0x406742]
> [oscar-Vostro-3550:06740] [ 3] ./exe(main+0x34) [0x406ac9]
> [oscar-Vostro-3550:06740] [ 4]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f49ee20d76d]
> [oscar-Vostro-3550:06742] [ 0]
> /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f6877fdc4a0]
> [oscar-Vostro-3550:06742] [ 1]
> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x1c) [0x7f687802954c]
> [oscar-Vostro-3550:06742] [ 2] ./exe() [0x406742]
> [oscar-Vostro-3550:06742] [ 3] ./exe(main+0x34) [0x406ac9]
> [oscar-Vostro-3550:06742] [ 4]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f6877fc776d]
> [oscar-Vostro-3550:06742] [ 5] ./exe() [0x401399]
> [oscar-Vostro-3550:06742] *** End of error message ***
> [oscar-Vostro-3550:06740] [ 5] ./exe() [0x401399]
> [oscar-Vostro-3550:06740] *** End of error message ***
> [oscar-Vostro-3550:06741] [ 0]
> /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fa6c4c6e4a0]
> [oscar-Vostro-3550:06741] [ 1]
> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x1c) [0x7fa6c4cbb54c]
> [oscar-Vostro-3550:06741] [ 2] ./exe() [0x406742]
> [oscar-Vostro-3550:06741] [ 3] ./exe(main+0x34) [0x406ac9]
> [oscar-Vostro-3550:06741] [ 4]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa6c4c5976d]
> [oscar-Vostro-3550:06741] [ 5] ./exe() [0x401399]
> [oscar-Vostro-3550:06741] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 6917 on node
> oscar-Vostro-3550 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 2 total processes killed (some possibly by mpirun during cleanup)
>
> If there is a segmentation fault, the program should not work in any case.
> I checked the program and didn't find the error. Why does the
> program work with five temperatures?
> Could someone please help me find the error and answer my question?
>
> The program and the necessary files to run it are attached
>
> Thanks
>
>
> _Oscar Fabian Mojica Ladino_
> Geologist M.S. in Geophysics
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users