
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Where is the error? (MPI program in fortran)
From: Gus Correa (gus_at_[hidden])
Date: 2014-04-16 11:24:52


On 04/16/2014 08:30 AM, Oscar Mojica wrote:
> What would the command line be to compile with the -g option? What debugger can I use?
> Thanks
>

Replace any optimization flags (-O2 or similar) with -g.
Check whether your compiler has the -traceback flag or something
similar (see the compiler's man page).
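
For example, with gfortran or Intel ifort behind the mpif90 wrapper,
something along these lines should work (the exact flag names depend
on your compiler, so check its man page):

  mpif90 -g -fbacktrace -fcheck=bounds -o exe mpivfsa_version2.f   # gfortran
  mpif90 -g -traceback  -check bounds  -o exe mpivfsa_version2.f   # Intel ifort

With bounds checking enabled the program stops with a clear message
when an array index goes out of range, instead of segfaulting later.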

The gdb debugger is normally available on Linux (or you can install it
with yum, apt-get, etc.). An alternative is ddd, which has a GUI (it
can also be installed with yum, etc.).
If you use a commercial compiler, you may have a debugger with a GUI.
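
A minimal gdb workflow, assuming your system is set up to write core
files (the core file name can vary):

  ulimit -c unlimited     # allow core dumps in this shell
  mpirun -np 4 ./exe      # reproduce the crash
  gdb ./exe core          # load the executable and the core file
  (gdb) bt                # print the backtrace at the point of the crash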

> Sent from my iPad
>
>> On 15/04/2014, at 18:20, "Gus Correa" <gus_at_[hidden]> wrote:
>>
>> Or just compiling with -g or -traceback (depending on the compiler) will
>> give you more information about the point of failure
>> in the error message.
>>
>>> On 04/15/2014 04:25 PM, Ralph Castain wrote:
>>> Have you tried using a debugger to look at the resulting core file? It
>>> will probably point you right at the problem. Most likely a case of
>>> overrunning some array when #temps > 5
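
As an illustration of that kind of bug (not Oscar's actual code, just
a made-up example): an array dimensioned for 5 temperatures silently
overflows once the loop goes past 5, which can corrupt the heap and
crash later inside free(), much like the backtrace below.

  program overrun_demo
    implicit none
    integer, parameter :: maxtemp = 5   ! array sized for 5 temperatures
    real    :: best(maxtemp)
    integer :: i, ntemp
    ntemp = 15                          ! but the run asks for 15
    do i = 1, ntemp
       best(i) = real(i)                ! out of bounds once i > maxtemp
    end do
    print *, best
  end program overrun_demo

Compiled with -g and bounds checking (e.g. gfortran's -fcheck=bounds),
this aborts with a message naming the array and the offending index
instead of a bare segmentation fault.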
>>>
>>>
>>>
>>>
>>> On Tue, Apr 15, 2014 at 10:46 AM, Oscar Mojica
>>> <o_mojical_at_[hidden]> wrote:
>>>
>>> Hello everybody
>>>
>>> I implemented a parallel simulated annealing algorithm in Fortran.
>>> The algorithm is described as follows:
>>>
>>> 1. The MPI program initially generates P processes that have rank
>>> 0,1,...,P-1.
>>> 2. The MPI program generates a starting point, sends it to all
>>> processes, and sets T = T0
>>> 3. At the current temperature T, each process begins to execute
>>> iterative operations
>>> 4. At the end of the iterations, the process with rank 0 collects
>>> the solutions obtained by each process at the current temperature
>>> 5. The best of these solutions is broadcast to all participating
>>> processes
>>> 6. Each process cools the temperature and goes back to step 3,
>>> until the maximum number of temperatures is reached
>>>
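
Not Oscar's actual code, but a minimal sketch of the loop structure in
steps 1-6 above; names like local_iterations are placeholders, and
steps 4-5 are combined into a single MPI_Allreduce with MPI_MIN, which
is equivalent to gathering on rank 0 and then broadcasting the best
value (assuming a lower cost is better).

  program sa_skeleton
    use mpi
    implicit none
    integer :: ierr, rank, nprocs, itemp, ntemp
    real(8) :: T, mycost, bestcost
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)   ! step 1: ranks 0..P-1
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
    ntemp = 15
    T = 1.0d0                                        ! step 2: T = T0
    do itemp = 1, ntemp
       mycost = local_iterations(rank, T)            ! step 3: local iterations
       call MPI_Allreduce(mycost, bestcost, 1, &     ! steps 4-5: best cost over
            MPI_DOUBLE_PRECISION, MPI_MIN, &         ! all ranks, known to all
            MPI_COMM_WORLD, ierr)
       T = 0.9d0 * T                                 ! step 6: cool and repeat
    end do
    call MPI_Finalize(ierr)
  contains
    real(8) function local_iterations(rank, T)
      integer, intent(in) :: rank
      real(8), intent(in) :: T
      local_iterations = dble(rank) + T              ! stand-in for the real work
    end function local_iterations
  end program sa_skeleton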
>>> I compiled with: mpif90 -o exe mpivfsa_version2.f
>>> and ran with: mpirun -np 4 ./exe on a single machine
>>>
>>> So I have 4 processes, 1 iteration per temperature and for example
>>> 15 temperatures. When I run the program
>>> with just 5 temperatures it works well, but when the number of
>>> temperatures is higher than 5 it doesn't write the
>>> output files and I get the following error message:
>>>
>>>
>>> [oscar-Vostro-3550:06740] *** Process received signal ***
>>> [oscar-Vostro-3550:06741] *** Process received signal ***
>>> [oscar-Vostro-3550:06741] Signal: Segmentation fault (11)
>>> [oscar-Vostro-3550:06741] Signal code: Address not mapped (1)
>>> [oscar-Vostro-3550:06741] Failing at address: 0xad6af
>>> [oscar-Vostro-3550:06742] *** Process received signal ***
>>> [oscar-Vostro-3550:06740] Signal: Segmentation fault (11)
>>> [oscar-Vostro-3550:06740] Signal code: Address not mapped (1)
>>> [oscar-Vostro-3550:06740] Failing at address: 0xad6af
>>> [oscar-Vostro-3550:06742] Signal: Segmentation fault (11)
>>> [oscar-Vostro-3550:06742] Signal code: Address not mapped (1)
>>> [oscar-Vostro-3550:06742] Failing at address: 0xad6af
>>> [oscar-Vostro-3550:06740] [ 0]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f49ee2224a0]
>>> [oscar-Vostro-3550:06740] [ 1]
>>> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x1c) [0x7f49ee26f54c]
>>> [oscar-Vostro-3550:06740] [ 2] ./exe() [0x406742]
>>> [oscar-Vostro-3550:06740] [ 3] ./exe(main+0x34) [0x406ac9]
>>> [oscar-Vostro-3550:06740] [ 4]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f49ee20d76d]
>>> [oscar-Vostro-3550:06742] [ 0]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f6877fdc4a0]
>>> [oscar-Vostro-3550:06742] [ 1]
>>> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x1c) [0x7f687802954c]
>>> [oscar-Vostro-3550:06742] [ 2] ./exe() [0x406742]
>>> [oscar-Vostro-3550:06742] [ 3] ./exe(main+0x34) [0x406ac9]
>>> [oscar-Vostro-3550:06742] [ 4]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f6877fc776d]
>>> [oscar-Vostro-3550:06742] [ 5] ./exe() [0x401399]
>>> [oscar-Vostro-3550:06742] *** End of error message ***
>>> [oscar-Vostro-3550:06740] [ 5] ./exe() [0x401399]
>>> [oscar-Vostro-3550:06740] *** End of error message ***
>>> [oscar-Vostro-3550:06741] [ 0]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fa6c4c6e4a0]
>>> [oscar-Vostro-3550:06741] [ 1]
>>> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x1c) [0x7fa6c4cbb54c]
>>> [oscar-Vostro-3550:06741] [ 2] ./exe() [0x406742]
>>> [oscar-Vostro-3550:06741] [ 3] ./exe(main+0x34) [0x406ac9]
>>> [oscar-Vostro-3550:06741] [ 4]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa6c4c5976d]
>>> [oscar-Vostro-3550:06741] [ 5] ./exe() [0x401399]
>>> [oscar-Vostro-3550:06741] *** End of error message ***
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 6917 on node
>>> oscar-Vostro-3550 exited on signal 11 (Segmentation fault).
>>> --------------------------------------------------------------------------
>>> 2 total processes killed (some possibly by mpirun during cleanup)
>>>
>>> If there were a segmentation fault, it shouldn't work in any case.
>>> I checked the program and didn't find the error. Why does the
>>> program work with five temperatures?
>>> Could someone please help me find the error and answer my question?
>>>
>>> The program and the necessary files to run it are attached
>>>
>>> Thanks
>>>
>>>
>>> _Oscar Fabian Mojica Ladino_
>>> Geologist M.S. in Geophysics
>>>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>