Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI, Segmentation fault
From: Gus Correa (gus_at_[hidden])
Date: 2010-07-01 11:44:19


Hello Jack, list

As others have mentioned, this may be a problem with dynamic
memory allocation.
It could also be a violation of statically allocated memory,
I guess.

You say:

> My program runs well for 1, 2, or 10 processors, but fails when the
> number of tasks cannot
> be divided evenly by the number of processes.

Oftentimes, when the division of the number of "tasks"
(or the global problem size) by the number of "processors" is not even,
one processor gets a lighter/heavier workload than the others,
allocates less/more memory than the others,
and accesses smaller/larger arrays than the others.

In general, integer division and remainder/modulo calculations
are used to control memory allocation, array sizes, etc.,
on different processors.
These formulas tend to use the MPI communicator size
(i.e., effectively the number of processors if you are using
MPI_COMM_WORLD) to split the workload across the processors.

I would search for the lines of code where those calculations are done,
and where the arrays are allocated and accessed,
to make sure the algorithm works both when
they are of the same size
(even workload across the processors)
and when they are of different sizes
(uneven workload across the processors).
You may be violating memory access by only a few bytes, due to a small
mistake in one of those integer division / remainder/modulo formulas,
perhaps where an array index upper or lower bound is calculated.
It happened to me before, probably to others too.

This type of code inspection can be done without a debugger,
or before you get to the debugger phase.

I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

> Jeff Squyres wrote:
> Also see http://www.open-mpi.org/faq/?category=debugging.
>
> On Jul 1, 2010, at 3:17 AM, Asad Ali wrote:
>
>> Hi Jack,
>>
>> Debugging OpenMPI with traditional debuggers is a pain.
>> From your error message, it sounds like you have a memory allocation problem. Do you use dynamic memory allocation (allocate and then free)?
>>
>> I use printf() statements together with the MPI rank; they tell me which process is giving the segmentation fault.
>>
>> Cheers,
>>
>> Asad
>>
>> On Thu, Jul 1, 2010 at 4:13 PM, Jack Bryan <dtustudy68_at_[hidden]> wrote:
>> thanks
>>
>> I am not familiar with OpenMPI.
>>
>> Would you please help me with how to ask Open MPI to show where the fault occurs?
>>
>> The GNU debugger?
>>
>> Any help is appreciated.
>>
>> thanks!!!
>>
>> Jack
>>
>> June 30 2010
>>
>> Date: Wed, 30 Jun 2010 16:13:09 -0400
>> From: amjad11_at_[hidden]
>> To: users_at_[hidden]
>> Subject: Re: [OMPI users] Open MPI, Segmentation fault
>>
>>
>> Based on my experience, I would FULLY endorse (100% agree with) David Zhang.
>> It is usually a coding mistake or typo.
>>
>> First, ensure that the array sizes and dimensions are correct.
>>
>> In my experience, if Open MPI is compiled with the GNU compilers (not with Intel), it also points out exactly which subroutine the fault occurs in. Give it a try.
>>
>> best,
>> AA
>>
>>
>>
>> On Wed, Jun 30, 2010 at 12:43 PM, David Zhang <solarbikedz_at_[hidden]> wrote:
>> When I have gotten segmentation faults, it has always been my own coding mistake. Perhaps your code is not robust against a number of processes not divisible by 2?
>>
>> On Wed, Jun 30, 2010 at 8:47 AM, Jack Bryan <dtustudy68_at_[hidden]> wrote:
>> Dear All,
>>
>> I am using Open MPI, I got the error:
>>
>> [n337:37664] *** Process received signal ***
>> [n337:37664] Signal: Segmentation fault (11)
>> [n337:37664] Signal code: Address not mapped (1)
>> [n337:37664] Failing at address: 0x7fffcfe90000
>> [n337:37664] [ 0] /lib64/libpthread.so.0 [0x3c50e0e4c0]
>> [n337:37664] [ 1] /lustre/home/rhascheduler/RhaScheduler-0.4.1.1/mytest/nmn2 [0x414ed7]
>> [n337:37664] [ 2] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c5021d974]
>> [n337:37664] [ 3] /lustre/home/rhascheduler/RhaScheduler-0.4.1.1/mytest/nmn2(__gxx_personality_v0+0x1f1) [0x412139]
>> [n337:37664] *** End of error message ***
>>
>> After searching for answers, it seems that some functions fail.
>>
>> My program runs well for 1, 2, or 10 processors, but fails when the number of tasks cannot
>> be divided evenly by the number of processes.
>>
>> Any help is appreciated.
>>
>> thanks
>>
>> Jack
>>
>> June 30 2010
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>> --
>> David Zhang
>> University of California, San Diego
>>
>>
>>
>>
>>
>>
>> --
>> "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." - H.G. Wells
>
>