
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Open MPI, Segmentation fault
From: Jack Bryan (dtustudy68_at_[hidden])
Date: 2010-07-01 12:09:50


Thanks for all your replies.

I want to do master-worker asynchronous communication.

The master needs to distribute tasks to workers and then collect results from them.

Master:

world.irecv(resultSourceRank, upStreamTaskTag, myResultTaskPackage[iRank][taskCounterT3]);

I got the error "MPI_ERR_TRUNCATE" because I declared "TaskPackage myResultTaskPackage".

It seems that a 2-D array cannot be used to receive my user-defined class package
from a worker, which sends a TaskPackage to the master.

So I changed it to a 2-D int array to receive the result, and it works well.

But I would still like to store the results in a data structure of type TaskPackage, because
an int array can only carry integers, which is too limited.

What I want to do is:

The master stores the results from each worker and then combines them
to form the final result after all results have been collected.

However, if the number of tasks cannot be divided evenly by the number of workers,
the workers may receive different numbers of tasks.

Suppose we have 11 tasks and 3 workers:

aveTaskNumPerNode = (11 - 11 % 3) / 3 = 3
leftTaskNum = 11 % 3 = 2 = Z

The master then gives one leftover task to each of workers 1 through Z (Z < totalNumWorkers).

For example, worker 1: 4 tasks, worker 2: 4 tasks, worker 3: 3 tasks.

The master tries to distribute tasks as evenly as possible, so that the difference between
the workers' workloads is minimized.

I am going to use a vector of vectors for the dynamic data storage:

a 2-D data structure that stores the results from the workers.

Each row can have a different number of columns.

It can be traversed with iterators, so that I can find the result of a given task from
a given worker by searching the data structure.

For example,
                column 1          column 2
 row 1   (worker1.task1)   (worker1.task4)
 row 2   (worker2.task2)   (worker2.task5)
 row 3   (worker3.task3)

The data structure should record the worker ID and the task ID,
so that the master knows which task came from which worker.

Any help or comments are appreciated.

thanks

Jack

June 30 2010

> Date: Thu, 1 Jul 2010 11:44:19 -0400
> From: gus_at_[hidden]
> To: users_at_[hidden]
> Subject: Re: [OMPI users] Open MPI, Segmentation fault
>
> Hello Jack, list
>
> As others mentioned, this may be a problem with dynamic
> memory allocation.
> It could also be a violation of statically allocated memory,
> I guess.
>
> You say:
>
> > My program can run well for 1,2,10 processors, but fail when the
> > number of tasks cannot
> > be divided evenly by number of processes.
>
> Often times, when the division of the number of "tasks"
> (or the global problem size) by the number of "processors" is not even,
> one processor gets a lighter/heavier workload than the others,
> it also allocates less/more memory than the others,
> and it accesses smaller/larger arrays than the others.
>
> In general, integer division and remainder/modulo calculations
> are used to control memory allocation, the array sizes, etc,
> on different processors.
> These formulas tend to use the MPI communicator size
> (i.e., effectively the number of processors if you are using
> MPI_COMM_WORLD) to split the workload across the processors.
>
> I would search for the lines of code where those calculations are done,
> and where the arrays are allocated and accessed,
> to make sure the algorithm works both when
> they are of the same size
> (even workload across the processors),
> as when they are of different sizes
> (uneven workload across the processors).
> You may be violating memory access by a few bytes only, due to a small
> mistake in one of those integer division / remainder/modulo formulas,
> perhaps where an array index upper or lower bound is calculated.
> It happened to me before, probably to others too.
>
> This type of code inspection can be done without a debugger,
> or before you get to the debugger phase.
>
> I hope this helps,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
> > Jeff Squyres wrote:
> > Also see http://www.open-mpi.org/faq/?category=debugging.
> >
> > On Jul 1, 2010, at 3:17 AM, Asad Ali wrote:
> >
> >> Hi Jack,
> >>
> >> Debugging OpenMPI with traditional debuggers is a pain.
> >> From your error message, it sounds like you have a memory allocation problem. Do you use dynamic memory allocation (allocate and then free)?
> >>
> >> I use printf() to display the MPI rank. It tells me which process is causing the segmentation fault.
> >>
> >> Cheers,
> >>
> >> Asad
> >>
> >> On Thu, Jul 1, 2010 at 4:13 PM, Jack Bryan <dtustudy68_at_[hidden]> wrote:
> >> thanks
> >>
> >> I am not familiar with Open MPI.
> >>
> >> Could you please tell me how to get Open MPI to show where the fault occurs?
> >>
> >> With the GNU debugger?
> >>
> >> Any help is appreciated.
> >>
> >> thanks!!!
> >>
> >> Jack
> >>
> >> June 30 2010
> >>
> >> Date: Wed, 30 Jun 2010 16:13:09 -0400
> >> From: amjad11_at_[hidden]
> >> To: users_at_[hidden]
> >> Subject: Re: [OMPI users] Open MPI, Segmentation fault
> >>
> >>
> >> Based on my experience, I would FULLY endorse (100% agree with) David Zhang.
> >> It is usually a coding mistake or a typo.
> >>
> >> First, ensure that array sizes and dimensions are correct.
> >>
> >> In my experience, if Open MPI is compiled with the GNU compilers (not Intel), it also points out exactly the subroutine in which the fault occurs. Give it a try.
> >>
> >> best,
> >> AA
> >>
> >>
> >>
> >> On Wed, Jun 30, 2010 at 12:43 PM, David Zhang <solarbikedz_at_[hidden]> wrote:
> >> Whenever I have gotten segmentation faults, it has always been because of my own coding mistakes. Perhaps your code is not robust when the number of processes is not divisible by 2?
> >>
> >> On Wed, Jun 30, 2010 at 8:47 AM, Jack Bryan <dtustudy68_at_[hidden]> wrote:
> >> Dear All,
> >>
> >> I am using Open MPI, and I got this error:
> >>
> >> [n337:37664] *** Process received signal ***
> >> [n337:37664] Signal: Segmentation fault (11)
> >> [n337:37664] Signal code: Address not mapped (1)
> >> [n337:37664] Failing at address: 0x7fffcfe90000
> >> [n337:37664] [ 0] /lib64/libpthread.so.0 [0x3c50e0e4c0]
> >> [n337:37664] [ 1] /lustre/home/rhascheduler/RhaScheduler-0.4.1.1/mytest/nmn2 [0x414ed7]
> >> [n337:37664] [ 2] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c5021d974]
> >> [n337:37664] [ 3] /lustre/home/rhascheduler/RhaScheduler-0.4.1.1/mytest/nmn2(__gxx_personality_v0+0x1f1) [0x412139]
> >> [n337:37664] *** End of error message ***
> >>
> >> After searching for answers, it seems that some function is failing.
> >>
> >> My program can run well for 1,2,10 processors, but fail when the number of tasks cannot
> >> be divided evenly by number of processes.
> >>
> >> Any help is appreciated.
> >>
> >> thanks
> >>
> >> Jack
> >>
> >> June 30 2010
> >>
> >>
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> >>
> >> --
> >> David Zhang
> >> University of California, San Diego
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." - H.G. Wells
> >
> >
>
>