Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] machine exited on signal 11 (Segmentation fault).
From: Rohan Deshpande (rohand87_at_[hidden])
Date: 2012-04-19 03:18:31


Hi Jeff,

I checked the SEND/RECV buffers and they look OK to me. The code I sent
works only when I initialize the array statically.

The code fails every time I use malloc to initialize the array.

Can you please look at the code and let me know what is wrong?
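For reference, the kind of buffer check meant here could look like the sketch below. This is only an illustration: the helper name checked_send and the asserts are hypothetical, while data, offset, chunksize and ARRAYSIZE follow the program quoted later in this thread.

    #include <assert.h>
    #include <mpi.h>

    /* Illustrative only: verify the buffer and its bounds before posting a
       send; the parameter names mirror the variables in the program quoted
       further down in this thread. */
    static void checked_send(int *buf, int offset, int chunksize,
                             int arraysize, int dest, int tag)
    {
        assert(buf != NULL);                     /* buffer was allocated    */
        assert(offset >= 0);                     /* offset inside the array */
        assert(offset + chunksize <= arraysize); /* chunk stays in bounds   */
        MPI_Send(&buf[offset], chunksize, MPI_INT, dest, tag, MPI_COMM_WORLD);
    }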

On Wed, Apr 18, 2012 at 8:11 PM, Jeffrey Squyres <jsquyres_at_[hidden]> wrote:

> As a guess, you're passing in a bad address.
>
> Double check the buffers that you're sending to MPI_SEND/MPI_RECV/etc.
>
>
> On Apr 17, 2012, at 10:43 PM, Rohan Deshpande wrote:
>
> > After using malloc I am getting the following error:
> >
> > *** Process received signal ***
> > Signal: Segmentation fault (11)
> > Signal code: Address not mapped (1)
> > Failing at address: 0x1312d08
> > [ 0] [0x5e840c]
> > [ 1] /usr/local/lib/openmpi/mca_btl_tcp.so(+0x5bdb) [0x119bdb]
> > [ 2] /usr/local/lib/libopen-pal.so.0(+0x19ce0) [0xb2cce0]
> > [ 3] /usr/local/lib/libopen-pal.so.0(opal_event_loop+0x27) [0xb2cf47]
> > [ 4] /usr/local/lib/libopen-pal.so.0(opal_progress+0xda) [0xb200ba]
> > [ 5] /usr/local/lib/openmpi/mca_pml_ob1.so(+0x3f75) [0xa9ef75]
> > [ 6] /usr/local/lib/libmpi.so.0(MPI_Recv+0x136) [0xea7c46]
> > [ 7] mpi_array(main+0x501) [0x8048e25]
> > [ 8] /lib/libc.so.6(__libc_start_main+0xe6) [0x2fece6]
> > [ 9] mpi_array() [0x8048891]
> > *** End of error message ***
> > [machine4][[3968,1],0][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> >
> > --------------------------------------------------------------------------
> > mpirun noticed that process rank 1 with PID 2936 on node machine4 exited on signal 11 (Segmentation fault).
> >
> > Can someone help, please?
> >
> > Thanks
> >
> >
> >
> > On Tue, Apr 17, 2012 at 6:01 PM, Jeffrey Squyres <jsquyres_at_[hidden]> wrote:
> > Try malloc'ing your array instead of creating it statically on the stack. Something like:
> >
> > int *data;
> >
> > int main(..) {
> >     data = malloc(ARRAYSIZE * sizeof(int));
> >     if (NULL == data) {
> >         perror("malloc");
> >         exit(1);
> >     }
> >     // ...
> > }
> >
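For what it is worth, a minimal, self-contained sketch of that suggestion follows; it only shows that MPI_Send/MPI_Recv are called the same way once data is a malloc'd pointer rather than a global array. ARRAYSIZE and the int element type are taken from the program quoted further down; the two-rank exchange and the MPI_Abort on allocation failure are purely illustrative.

    /* Sketch: heap-allocated buffer used directly in MPI_Send/MPI_Recv.
       Run with at least two ranks, e.g. "mpirun -np 2 ./a.out". */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define ARRAYSIZE 20000000

    int main(int argc, char *argv[])
    {
        int rank, i, *data;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        data = malloc(ARRAYSIZE * sizeof(int));   /* heap, not stack or BSS */
        if (NULL == data) {
            perror("malloc");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        if (rank == 0) {                          /* rank 0 sends the array */
            for (i = 0; i < ARRAYSIZE; i++)
                data[i] = i;
            MPI_Send(&data[0], ARRAYSIZE, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {                   /* rank 1 receives it     */
            MPI_Recv(&data[0], ARRAYSIZE, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        free(data);                               /* release before finalize */
        MPI_Finalize();
        return 0;
    }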
> >
> > On Apr 17, 2012, at 5:05 AM, Rohan Deshpande wrote:
> >
> > >
> > > Hi,
> > >
> > > I am trying to distribute a large amount of data using MPI.
> > >
> > > When I exceed a certain data size, a segmentation fault occurs.
> > >
> > > Here is my code:
> > >
> > >
> > > #include "mpi.h"
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > > #include <string.h>
> > > #define ARRAYSIZE 2000000
> > > #define MASTER 0
> > >
> > > int data[ARRAYSIZE];
> > >
> > >
> > > int main(int argc, char* argv[])
> > > {
> > > int numtasks, taskid, rc, dest, offset, i, j, tag1, tag2, source, chunksize, namelen;
> > > int mysum, sum;
> > > int update(int myoffset, int chunk, int myid);
> > > char myname[MPI_MAX_PROCESSOR_NAME];
> > > MPI_Status status;
> > > double start, stop, time;
> > > double totaltime;
> > > FILE *fp;
> > > char line[128];
> > > char element;
> > > int n;
> > > int k=0;
> > >
> > >
> > >
> > > /***** Initializations *****/
> > > MPI_Init(&argc, &argv);
> > > MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
> > > MPI_Comm_rank(MPI_COMM_WORLD,&taskid);
> > > MPI_Get_processor_name(myname, &namelen);
> > > printf ("MPI task %d has started on host %s...\n", taskid, myname);
> > > chunksize = (ARRAYSIZE / numtasks);
> > > tag2 = 1;
> > > tag1 = 2;
> > >
> > >
> > > /***** Master task only ******/
> > > if (taskid == MASTER){
> > >
> > > /* Initialize the array */
> > > sum = 0;
> > > for(i=0; i<ARRAYSIZE; i++) {
> > > data[i] = i * 1 ;
> > > sum = sum + data[i];
> > > }
> > > printf("Initialized array sum = %d\n",sum);
> > >
> > > /* Send each task its portion of the array - master keeps 1st part */
> > > offset = chunksize;
> > > for (dest=1; dest<numtasks; dest++) {
> > > MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
> > > MPI_Send(&data[offset], chunksize, MPI_INT, dest, tag2, MPI_COMM_WORLD);
> > > printf("Sent %d elements to task %d offset= %d\n",chunksize,dest,offset);
> > > offset = offset + chunksize;
> > > }
> > >
> > > /* Master does its part of the work */
> > > offset = 0;
> > > mysum = update(offset, chunksize, taskid);
> > >
> > > /* Wait to receive results from each task */
> > > for (i=1; i<numtasks; i++) {
> > > source = i;
> > > MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);
> > > MPI_Recv(&data[offset], chunksize, MPI_INT, source, tag2, MPI_COMM_WORLD, &status);
> > > }
> > >
> > > /* Get final sum and print sample results */
> > > MPI_Reduce(&mysum, &sum, 1, MPI_INT, MPI_SUM, MASTER, MPI_COMM_WORLD);
> > > /* printf("Sample results: \n");
> > > offset = 0;
> > > for (i=0; i<numtasks; i++) {
> > > for (j=0; j<5; j++)
> > > printf(" %d",data[offset+j]);ARRAYSIZE
> > > printf("\n");
> > > offset = offset + chunksize;
> > > }*/
> > > printf("\n*** Final sum= %d ***\n",sum);
> > >
> > > } /* end of master section */
> > >
> > >
> > > #include <stdlib.h>
> > > /***** Non-master tasks only *****/
> > >
> > > if (taskid > MASTER) {
> > >
> > > /* Receive my portion of array from the master task */
> > > start= MPI_Wtime();
> > > source = MASTER;
> > > MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);
> > > MPI_Recv(&data[offset], chunksize, MPI_INT, source, tag2, MPI_COMM_WORLD, &status);
> > >
> > > mysum = update(offset, chunksize, taskid);
> > > stop = MPI_Wtime();
> > > time = stop -start;
> > > printf("time taken by process %d to recieve elements and caluclate
> own sum is = %lf seconds \n", taskid, time);
> > > totaltime = totaltime + time;
> > >
> > > /* Send my results back to the master task */
> > > dest = MASTER;
> > > MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
> > > MPI_Send(&data[offset], chunksize, MPI_INT, MASTER, tag2, MPI_COMM_WORLD);
> > >
> > > MPI_Reduce(&mysum, &sum, 1, MPI_INT, MPI_SUM, MASTER, MPI_COMM_WORLD);
> > >
> > > } /* end of non-master */
> > >
> > > // printf("Total time taken for distribution is - %lf seconds",
> totaltime);
> > > MPI_Finalize();
> > >
> > > } /* end of main */
> > >
> > >
> > > int update(int myoffset, int chunk, int myid) {
> > > int i,j;
> > > int mysum;
> > > int mydata[myoffset+chunk];
> > > /* Perform addition to each of my array elements and keep my sum */
> > > mysum = 0;
> > > /* printf("task %d has elements:",myid);
> > > for(j = myoffset; j<myoffset+chunk; j++){
> > > printf("\t%d", data[j]);
> > > }
> > > printf("\n");*/
> > > for(i=myoffset; i < myoffset + chunk; i++) {
> > >
> > > //data[i] = data[i] + i;
> > > mysum = mysum + data[i];
> > > }
> > > printf("Task %d has sum = %d\n",myid,mysum);
> > > return(mysum);
> > > }
> > >
> > >
> > > When I run it with ARRAYSIZE = 2000000 the program works fine, but when I increase it to ARRAYSIZE = 20000000 the program ends with a segmentation fault.
> > > I am running it on a cluster (machine4 is the master; machine5 and machine6 are slaves) with np=20.
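One detail that may matter here, in the same spirit as the earlier advice to keep large data off the stack: update() declares int mydata[myoffset+chunk], a variable-length array on the stack that grows with the offset but is never used. At ARRAYSIZE = 20000000 the highest-offset ranks would ask for roughly 80 MB of stack, well above the usual default limit of about 8 MB. The sketch below shows update() with that array removed; it is only an illustration, not the code that produced the output that follows.

    /* Illustrative variant of update() with the unused variable-length array
       removed, so the stack usage per call no longer grows with the offset.
       The global "data" array, printf and the rest of the program above are
       assumed unchanged. */
    int update(int myoffset, int chunk, int myid)
    {
        int i;
        int mysum = 0;

        for (i = myoffset; i < myoffset + chunk; i++) {
            mysum = mysum + data[i];
        }
        printf("Task %d has sum = %d\n", myid, mysum);
        return mysum;
    }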
> > >
> > > MPI task 0 has started on host machine4
> > > MPI task 2 has started on host machine4
> > > MPI task 3 has started on host machine4
> > > MPI task 14 has started on host machine4
> > > MPI task 8 has started on host machine6
> > > MPI task 10 has started on host machine6
> > > MPI task 13 has started on host machine4
> > > MPI task 4 has started on host machine5
> > > MPI task 6 has started on host machine5
> > > MPI task 7 has started on host machine5
> > > MPI task 16 has started on host machine5
> > > MPI task 11 has started on host machine6
> > > MPI task 12 has started on host machine4
> > > MPI task 5 has started on host machine5
> > > MPI task 17 has started on host machine5
> > > MPI task 18 has started on host machine5
> > > MPI task 15 has started on host machine4
> > > MPI task 19 has started on host machine5
> > > MPI task 1 has started on host machine4
> > > MPI task 9 has started on host machine6
> > > Initialized array sum = 542894464
> > > Sent 1000000 elements to task 1 offset= 1000000
> > > Task 1 has sum = 1055913696
> > > time taken by process 1 to receive elements and calculate own sum is = 0.249345 seconds
> > > Sent 1000000 elements to task 2 offset= 2000000
> > > Sent 1000000 elements to task 3 offset= 3000000
> > > Task 2 has sum = 328533728
> > > time taken by process 2 to receive elements and calculate own sum is = 0.274285 seconds
> > > Sent 1000000 elements to task 4 offset= 4000000
> > >
> > > --------------------------------------------------------------------------
> > > mpirun noticed that process rank 3 with PID 5695 on node machine4 exited on signal 11 (Segmentation fault).
> > >
> > > Any idea what could be wrong here?
> > >
> > >
> > > --
> > >
> > > Best Regards,
> > >
> > > ROHAN DESHPANDE
> > >
> > >
> > >

-- 
Best Regards,
ROHAN DESHPANDE