Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault (11)
From: Jean Potsam (jeanpotsam_at_[hidden])
Date: 2010-03-29 14:28:45


Hi Josh/All,

I just tested a simple C application with BLCR and it worked fine.
 
##########################################
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <signal.h>

char *getprocessid()
{
    FILE *read_fp;
    char buffer[BUFSIZ + 1];
    int chars_read;
    char *buffer_data = "12345";

    memset(buffer, '\0', sizeof(buffer));
    read_fp = popen("uname -a", "r");
    /*
      ...
    */
    return buffer_data;
}

int main(int argc, char **argv)
{
    int rank;
    int size;
    char *thedata;
    int n = 0;

    thedata = getprocessid();
    printf(" the data is %s", thedata);

    while (n < 10)
    {
        printf("value is %d\n", n);
        n++;
        sleep(1);
    }
    printf("bye\n");
}
 
 
jean_at_sun32:/tmp$ cr_run ./pipetest3 &
[1] 31807
jean_at_sun32:~$  the data is 12345value is 0
value is 1
value is 2
...
value is 9
bye
 
jean_at_sun32:/tmp$ cr_checkpoint 31807
 
jean_at_sun32:/tmp$ cr_restart context.31807
value is 7
value is 8
value is 9
bye
 
##############################################
 
 
It looks like it's more to do with Open MPI. Any ideas from your side?
 
Thank you.
 
Kind regards,
 
Jean.
 
 

 

--- On Mon, 29/3/10, Josh Hursey <jjhursey_at_[hidden]> wrote:

From: Josh Hursey <jjhursey_at_[hidden]>
Subject: Re: [OMPI users] Segmentation fault (11)
To: "Open MPI Users" <users_at_[hidden]>
Date: Monday, 29 March, 2010, 16:08

I wonder if this is a bug with BLCR (since the segv stack is in the BLCR thread). Can you try a non-MPI version of this application that uses popen(), and see if BLCR properly checkpoints/restarts it?

If so, we can start to see what Open MPI might be doing to confuse things, but I suspect that this might be a bug with BLCR. Either way let us know what you find out.

Cheers,
Josh

On Mar 27, 2010, at 6:17 AM, jody wrote:

> I'm not sure if this is the cause of your problems:
> You define the constant BUFFER_SIZE, but in the code you use a constant called BUFSIZ...
> Jody
>
>
> On Fri, Mar 26, 2010 at 10:29 PM, Jean Potsam <jeanpotsam_at_[hidden]> wrote:
> Dear All,
> I am having a problem with Open MPI. I have installed Open MPI 1.4 and BLCR 0.8.1.
>
> I have written a small MPI application, as follows:
>
> #######################
> #include <unistd.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
> #include <fcntl.h>
> #include <limits.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <mpi.h>
> #include <signal.h>
>
> #define BUFFER_SIZE PIPE_BUF
>
> char *getprocessid()
> {
>     FILE *read_fp;
>     char buffer[BUFSIZ + 1];
>     int chars_read;
>     char *buffer_data = "12345";
>
>     memset(buffer, '\0', sizeof(buffer));
>     read_fp = popen("uname -a", "r");
>     /*
>       ...
>     */
>     return buffer_data;
> }
>
> int main(int argc, char **argv)
> {
>     MPI_Status status;
>     int rank;
>     int size;
>     char *thedata;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     thedata = getprocessid();
>     printf(" the data is %s", thedata);
>     MPI_Finalize();
> }
> ############################
>
> I get the following result:
>
> #######################
> jean_at_sunn32:~$ mpicc pipetest2.c -o pipetest2
> jean_at_sunn32:~$ mpirun -np 1 -am ft-enable-cr -mca btl ^openib  pipetest2
> [sun32:19211] *** Process received signal ***
> [sun32:19211] Signal: Segmentation fault (11)
> [sun32:19211] Signal code: Address not mapped (1)
> [sun32:19211] Failing at address: 0x4
> [sun32:19211] [ 0] [0xb7f3c40c]
> [sun32:19211] [ 1] /lib/libc.so.6(cfree+0x3b) [0xb796868b]
> [sun32:19211] [ 2] /usr/local/blcr/lib/libcr.so.0(cri_info_free+0x2a) [0xb7a5925a]
> [sun32:19211] [ 3] /usr/local/blcr/lib/libcr.so.0 [0xb7a5ac72]
> [sun32:19211] [ 4] /lib/libc.so.6(__libc_fork+0x186) [0xb7991266]
> [sun32:19211] [ 5] /lib/libc.so.6(_IO_proc_open+0x7e) [0xb7958b6e]
> [sun32:19211] [ 6] /lib/libc.so.6(popen+0x6c) [0xb7958dfc]
> [sun32:19211] [ 7] pipetest2(getprocessid+0x42) [0x8048836]
> [sun32:19211] [ 8] pipetest2(main+0x4d) [0x8048897]
> [sun32:19211] [ 9] /lib/libc.so.6(__libc_start_main+0xe5) [0xb7912455]
> [sun32:19211] [10] pipetest2 [0x8048761]
> [sun32:19211] *** End of error message ***
> #####################################################
>
>
> However, if I compile the application using gcc, it works fine. The problem arises with:
>   read_fp = popen("uname -a", "r");
>
> Does anyone have an idea how to resolve this problem?
>
> Many thanks
>
> Jean
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users