Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI seg fault by a class with weird address.
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-03-15 12:50:41


You can:

    mpirun -np 4 valgrind ./my_application

That is, you run 4 copies of valgrind, each with one instance of ./my_application. Then you'll get a valgrind report for each process. You might want to dig into the valgrind command line options (e.g., --log-file, whose file name can contain %p for the PID) to have it dump the results to files with unique prefixes (e.g., PID and/or hostname) so that you can get a unique report from each process.

If you disabled ptmalloc and you're still getting the same error, then it sounds like an application error. Try it and see what valgrind tells you.
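
If it does turn out to be an application error: a crash inside malloc is often the delayed symptom of an earlier out-of-bounds write somewhere else in the program, not a fault in the line where it finally blows up. As a rough sketch only (CheckedIndex and AccessIsOutOfRange are hypothetical names, not taken from Jack's code), the unchecked Position[idx]-style accessors quoted further down could be swapped for at() while debugging, so that a bad index raises an exception at the point of the mistake:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, trimmed-down stand-in for the Index class in this thread.
class CheckedIndex {
public:
    void Add(int position, const std::string& name) {
        position_.push_back(position);
        name_.push_back(name);
    }

    // at() throws std::out_of_range for a bad index; operator[] would
    // instead read (or let the caller write) past the end of the buffer.
    int GetPosition(std::size_t idx) const { return position_.at(idx); }
    const std::string& GetName(std::size_t idx) const { return name_.at(idx); }
    std::size_t GetSize() const { return position_.size(); }

private:
    std::vector<int> position_;
    std::vector<std::string> name_;
};

// Returns true if reading element idx raised std::out_of_range.
bool AccessIsOutOfRange(const CheckedIndex& index, std::size_t idx) {
    try {
        (void)index.GetPosition(idx);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With two entries added, AccessIsOutOfRange(index, 1) is false and AccessIsOutOfRange(index, 2) is true. This only catches bad indices that go through these accessors; valgrind is still the better tool for finding stray writes anywhere else.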

On Mar 15, 2011, at 11:25 AM, Jack Bryan wrote:

> Thanks,
>
> From http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap
>
> I find that
>
> "Currently the wrappers are only buildable with mpiccs which are based on GNU GCC or Intel's C++ Compiler."
>
> The cluster that I am working on uses GNU Open MPI mpic++. I am afraid that the Valgrind wrapper cannot work here.
>
> I do not have system administrator authorization.
>
> Are there other open-source memory checkers that can do this?
>
> thanks
>
> Jack
>
> > Subject: Re: [OMPI users] OMPI seg fault by a class with weird address.
> > From: jsquyres_at_[hidden]
> > Date: Tue, 15 Mar 2011 06:19:53 -0400
> > CC: dtustudy68_at_[hidden]
> > To: users_at_[hidden]
> >
> > You may also want to run your program through a memory-checking debugger such as valgrind to see if it turns up any other problems.
> >
> > AFAIK, ptmalloc should be fine for use with STL vector allocation.
> >
> >
> > On Mar 15, 2011, at 4:00 AM, Belaid MOA wrote:
> >
> > > Hi Jack,
> > > I may need to see the whole code to decide, but my quick look suggests that ptmalloc is causing a problem with STL vector allocation. ptmalloc is Open MPI's internal malloc library. Could you try building Open MPI without memory management (using --without-memory-manager) and let us know the outcome? ptmalloc is not needed if you are not using an RDMA interconnect.
> > >
> > > With best regards,
> > > -Belaid.
> > >
> > > From: dtustudy68_at_[hidden]
> > > To: belaid_moa_at_[hidden]; users_at_[hidden]
> > > Subject: RE: [OMPI users] OMPI seg fault by a class with weird address.
> > > Date: Tue, 15 Mar 2011 00:30:19 -0600
> > >
> > > Hi,
> > >
> > > Because the code is very long, I just show the calling relationship of functions.
> > >
> > > main()
> > > {
> > > scheduler();
> > >
> > > }
> > > scheduler()
> > > {
> > > ImportIndices();
> > > }
> > >
> > > ImportIndices()
> > > {
> > > Index IdxNode ;
> > > IdxNode = ReadFile("fileName");
> > > }
> > >
> > > Index ReadFile(const char* fileinput)
> > > {
> > > Index TempIndex;
> > > .........
> > >
> > > }
> > >
> > > vector<int> Index::GetPosition() const { return Position; }
> > > vector<int> Index::GetColumn() const { return Column; }
> > > vector<int> Index::GetYear() const { return Year; }
> > > vector<string> Index::GetName() const { return Name; }
> > > int Index::GetPosition(const int idx) const { return Position[idx]; }
> > > int Index::GetColumn(const int idx) const { return Column[idx]; }
> > > int Index::GetYear(const int idx) const { return Year[idx]; }
> > > string Index::GetName(const int idx) const { return Name[idx]; }
> > > int Index::GetSize() const { return Position.size(); }
> > >
> > > The sequential code works well; it has no scheduler().
> > >
> > > The parallel code output from gdb:
> > > ----------------------------------------------
> > > Breakpoint 1, myNeplanTaskScheduler(CNSGA2 *, int, int, int, ._85 *, char, int, message_para_to_workers_VecT &, MPI_Datatype, int &, int &, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > &, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > &, std::vector<double, std::allocator<double> > &, int, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > &, MPI_Datatype, int, MPI_Datatype, int) (nsga2=0x118c490,
> > > popSize=<value optimized out>, nodeSize=<value optimized out>,
> > > myRank=<value optimized out>, myChildpop=0x1208d80, genCandTag=65 'A',
> > > generationNum=1, myPopParaVec=std::vector of length 4, capacity 4 = {...},
> > > message_to_master_type=0x7fffffffd540, myT1Flag=@0x7fffffffd68c,
> > > myT2Flag=@0x7fffffffd688,
> > > resultTaskPackageT1=std::vector of length 4, capacity 4 = {...},
> > > resultTaskPackageT2Pr=std::vector of length 4, capacity 4 = {...},
> > > xdataV=std::vector of length 4, capacity 4 = {...}, objSize=7,
> > > resultTaskPackageT12=std::vector of length 4, capacity 4 = {...},
> > > xdata_to_workers_type=0x121c410, myGenerationNum=1,
> > > Mpara_to_workers_type=0x121b9b0, nconNum=0)
> > > at src/nsga2/myNetplanScheduler.cpp:109
> > > 109 ImportIndices();
> > > (gdb) c
> > > Continuing.
> > >
> > > Breakpoint 2, ImportIndices () at src/index.cpp:120
> > > 120 IdxNode = ReadFile("prepdata/idx_node.csv");
> > > (gdb) c
> > > Continuing.
> > >
> > > Breakpoint 4, ReadFile (fileinput=0xd8663d "prepdata/idx_node.csv")
> > > at src/index.cpp:86
> > > 86 Index TempIndex;
> > > (gdb) c
> > > Continuing.
> > >
> > > Breakpoint 5, Index::Index (this=0x7fffffffcb80) at src/index.cpp:20
> > > 20 Name(0) {}
> > > (gdb) c
> > > Continuing.
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x00002aaaab3b0b81 in opal_memory_ptmalloc2_int_malloc ()
> > > from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
> > >
> > > ---------------------------------------
> > > the backtrace output from the above parallel OpenMPI code:
> > >
> > > (gdb) bt
> > > #0 0x00002aaaab3b0b81 in opal_memory_ptmalloc2_int_malloc ()
> > > from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
> > > #1 0x00002aaaab3b2bd3 in opal_memory_ptmalloc2_malloc ()
> > > from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
> > > #2 0x0000003f7c8bd1dd in operator new(unsigned long) ()
> > > from /usr/lib64/libstdc++.so.6
> > > #3 0x00000000004646a7 in __gnu_cxx::new_allocator<int>::allocate (
> > > this=0x7fffffffcb80, __n=0)
> > > at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h:88
> > > #4 0x00000000004646cf in std::_Vector_base<int, std::allocator<int> >::_M_allocate (this=0x7fffffffcb80, __n=0)
> > > at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:127
> > > #5 0x0000000000464701 in std::_Vector_base<int, std::allocator<int> >::_Vector_base (this=0x7fffffffcb80, __n=0, __a=...)
> > > at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:113
> > > #6 0x0000000000464d0b in std::vector<int, std::allocator<int> >::vector (
> > > this=0x7fffffffcb80, __n=0, __value=@0x7fffffffc968, __a=...)
> > > at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:216
> > > #7 0x00000000004890d7 in Index::Index (this=0x7fffffffcb80)
> > > at src/index.cpp:20
> > > #8 0x000000000048927a in ReadFile (fileinput=0xd8663d "prepdata/idx_node.csv")
> > > at src/index.cpp:86
> > > #9 0x0000000000489533 in ImportIndices () at src/index.cpp:120
> > > #10 0x0000000000445e0e in myNeplanTaskScheduler(CNSGA2 *, int, int, int, ._85 *, char, int, message_para_to_workers_VecT &, MPI_Datatype, int &, int &, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > &, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > &, std::vector<double, std::allocator<double> > &, int, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > &, MPI_Datatype, int, MPI_Datatype, int) (nsga2=0x118c490,
> > > popSize=<value optimized out>, nodeSize=<value optimized out>,
> > > myRank=<value optimized out>, myChildpop=0x1208d80, genCandTag=65 'A',
> > > generationNum=1, myPopParaVec=std::vector of length 4, capacity 4 = {...},
> > > message_to_master_type=0x7fffffffd540, myT1Flag=@0x7fffffffd68c,
> > > myT2Flag=@0x7fffffffd688,
> > > resultTaskPackageT1=std::vector of length 4, capacity 4 = {...},
> > > resultTaskPackageT2Pr=std::vector of length 4, capacity 4 = {...},
> > > xdataV=std::vector of length 4, capacity 4 = {...}, objSize=7,
> > > resultTaskPackageT12=std::vector of length 4, capacity 4 = {...},
> > > xdata_to_workers_type=0x121c410, myGenerationNum=1,
> > > Mpara_to_workers_type=0x121b9b0, nconNum=0)
> > > at src/nsga2/myNetplanScheduler.cpp:109
> > > #11 0x000000000044f44b in main (argc=1, argv=0x7fffffffd998)
> > > at src/nsga2/main-parallel2.cpp:216
> > > ----------------------------------------------------
> > >
> > > What is "opal_memory_ptmalloc2_int_malloc ()" ?
> > >
> > > The gdb output from sequential code:
> > > -------------------------------------
> > > Breakpoint 1, main (argc=<value optimized out>, argv=<value optimized out>)
> > > at src/nsga2/main-seq.cpp:32
> > > 32 ImportIndices();
> > > (gdb) c
> > > Continuing.
> > >
> > > Breakpoint 2, ImportIndices () at src/index.cpp:115
> > > 115 IdxNode = ReadFile("prepdata/idx_node.csv");
> > > (gdb) c
> > > Continuing.
> > >
> > > Breakpoint 4, ReadFile (fileinput=0xd6bb9d "prepdata/idx_node.csv")
> > > at src/index.cpp:86
> > > 86 Index TempIndex;
> > > (gdb) c
> > > Continuing.
> > >
> > > Breakpoint 5, Index::Index (this=0x7fffffffd6d0) at src/index.cpp:20
> > > 20 Name(0) {}
> > > (gdb) c
> > > Continuing.
> > >
> > > Breakpoint 4, ReadFile (fileinput=0xd6bbb3 "prepdata/idx_ud.csv")
> > > at src/index.cpp:86
> > > 86 Index TempIndex;
> > > (gdb) bt
> > > #0 ReadFile (fileinput=0xd6bbb3 "prepdata/idx_ud.csv") at src/index.cpp:86
> > > #1 0x0000000000471cc9 in ImportIndices () at src/index.cpp:116
> > > #2 0x000000000043bba6 in main (argc=<value optimized out>,
> > > argv=<value optimized out>) at src/nsga2/main-seq.cpp:32
> > >
> > > --------------------------------------
> > > thanks
> > >
> > >
> > > From: belaid_moa_at_[hidden]
> > > To: users_at_[hidden]; dtustudy68_at_[hidden]
> > > Subject: RE: [OMPI users] OMPI seg fault by a class with weird address.
> > > Date: Tue, 15 Mar 2011 06:16:35 +0000
> > >
> > > Hi Jack,
> > > 1- Where is your main function? I would like to see how you call your class.
> > > 2- I do not see the implementation of GetPosition, GetName, etc.
> > >
> > > With best regards,
> > > -Belaid.
> > >
> > >
> > > From: dtustudy68_at_[hidden]
> > > To: users_at_[hidden]
> > > Date: Mon, 14 Mar 2011 19:04:12 -0600
> > > Subject: [OMPI users] OMPI seg fault by a class with weird address.
> > >
> > > Hi,
> > >
> > > I got a run-time error in an Open MPI C++ program.
> > >
> > > The following output is from gdb:
> > >
> > > --------------------------------------------------------------------------
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x00002aaaab3b0b81 in opal_memory_ptmalloc2_int_malloc ()
> > > from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
> > >
> > > At the point
> > >
> > > Breakpoint 9, Index::Index (this=0x7fffffffcb80) at src/index.cpp:20
> > > 20 Name(0) {}
> > >
> > > The Index constructor has been called before this point with no problem:
> > > -------------------------------------------------------
> > > Breakpoint 9, Index::Index (this=0x117d800) at src/index.cpp:20
> > > 20 Name(0) {}
> > > (gdb) c
> > > Continuing.
> > >
> > > Breakpoint 9, Index::Index (this=0x117d860) at src/index.cpp:20
> > > 20 Name(0) {}
> > > (gdb) c
> > > Continuing.
> > > ----------------------------------------------------------------------------
> > >
> > > It seems that the 0x7fffffffcb80 address is a problem.
> > >
> > > But I do not know the reason or how to remove the bug.
> > >
> > > Any help is really appreciated.
> > >
> > > thanks
> > >
> > > the following is the index definition.
> > >
> > > ---------------------------------------------------------
> > > class Index {
> > > public:
> > > Index();
> > > Index(const Index& rhs);
> > > ~Index();
> > > Index& operator=(const Index& rhs);
> > >
> > > vector<int> GetPosition() const;
> > > vector<int> GetColumn() const;
> > > vector<int> GetYear() const;
> > > vector<string> GetName() const;
> > > int GetPosition(const int idx) const;
> > > int GetColumn(const int idx) const;
> > > int GetYear(const int idx) const;
> > > string GetName(const int idx) const;
> > > int GetSize() const;
> > >
> > > void Add(const int idx, const int col, const string& name);
> > > void Add(const int idx, const int col, const int year, const string& name);
> > > void Add(const int idx, const Step& col, const string& name);
> > > void WriteFile(const char* fileinput) const;
> > >
> > > private:
> > > vector<int> Position;
> > > vector<int> Column;
> > > vector<int> Year;
> > > vector<string> Name;
> > > };
> > > // Constructors and destructor for the Index class
> > > Index::Index() :
> > > Position(0),
> > > Column(0),
> > > Year(0),
> > > Name(0) {}
> > >
> > > Index::Index(const Index& rhs) :
> > > Position(rhs.GetPosition()),
> > > Column(rhs.GetColumn()),
> > > Year(rhs.GetYear()),
> > > Name(rhs.GetName()) {}
> > >
> > > Index::~Index() {}
> > >
> > > Index& Index::operator=(const Index& rhs) {
> > > Position = rhs.GetPosition();
> > > Column = rhs.GetColumn();
> > > Year = rhs.GetYear();
> > > Name = rhs.GetName();
> > > return *this;
> > > }
> > > ----------------------------------------------------------
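
A side note on the class above, which is unlikely to explain the segfault by itself: the hand-written copy constructor, assignment operator and destructor do exactly what the compiler-generated ones would do for four vector members, and every whole-vector getter returns a fresh copy. A minimal sketch, assuming nothing else in the class needs custom copying (SimpleIndex is a hypothetical name and only some of the members are shown), could look like this:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reduced version of the Index class quoted above.
// No user-declared copy constructor, assignment operator, or destructor:
// the compiler-generated ones copy each vector member in turn.
class SimpleIndex {
public:
    void Add(int idx, int col, int year, const std::string& name) {
        position_.push_back(idx);
        column_.push_back(col);
        year_.push_back(year);
        name_.push_back(name);
    }

    // Returning const references avoids copying a whole vector per call.
    const std::vector<int>& GetPosition() const { return position_; }
    const std::vector<std::string>& GetName() const { return name_; }
    std::size_t GetSize() const { return position_.size(); }

private:
    std::vector<int> position_;
    std::vector<int> column_;
    std::vector<int> year_;
    std::vector<std::string> name_;
};
```

Copies made with `SimpleIndex b = a;` or `c = a;` are independent of the original, so later calls to a.Add() do not change them. Having less hand-written copy code also leaves fewer places for a slip like the commas in the original operator=.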
> > >
> > >
> > >
> > > _______________________________________________
> > > users mailing list
> > > users_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/