Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI seg fault by a class with weird address.
From: Samuel K. Gutierrez (samuel_at_[hidden])
Date: 2011-03-15 11:27:35


I -think- setting OMPI_MCA_memory_ptmalloc2_disable to 1 will turn off
OMPI's memory wrappers without having to rebuild. Someone please
correct me if I'm wrong :-).

For example (bash-like shell):

export OMPI_MCA_memory_ptmalloc2_disable=1
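
To double-check that the variable really reaches the process (for example when the job is started through a batch script), a small helper like the one below could be called early in main(). This is only a sketch: the name ptmalloc2_disable_requested is made up for illustration and is not part of Open MPI, and the function only inspects the environment; it does not prove that Open MPI honored the setting.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not an Open MPI API): returns true if
// OMPI_MCA_memory_ptmalloc2_disable is set to "1" in the environment
// of the calling process.
bool ptmalloc2_disable_requested() {
    const char* value = std::getenv("OMPI_MCA_memory_ptmalloc2_disable");
    return value != NULL && std::strcmp(value, "1") == 0;
}
```

Printing the result once per rank would show whether the export was propagated to the remote nodes as well.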

Hope that helps,

--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Mar 15, 2011, at 9:19 AM, Jack Bryan wrote:
> Thanks,
>
> I do not have system administrator privileges, so I am afraid
> I cannot rebuild Open MPI with --without-memory-manager.
>
> Are there other ways to get around it?
>
> For example, can something else be used to replace "ptmalloc"?
>
> Any help is really appreciated.
>
> thanks
>
> From: belaid_moa_at_[hidden]
> To: dtustudy68_at_[hidden]; users_at_[hidden]
> Subject: RE: [OMPI users] OMPI seg fault by a class with weird address.
> Date: Tue, 15 Mar 2011 08:00:56 +0000
>
> Hi Jack,
>   I may need to see the whole code to be sure, but a quick look
> suggests that ptmalloc is causing a problem with the STL vector
> allocation. ptmalloc is Open MPI's internal malloc library. Could
> you try building Open MPI without the memory manager (using
> --without-memory-manager) and let us know the outcome? ptmalloc is
> not needed if you are not using an RDMA interconnect.
>
>   With best regards,
> -Belaid.
>
> From: dtustudy68_at_[hidden]
> To: belaid_moa_at_[hidden]; users_at_[hidden]
> Subject: RE: [OMPI users] OMPI seg fault by a class with weird address.
> Date: Tue, 15 Mar 2011 00:30:19 -0600
>
> Hi,
>
> Because the code is very long, I will just show the call chain of
> the functions.
>
> main()
> {
>     scheduler();
> }
>
> scheduler()
> {
>     ImportIndices();
> }
>
> ImportIndices()
> {
>     Index IdxNode;
>     IdxNode = ReadFile("fileName");
> }
>
> Index ReadFile(const char* fileinput)
> {
>     Index TempIndex;
>     .........
> }
>
> vector<int> Index::GetPosition() const { return Position; }
> vector<int> Index::GetColumn() const { return Column; }
> vector<int> Index::GetYear() const { return Year; }
> vector<string> Index::GetName() const { return Name; }
> int Index::GetPosition(const int idx) const { return Position[idx]; }
> int Index::GetColumn(const int idx) const { return Column[idx]; }
> int Index::GetYear(const int idx) const { return Year[idx]; }
> string Index::GetName(const int idx) const { return Name[idx]; }
> int Index::GetSize() const { return Position.size(); }
>
> The sequential code works fine; it has no scheduler().
>
> The gdb output from the parallel code:
> ----------------------------------------------
> Breakpoint 1, myNeplanTaskScheduler(CNSGA2 *, int, int, int, ._85 *,  
> char, int, message_para_to_workers_VecT &, MPI_Datatype, int &, int  
> &, std::vector<std::vector<double, std::allocator<double> >,  
> std::allocator<std::vector<double, std::allocator<double> > > > &,  
> std::vector<std::vector<double, std::allocator<double> >,  
> std::allocator<std::vector<double, std::allocator<double> > > > &,  
> std::vector<double, std::allocator<double> > &, int,  
> std::vector<std::vector<double, std::allocator<double> >,  
> std::allocator<std::vector<double, std::allocator<double> > > > &,  
> MPI_Datatype, int, MPI_Datatype, int) (nsga2=0x118c490,
>     popSize=<value optimized out>, nodeSize=<value optimized out>,
>     myRank=<value optimized out>, myChildpop=0x1208d80,  
> genCandTag=65 'A',
>     generationNum=1, myPopParaVec=std::vector of length 4, capacity  
> 4 = {...},
>     message_to_master_type=0x7fffffffd540, myT1Flag=@0x7fffffffd68c,
>     myT2Flag=@0x7fffffffd688,
>     resultTaskPackageT1=std::vector of length 4, capacity 4 = {...},
>     resultTaskPackageT2Pr=std::vector of length 4, capacity 4 = {...},
>     xdataV=std::vector of length 4, capacity 4 = {...}, objSize=7,
>     resultTaskPackageT12=std::vector of length 4, capacity 4 = {...},
>     xdata_to_workers_type=0x121c410, myGenerationNum=1,
>     Mpara_to_workers_type=0x121b9b0, nconNum=0)
>     at src/nsga2/myNetplanScheduler.cpp:109
> 109                     ImportIndices();
> (gdb) c
> Continuing.
>
> Breakpoint 2, ImportIndices () at src/index.cpp:120
> 120             IdxNode = ReadFile("prepdata/idx_node.csv");
> (gdb) c
> Continuing.
>
> Breakpoint 4, ReadFile (fileinput=0xd8663d "prepdata/idx_node.csv")
>     at src/index.cpp:86
> 86              Index TempIndex;
> (gdb) c
> Continuing.
>
> Breakpoint 5, Index::Index (this=0x7fffffffcb80) at src/index.cpp:20
> 20              Name(0) {}
> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00002aaaab3b0b81 in opal_memory_ptmalloc2_int_malloc ()
>    from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
>
> ---------------------------------------
> The backtrace from the parallel Open MPI code above:
>
> (gdb) bt
> #0  0x00002aaaab3b0b81 in opal_memory_ptmalloc2_int_malloc ()
>    from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
> #1  0x00002aaaab3b2bd3 in opal_memory_ptmalloc2_malloc ()
>    from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
> #2  0x0000003f7c8bd1dd in operator new(unsigned long) ()
>    from /usr/lib64/libstdc++.so.6
> #3  0x00000000004646a7 in __gnu_cxx::new_allocator<int>::allocate (
>     this=0x7fffffffcb80, __n=0)
>     at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h:88
> #4  0x00000000004646cf in std::_Vector_base<int, std::allocator<int> >::_M_allocate (this=0x7fffffffcb80, __n=0)
>     at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:127
> #5  0x0000000000464701 in std::_Vector_base<int, std::allocator<int> >::_Vector_base (this=0x7fffffffcb80, __n=0, __a=...)
>     at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:113
> #6  0x0000000000464d0b in std::vector<int, std::allocator<int> >::vector (
>     this=0x7fffffffcb80, __n=0, __value=@0x7fffffffc968, __a=...)
>     at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:216
> #7  0x00000000004890d7 in Index::Index (this=0x7fffffffcb80)
>     at src/index.cpp:20
> #8  0x000000000048927a in ReadFile (fileinput=0xd8663d "prepdata/idx_node.csv")
>     at src/index.cpp:86
> #9  0x0000000000489533 in ImportIndices () at src/index.cpp:120
> #10 0x0000000000445e0e in myNeplanTaskScheduler(CNSGA2 *, int, int,  
> int, ._85 *, char, int, message_para_to_workers_VecT &,  
> MPI_Datatype, int &, int &, std::vector<std::vector<double,  
> std::allocator<double> >, std::allocator<std::vector<double,  
> std::allocator<double> > > > &, std::vector<std::vector<double,  
> std::allocator<double> >, std::allocator<std::vector<double,  
> std::allocator<double> > > > &, std::vector<double,  
> std::allocator<double> > &, int, std::vector<std::vector<double,  
> std::allocator<double> >, std::allocator<std::vector<double,  
> std::allocator<double> > > > &, MPI_Datatype, int, MPI_Datatype,  
> int) (nsga2=0x118c490,
>     popSize=<value optimized out>, nodeSize=<value optimized out>,
>     myRank=<value optimized out>, myChildpop=0x1208d80,  
> genCandTag=65 'A',
>     generationNum=1, myPopParaVec=std::vector of length 4, capacity  
> 4 = {...},
>     message_to_master_type=0x7fffffffd540, myT1Flag=@0x7fffffffd68c,
>     myT2Flag=@0x7fffffffd688,
>     resultTaskPackageT1=std::vector of length 4, capacity 4 = {...},
>     resultTaskPackageT2Pr=std::vector of length 4, capacity 4 = {...},
>     xdataV=std::vector of length 4, capacity 4 = {...}, objSize=7,
>     resultTaskPackageT12=std::vector of length 4, capacity 4 = {...},
>     xdata_to_workers_type=0x121c410, myGenerationNum=1,
>     Mpara_to_workers_type=0x121b9b0, nconNum=0)
>     at src/nsga2/myNetplanScheduler.cpp:109
> #11 0x000000000044f44b in main (argc=1, argv=0x7fffffffd998)
>     at src/nsga2/main-parallel2.cpp:216
> ----------------------------------------------------
>
> What is "opal_memory_ptmalloc2_int_malloc ()"?
>
> The gdb output from the sequential code:
> -------------------------------------
> Breakpoint 1, main (argc=<value optimized out>, argv=<value optimized out>)
>     at src/nsga2/main-seq.cpp:32
> 32              ImportIndices();
> (gdb) c
> Continuing.
>
> Breakpoint 2, ImportIndices () at src/index.cpp:115
> 115             IdxNode = ReadFile("prepdata/idx_node.csv");
> (gdb) c
> Continuing.
>
> Breakpoint 4, ReadFile (fileinput=0xd6bb9d "prepdata/idx_node.csv")
>     at src/index.cpp:86
> 86              Index TempIndex;
> (gdb) c
> Continuing.
>
> Breakpoint 5, Index::Index (this=0x7fffffffd6d0) at src/index.cpp:20
> 20              Name(0) {}
> (gdb) c
> Continuing.
>
> Breakpoint 4, ReadFile (fileinput=0xd6bbb3 "prepdata/idx_ud.csv")
>     at src/index.cpp:86
> 86              Index TempIndex;
> (gdb) bt
> #0  ReadFile (fileinput=0xd6bbb3 "prepdata/idx_ud.csv") at src/index.cpp:86
> #1  0x0000000000471cc9 in ImportIndices () at src/index.cpp:116
> #2  0x000000000043bba6 in main (argc=<value optimized out>,
>     argv=<value optimized out>) at src/nsga2/main-seq.cpp:32
>
> --------------------------------------
> thanks
>
>
> From: belaid_moa_at_[hidden]
> To: users_at_[hidden]; dtustudy68_at_[hidden]
> Subject: RE: [OMPI users] OMPI seg fault by a class with weird address.
> Date: Tue, 15 Mar 2011 06:16:35 +0000
>
> Hi Jack,
> 1- Where is your main function to see how you called your class?
> 2- I do not see the implementation of GetPosition, GetName, etc.?
>
> With best regards,
> -Belaid.
>
>
> From: dtustudy68_at_[hidden]
> To: users_at_[hidden]
> Date: Mon, 14 Mar 2011 19:04:12 -0600
> Subject: [OMPI users] OMPI seg fault by a class with weird address.
>
> Hi,
>
> I get a run-time error in an Open MPI C++ program.
>
> The following output is from gdb:
>
> --------------------------------------------------------------------------
> Program received signal SIGSEGV, Segmentation fault.
> 0x00002aaaab3b0b81 in opal_memory_ptmalloc2_int_malloc ()
>    from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
>
> At the point
>
> Breakpoint 9, Index::Index (this=0x7fffffffcb80) at src/index.cpp:20
> 20              Name(0) {}
>
> The Index constructor was called before this point without any problem:
> -------------------------------------------------------
> Breakpoint 9, Index::Index (this=0x117d800) at src/index.cpp:20
> 20              Name(0) {}
> (gdb) c
> Continuing.
>
> Breakpoint 9, Index::Index (this=0x117d860) at src/index.cpp:20
> 20              Name(0) {}
> (gdb) c
> Continuing.
> ----------------------------------------------------------------------------
>
> It seems that the address 0x7fffffffcb80 is the problem.
>
> But I do not know the reason or how to remove the bug.
>
> Any help is really appreciated.
>
> Thanks
>
> The following is the Index definition.
>
> ---------------------------------------------------------
> class Index {
>     public:
>         Index();
>         Index(const Index& rhs);
>         ~Index();
>         Index& operator=(const Index& rhs);
> 		
> 		vector<int> GetPosition() const;
> 		vector<int> GetColumn() const;
> 		vector<int> GetYear() const;
> 		vector<string> GetName() const;
> 		int GetPosition(const int idx) const;
> 		int GetColumn(const int idx) const;
> 		int GetYear(const int idx) const;
> 		string GetName(const int idx) const;
> 		int GetSize() const;
> 		
> 		void Add(const int idx, const int col, const string& name);
> 		void Add(const int idx, const int col, const int year, const string& name);
> 		void Add(const int idx, const Step& col, const string& name);
> 		void WriteFile(const char* fileinput) const;
> 		
>     private:
> 		vector<int> Position;
> 		vector<int> Column;
> 		vector<int> Year;
> 		vector<string> Name;
> };
> // Constructors and destructor for the Index class
> Index::Index() :
> 	Position(0),
> 	Column(0),
> 	Year(0),
> 	Name(0) {}
>
> Index::Index(const Index& rhs) :
> 	Position(rhs.GetPosition()),
> 	Column(rhs.GetColumn()),
> 	Year(rhs.GetYear()),
> 	Name(rhs.GetName()) {}
>
> Index::~Index() {}
>
> Index& Index::operator=(const Index& rhs) {
>     Position = rhs.GetPosition();
>     Column = rhs.GetColumn();
>     Year = rhs.GetYear();
>     Name = rhs.GetName();
>     return *this;
> }
> ----------------------------------------------------------
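
Since every member of Index is a standard container, the hand-written copy constructor, destructor, and operator= could probably be dropped in favor of the compiler-generated ones. The sketch below shows that variant. It is an illustration only and not a fix for the segfault: the backtrace points into the allocator, not into Index itself. The body of Add is an assumption (the original implementation was not posted), only one of the Add overloads is shown, and returning const references instead of copies is a design choice of this sketch.

```cpp
#include <string>
#include <vector>

// Slimmed-down Index: no user-written constructors, destructor, or
// operator=. The compiler-generated versions copy each vector
// member-wise, and default-constructed vectors start out empty.
class Index {
public:
    // Whole-vector getters return const references to avoid copies.
    const std::vector<int>& GetPosition() const { return Position; }
    const std::vector<int>& GetColumn() const { return Column; }
    const std::vector<int>& GetYear() const { return Year; }
    const std::vector<std::string>& GetName() const { return Name; }

    // Element getters, unchecked like the original operator[] versions.
    int GetPosition(const int idx) const { return Position[idx]; }
    int GetColumn(const int idx) const { return Column[idx]; }
    int GetYear(const int idx) const { return Year[idx]; }
    const std::string& GetName(const int idx) const { return Name[idx]; }
    int GetSize() const { return static_cast<int>(Position.size()); }

    // Assumed behavior: append one entry to every column.
    void Add(const int idx, const int col, const int year,
             const std::string& name) {
        Position.push_back(idx);
        Column.push_back(col);
        Year.push_back(year);
        Name.push_back(name);
    }

private:
    std::vector<int> Position;
    std::vector<int> Column;
    std::vector<int> Year;
    std::vector<std::string> Name;
};
```

With this version, `Index b = a;` and `c = a;` still produce independent deep copies, because std::vector and std::string copy their contents.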
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users