Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault with HPCC benchmark
From: Reza Bakhshayeshi (reza.b2008_at_[hidden])
Date: 2013-04-03 13:17:18


Thanks for your answers.

@Ralph Castain:
Do you mean the error I receive?
This is the output when I run the program:

  *** Process received signal ***
  Signal: Segmentation fault (11)
  Signal code: Address not mapped (1)
  Failing at address: 0x1b7f000
  [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f6a84b524a0]
  [ 1] hpcc(HPCC_Power2NodesMPIRandomAccessCheck+0xa04) [0x423834]
  [ 2] hpcc(HPCC_MPIRandomAccess+0x87a) [0x41e43a]
  [ 3] hpcc(main+0xfbf) [0x40a1bf]
  [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f6a84b3d76d]
  [ 5] hpcc() [0x40aafd]
  *** End of error message ***
[][[53938,1],0][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 4164 on node 192.168.100.6
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
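
To map the hpcc frame addresses in a trace like the one above to function names and source lines, something along these lines might help. It is only a sketch: the helper name extract_hpcc_addrs is made up for illustration, the binary path ./hpcc is an assumption, and addr2line can only report file and line numbers if hpcc is rebuilt with -g (the CCFLAGS quoted in this thread do not include it).

```shell
#!/bin/sh
# Hypothetical helper (not part of HPCC or Open MPI): read a backtrace on
# stdin and print the absolute addresses of frames inside the hpcc binary.
extract_hpcc_addrs() {
    # Keep only frames that belong to the hpcc executable, then print the
    # bracketed address at the end of each of those lines.
    grep 'hpcc(' | sed -n 's/.*\[\(0x[0-9a-f]*\)\]$/\1/p'
}

# Frames copied from the error output in this message.
trace='  [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f6a84b524a0]
  [ 1] hpcc(HPCC_Power2NodesMPIRandomAccessCheck+0xa04) [0x423834]
  [ 2] hpcc(HPCC_MPIRandomAccess+0x87a) [0x41e43a]
  [ 3] hpcc(main+0xfbf) [0x40a1bf]'

# Join the addresses on one line and print the addr2line command to run.
# Assumption: ./hpcc is the same (non-relocated) binary that crashed.
addrs=$(printf '%s\n' "$trace" | extract_hpcc_addrs | paste -s -d ' ' -)
echo "addr2line -f -e ./hpcc $addrs"
```

The script only prints the command rather than running it, so it can be tried on a machine that does not have the hpcc binary.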

@Gus Correa:
I did that on both the server and the instances, but it didn't solve the problem.
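
A limit raised in an interactive shell is not necessarily inherited by processes that mpirun starts on other nodes, so one way to double-check is to print the limit from inside the job itself. The following is a minimal sketch, assuming the same ./myhosts hostfile as in the original command; the function name report_stack_limit is only illustrative.

```shell
#!/bin/sh
# Print the soft stack limit that a process on this host actually sees.
report_stack_limit() {
    echo "$(uname -n): stack limit = $(ulimit -s)"
}

# Local check in the current shell.
report_stack_limit

# Remote check: launch a plain shell command through mpirun so that each
# rank reports the limit it was started with. This is skipped if mpirun or
# the hostfile (assumed to be ./myhosts) is not available.
if command -v mpirun >/dev/null 2>&1 && [ -f ./myhosts ]; then
    mpirun -np 2 --hostfile ./myhosts \
        sh -c 'echo "$(uname -n): stack limit = $(ulimit -s)"'
fi
```

If the values printed by the ranks differ from the one printed locally, the new limit is probably not reaching the remote processes.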

On 3 April 2013 19:14, Gus Correa <gus_at_[hidden]> wrote:

> Hi Reza
>
> Check the system stacksize first ('limit stacksize' or 'ulimit -s').
> If it is small, you can try to increase it
> before you run the program.
> Say (tcsh):
>
> limit stacksize unlimited
>
> or (bash):
>
> ulimit -s unlimited
>
> I hope this helps,
> Gus Correa
>
>
> On 04/03/2013 10:29 AM, Ralph Castain wrote:
>
>> Could you perhaps share the stacktrace from the segfault? It's
>> impossible to advise you on the problem without seeing it.
>>
>>
>> On Apr 3, 2013, at 5:28 AM, Reza Bakhshayeshi <reza.b2008_at_[hidden]> wrote:
>>
>>> Hi,
>>> I have installed the HPCC benchmark suite and Open MPI on private cloud
>>> instances.
>>> Unfortunately, I get a segmentation fault, mostly when I run it
>>> simultaneously on two or more instances with:
>>> mpirun -np 2 --hostfile ./myhosts hpcc
>>>
>>> Everything is on Ubuntu server 12.04 (updated)
>>> and this is my make.intel64 file:
>>>
>>> # - shell --------------------------------------------------------------
>>> # ----------------------------------------------------------------------
>>> #
>>> SHELL = /bin/sh
>>> #
>>> CD = cd
>>> CP = cp
>>> LN_S = ln -s
>>> MKDIR = mkdir
>>> RM = /bin/rm -f
>>> TOUCH = touch
>>> #
>>> # ----------------------------------------------------------------------
>>> # - Platform identifier ------------------------------------------------
>>> # ----------------------------------------------------------------------
>>> #
>>> ARCH = intel64
>>> #
>>> # ----------------------------------------------------------------------
>>> # - HPL Directory Structure / HPL library ------------------------------
>>> # ----------------------------------------------------------------------
>>> #
>>> TOPdir = ../../..
>>> INCdir = $(TOPdir)/include
>>> BINdir = $(TOPdir)/bin/$(ARCH)
>>> LIBdir = $(TOPdir)/lib/$(ARCH)
>>> #
>>> HPLlib = $(LIBdir)/libhpl.a
>>> #
>>> # ----------------------------------------------------------------------
>>> # - Message Passing library (MPI) --------------------------------------
>>> # ----------------------------------------------------------------------
>>> # MPinc tells the C compiler where to find the Message Passing library
>>> # header files, MPlib is defined to be the name of the library to be
>>> # used. The variable MPdir is only used for defining MPinc and MPlib.
>>> #
>>> MPdir = /usr/lib/openmpi
>>> MPinc = -I$(MPdir)/include
>>> MPlib = $(MPdir)/lib/libmpi.so
>>> #
>>> # ----------------------------------------------------------------------
>>> # - Linear Algebra library (BLAS or VSIPL) -----------------------------
>>> # ----------------------------------------------------------------------
>>> # LAinc tells the C compiler where to find the Linear Algebra library
>>> # header files, LAlib is defined to be the name of the library to be
>>> # used. The variable LAdir is only used for defining LAinc and LAlib.
>>> #
>>> LAdir = /usr/local/ATLAS/obj64
>>> LAinc = -I$(LAdir)/include
>>> LAlib = $(LAdir)/lib/libcblas.a $(LAdir)/lib/libatlas.a
>>> #
>>> # ----------------------------------------------------------------------
>>> # - F77 / C interface --------------------------------------------------
>>> # ----------------------------------------------------------------------
>>> # You can skip this section if and only if you are not planning to use
>>> # a BLAS library featuring a Fortran 77 interface. Otherwise, it is
>>> # necessary to fill out the F2CDEFS variable with the appropriate
>>> # options. **One and only one** option should be chosen in **each** of
>>> # the 3 following categories:
>>> #
>>> # 1) name space (How C calls a Fortran 77 routine)
>>> #
>>> # -DAdd_ : all lower case and a suffixed underscore (Suns,
>>> # Intel, ...), [default]
>>> # -DNoChange : all lower case (IBM RS6000),
>>> # -DUpCase : all upper case (Cray),
>>> # -DAdd__ : the FORTRAN compiler in use is f2c.
>>> #
>>> # 2) C and Fortran 77 integer mapping
>>> #
>>> # -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
>>> # -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
>>> # -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
>>> #
>>> # 3) Fortran 77 string handling
>>> #
>>> # -DStringSunStyle : The string address is passed at the string loca-
>>> # tion on the stack, and the string length is then
>>> # passed as an F77_INTEGER after all explicit
>>> # stack arguments, [default]
>>> # -DStringStructPtr : The address of a structure is passed by a
>>> # Fortran 77 string, and the structure is of the
>>> # form: struct {char *cp; F77_INTEGER len;},
>>> # -DStringStructVal : A structure is passed by value for each Fortran
>>> # 77 string, and the structure is of the form:
>>> # struct {char *cp; F77_INTEGER len;},
>>> # -DStringCrayStyle : Special option for Cray machines, which uses
>>> # Cray fcd (fortran character descriptor) for
>>> # interoperation.
>>> #
>>> F2CDEFS =
>>> #
>>> # ----------------------------------------------------------------------
>>> # - HPL includes / libraries / specifics -------------------------------
>>> # ----------------------------------------------------------------------
>>> #
>>> HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
>>> HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -lm
>>> #
>>> # - Compile time options -----------------------------------------------
>>> #
>>> # -DHPL_COPY_L force the copy of the panel L before bcast;
>>> # -DHPL_CALL_CBLAS call the cblas interface;
>>> # -DHPL_CALL_VSIPL call the vsip library;
>>> # -DHPL_DETAILED_TIMING enable detailed timers;
>>> #
>>> # By default HPL will:
>>> # *) not copy L before broadcast,
>>> # *) call the BLAS Fortran 77 interface,
>>> # *) not display detailed timing information.
>>> #
>>> HPL_OPTS = -DHPL_CALL_CBLAS
>>> #
>>> # ----------------------------------------------------------------------
>>> #
>>> HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
>>> #
>>> # ----------------------------------------------------------------------
>>> # - Compilers / linkers - Optimization flags ---------------------------
>>> # ----------------------------------------------------------------------
>>> #
>>> CC = /usr/bin/mpicc
>>> CCNOOPT = $(HPL_DEFS)
>>> CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops
>>> #CCFLAGS = $(HPL_DEFS)
>>> #
>>> # On some platforms, it is necessary to use the Fortran linker to find
>>> # the Fortran internals used in the BLAS library.
>>> #
>>> LINKER = /usr/bin/mpif90
>>> LINKFLAGS = $(CCFLAGS)
>>> #
>>> ARCHIVER = ar
>>> ARFLAGS = r
>>> RANLIB = echo
>>> #
>>> # ----------------------------------------------------------------------
>>>
>>> Would you mind helping me figure this problem out?
>>>
>>> Regards,
>>> Reza
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>