Dear list members,

I am using Open MPI 1.3.3 with OFED on an HP cluster running Red Hat Linux.

 

Occasionally (not always) I get a crash with the following message:

 

[hydra11:09312] *** Process received signal ***
[hydra11:09312] Signal: Segmentation fault (11)
[hydra11:09312] Signal code: Address not mapped (1)
[hydra11:09312] Failing at address: 0xffffffffab5f30a8
[hydra11:09312] [ 0] /lib64/libpthread.so.0 [0x3c1400e4c0]
[hydra11:09312] [ 1] /home/ipl/openmpi-1.3.3/platforms/hp/lib/libmpi.so.0(MPI_Isend+0x93) [0x2af1be45a3e3]
[hydra11:09312] [ 2] ./flow(MP_SendReal+0x60) [0x6bc993]
[hydra11:09312] [ 3] ./flow(SendRealsAlongFaceWithOffset_3D+0x4ab) [0x68ba19]
[hydra11:09312] [ 4] ./flow(MP_SendVertexArrayBlock+0x23d) [0x6891e1]
[hydra11:09312] [ 5] ./flow(MB_CommAllVertex+0x65) [0x6848ba]
[hydra11:09312] [ 6] ./flow(MB_SetupVertexArray+0xd5) [0x68c837]
[hydra11:09312] [ 7] ./flow(MB_SetupGrid+0xa8) [0x68be51]
[hydra11:09312] [ 8] ./flow(SetGrid+0x58) [0x446224]
[hydra11:09312] [ 9] ./flow(main+0x148) [0x43b728]
[hydra11:09312] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c1341d974]
[hydra11:09312] [11] ./flow(__gxx_personality_v0+0xd9) [0x429b19]
[hydra11:09312] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 6 with PID 9312 on node hydra11 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

 

The crash does not always occur; sometimes the application runs fine. However, it seems to occur mainly when I run on more than one node.

 

I have searched the Open MPI mailing list archive and found many error messages of the same kind, but none from version 1.3.3 and none of direct relevance.

 

I would really appreciate any comments on this. Below is the information requested on the Open MPI web site.

 

Config.log: attached (config.zip)

Open MPI was configured with a prefix, with the path to openib, and with the following compiler settings:

setenv CC gcc
setenv CFLAGS '-O'
setenv CXX g++
setenv CXXFLAGS '-O'
setenv F77 'gfortran'
setenv FFLAGS '-O'

 

ompi_info --all: attached

 

The application (named flow) was launched on hydra11 with:

nohup mpirun -H hydra11,hydra12 -np 8 ./flow caseC.in &

 

PATH and LD_LIBRARY_PATH on hydra11 and hydra12:

PATH=/home/ipl/openmpi-1.3.3/platforms/hp/bin
LD_LIBRARY_PATH=/home/ipl/openmpi-1.3.3/platforms/hp/lib

 

OpenFabrics version: 1.4

 

Linux:

X86_64-redhat-linux/3.4.6

 

ibv_devinfo, hydra11: attached

ibv_devinfo, hydra12: attached

 

ifconfig, hydra11: attached

ifconfig, hydra12: attached

 

ulimit -l (hydra11): 6000000
ulimit -l (hydra12): unlimited

 

Furthermore, I have not specified any MCA parameters.

 

The application I am running (named flow) is built from Fortran, C and C++ libraries with the following commands:

/home/ipl/openmpi-1.3.3/platforms/hp/bin/mpicc        -DMP -DNS3_ARCH_LINUX -DLAPACK  -I/home/ipl/ns3/engine/include_forLinux -I/home/ipl/openmpi-1.3.3/platforms/hp/include    -c -o user_small_3D.o user_small_3D.c

rm -f flow

/home/ipl/openmpi-1.3.3/platforms/hp/bin/mpicxx   -o flow  user_small_3D.o  -L/home/ipl/ns3/engine/lib_forLinux -lns3main -lns3pars -lns3util -lns3vofl -lns3turb -lns3solv -lns3mesh -lns3diff -lns3grid -lns3line -lns3data -lns3base -lfitpack -lillusolve -lfftpack_small -lfenton -lns3air -lns3dens -lns3poro -lns3sedi -llapack_small -lblas_small -lm -lgfortran  /home/ipl/ns3/engine/lib_Tecplot_forLinux/tecio64.a  

 

Please let me know if you need more info!

 

Thanks in advance,

Iris Lohmann

 

 

 

 

Iris Pernille Lohmann
MSc, PhD
Ports & Offshore Technology (POT)

DHI
Agern Allé 5
DK-2970 Hørsholm
Denmark

Tel: +45 4516 9200
Direct: 45169427

ipl@dhigroup.com
www.dhigroup.com

WATER • ENVIRONMENT • HEALTH

 

 
