Hi,
Can you give more info about the compilation steps? I just
recompiled it (using the internal libraries, except for fftw) and was able to run an
example (output below). Did I miss something?
I recompiled and ran on a Platform OCS 5 cluster (based on RHEL 5)
with IB support (OFED).
Partial ompi_info output:
Open MPI: 1.2.6
Open MPI SVN revision: r17946
Open RTE: 1.2.6
Open RTE SVN revision: r17946
OPAL: 1.2.6
OPAL SVN revision: r17946
Prefix: /home/mbozzore/openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configured by: mbozzore
Configured on: Mon Aug 11 00:29:15 EDT 2008
Configure host: tyan04.lsf.platform.com
Built by: mbozzore
Built on: Mon Aug 11 00:33:54 EDT 2008
Built host: tyan04.lsf.platform.com
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
[mbozzore@tyan04 tests]$ mpirun -np 4 --machinefile ./hosts -x LD_LIBRARY_PATH --mca btl openib,self ../bin/pw.x < scf.in
Program PWSCF v.4.0.1 starts ...
Today is 15Aug2008 at 14:51:18
Parallel version (MPI)
Number of processors in use: 4
R & G space division: proc/pool = 4
For Norm-Conserving or Ultrasoft (Vanderbilt) Pseudopotentials or PAW
Current dimensions of program pwscf are:
Max number of different atomic species (ntypx) = 10
Max number of k-points (npk) = 40000
Max angular momentum in pseudopotentials (lmaxx) = 3
Iterative solution of the eigenvalue problem
a parallel distributed memory algorithm will be used,
eigenstates matrixes will be distributed block like on
ortho sub-group = 2* 2 procs
Planes per process (thick) : nr3 = 16 npp = 4 ncplane = 256
Proc/  planes  cols     G    planes  cols     G    columns    G
Pool        (dense grid)         (smooth grid)     (wavefct grid)
  1       4     41     366      4     41     366      13      70
  2       4     41     366      4     41     366      14      71
  3       4     40     362      4     40     362      14      71
  4       4     41     365      4     41     365      14      71
tot      16    163    1459     16    163    1459      55     283
bravais-lattice index = 2
lattice parameter (a_0) = 10.2000 a.u.
unit-cell volume = 265.3020 (a.u.)^3
number of atoms/cell = 2
number of atomic types = 1
number of electrons = 8.00
number of Kohn-Sham states= 4
kinetic-energy cutoff = 12.0000 Ry
charge density cutoff = 48.0000 Ry
convergence threshold = 1.0E-06
mixing beta = 0.7000
number of iterations used = 8 plain mixing
Exchange-correlation = SLA PZ NOGX NOGC (1100)
celldm(1)= 10.200000 celldm(2)= 0.000000 celldm(3)= 0.000000
celldm(4)= 0.000000 celldm(5)= 0.000000 celldm(6)= 0.000000
crystal axes: (cart. coord. in units of a_0)
a(1) = ( -0.500000 0.000000 0.500000 )
a(2) = ( 0.000000 0.500000 0.500000 )
a(3) = ( -0.500000 0.500000 0.000000 )
reciprocal axes: (cart. coord. in units 2 pi/a_0)
b(1) = ( -1.000000 -1.000000 1.000000 )
b(2) = ( 1.000000 1.000000 1.000000 )
b(3) = ( -1.000000 1.000000 -1.000000 )
PseudoPot. # 1 for Si read from file Si.vbc.UPF
Pseudo is Norm-conserving, Zval = 4.0
Generated by new atomic code, or converted to UPF format
Using radial grid of 431 points, 2 beta functions with:
l(1) = 0
l(2) = 1
atomic species valence mass pseudopotential
Si 4.00 28.08600 Si( 1.00)
48 Sym.Ops. (with inversion)
Cartesian axes
site n. atom positions (a_0 units)
1 Si tau( 1) = ( 0.0000000 0.0000000 0.0000000 )
2 Si tau( 2) = ( 0.2500000 0.2500000 0.2500000 )
number of k points= 2
cart. coord. in units 2pi/a_0
k( 1) = ( 0.2500000 0.2500000 0.2500000), wk = 0.5000000
k( 2) = ( 0.2500000 0.2500000 0.7500000), wk = 1.5000000
G cutoff = 126.4975 ( 1459 G-vectors) FFT grid: ( 16, 16, 16)
Largest allocated arrays est. size (Mb) dimensions
Kohn-Sham Wavefunctions 0.00 Mb ( 51, 4)
NL pseudopotentials 0.01 Mb ( 51, 8)
Each V/rho on FFT grid 0.02 Mb ( 1024)
Each G-vector array 0.00 Mb ( 366)
G-vector shells 0.00 Mb ( 42)
Largest temporary arrays est. size (Mb) dimensions
Auxiliary wavefunctions 0.01 Mb ( 51, 16)
Each subspace H/S matrix 0.00 Mb ( 16, 16)
Each <psi_i|beta_j> matrix 0.00 Mb ( 8, 4)
Arrays for rho mixing 0.13 Mb ( 1024, 8)
Initial potential from superposition of free atoms
starting charge 7.99901, renormalised to 8.00000
Starting wfc are 8 atomic wfcs
total cpu time spent up to now is 0.10 secs
per-process dynamical memory: 21.9 Mb
Self-consistent Calculation
iteration # 1 ecut= 12.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 1.00E-02, avg # of iterations = 2.0
Threshold (ethr) on eigenvalues was too large:
Diagonalizing with lowered threshold
Davidson diagonalization with overlap
ethr = 7.93E-04, avg # of iterations = 1.0
total cpu time spent up to now is 0.13 secs
total energy = -15.79103983 Ry
Harris-Foulkes estimate = -15.81239602 Ry
estimated scf accuracy < 0.06375741 Ry
iteration # 2 ecut= 12.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 7.97E-04, avg # of iterations = 1.0
total cpu time spent up to now is 0.15 secs
total energy = -15.79409517 Ry
Harris-Foulkes estimate = -15.79442220 Ry
estimated scf accuracy < 0.00230261 Ry
iteration # 3 ecut= 12.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 2.88E-05, avg # of iterations = 2.0
total cpu time spent up to now is 0.17 secs
total energy = -15.79447768 Ry
Harris-Foulkes estimate = -15.79450039 Ry
estimated scf accuracy < 0.00006345 Ry
iteration # 4 ecut= 12.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 7.93E-07, avg # of iterations = 2.0
total cpu time spent up to now is 0.19 secs
total energy = -15.79449472 Ry
Harris-Foulkes estimate = -15.79449644 Ry
estimated scf accuracy < 0.00000455 Ry
iteration # 5 ecut= 12.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 5.69E-08, avg # of iterations = 2.5
total cpu time spent up to now is 0.21 secs
End of self-consistent calculation
k = 0.2500 0.2500 0.2500 ( 180 PWs) bands (ev):
-4.8701 2.3792 5.5371 5.5371
k = 0.2500 0.2500 0.7500 ( 186 PWs) bands (ev):
-2.9165 -0.0653 2.6795 4.0355
! total energy = -15.79449556 Ry
Harris-Foulkes estimate = -15.79449558 Ry
estimated scf accuracy < 0.00000005 Ry
The total energy is the sum of the following terms:
one-electron contribution = 4.83378726 Ry
hartree contribution = 1.08428951 Ry
xc contribution = -4.81281375 Ry
ewald contribution = -16.89975858 Ry
convergence has been achieved in 5 iterations
entering subroutine stress ...
total stress (Ry/bohr**3) (kbar) P= -30.30
-0.00020597 0.00000000 0.00000000 -30.30 0.00 0.00
0.00000000 -0.00020597 0.00000000 0.00 -30.30 0.00
0.00000000 0.00000000 -0.00020597 0.00 0.00 -30.30
Writing output data file pwscf.save
PWSCF : 0.28s CPU time, 0.39s wall time
init_run : 0.05s CPU
electrons : 0.11s CPU
stress : 0.00s CPU
Called by init_run:
wfcinit : 0.01s CPU
potinit : 0.00s CPU
Called by electrons:
c_bands : 0.09s CPU ( 6 calls, 0.015 s avg)
sum_band : 0.01s CPU ( 6 calls, 0.001 s avg)
v_of_rho : 0.00s CPU ( 6 calls, 0.001 s avg)
mix_rho : 0.00s CPU ( 6 calls, 0.000 s avg)
Called by c_bands:
init_us_2 : 0.00s CPU ( 28 calls, 0.000 s avg)
cegterg : 0.09s CPU ( 12 calls, 0.007 s avg)
Called by *egterg:
h_psi : 0.01s CPU ( 35 calls, 0.000 s avg)
g_psi : 0.00s CPU ( 21 calls, 0.000 s avg)
cdiaghg : 0.06s CPU ( 31 calls, 0.002 s avg)
Called by h_psi:
add_vuspsi : 0.00s CPU ( 35 calls, 0.000 s avg)
General routines
calbec : 0.00s CPU ( 37 calls, 0.000 s avg)
cft3s : 0.02s CPU ( 354 calls, 0.000 s avg)
davcio : 0.00s CPU ( 40 calls, 0.000 s avg)
Parallel routines
fft_scatter : 0.01s CPU ( 354 calls, 0.000 s avg)
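As a quick sanity check on the run above, the four energy contributions printed by pw.x should add up to the reported total energy. Below is a minimal Python sketch of that arithmetic. The numbers are copied from the output; the variable and function names are my own and are not part of Quantum Espresso.

```python
# Energy terms copied from the pw.x output above (all values in Ry).
terms = {
    "one-electron": 4.83378726,
    "hartree": 1.08428951,
    "xc": -4.81281375,
    "ewald": -16.89975858,
}

# Final total energy reported on the line starting with "!".
reported_total = -15.79449556


def energy_sum(contributions):
    """Sum the contributions, rounded to the 8 decimals the log prints."""
    return round(sum(contributions.values()), 8)


computed_total = energy_sum(terms)

print(f"sum of terms   = {computed_total:.8f} Ry")
print(f"reported total = {reported_total:.8f} Ry")
# The tolerance only absorbs rounding in the last printed digit.
print("match" if abs(computed_total - reported_total) < 1e-7 else "mismatch")
```

If the two numbers disagree on a different build, that would be a hint that the numerical libraries are misbehaving even when the run does not crash.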
Open Source Solution Developer
Platform computing
Phone: +1 905 948 4649
From: users-bounces@open-mpi.org
[mailto:users-bounces@open-mpi.org] On Behalf Of C.Y. Lee
Sent: August-15-08 1:03 PM
To: users@open-mpi.org
Subject: [OMPI users] Segmentation fault (11) Address not mapped (1)
All,
I have a problem similar to the one James described in an earlier
message: http://www.open-mpi.org/community/lists/users/2008/07/6204.php
While he was able to solve the problem by recompiling openmpi,
I had no luck on my RedHat Enterprise 5 system.
Here are two other threads about similar issues
with openmpi on Ubuntu and OSX, both of which were solved: https://bugs.launchpad.net/ubuntu/+source/binutils/+bug/234837
Now...
Here is my story:
I had Quantum Espresso (QE) running without problem using
openmpi.
However, after I rebuilt QE against a
recompiled fftw-2.1.5, it compiled without any errors, but when I ran QE
it gave me the error below:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x22071b70
[ 0] /lib64/libpthread.so.0 [0x352420de70]
[ 1] /usr/lib64/liblapack.so.3(dsytf2_+0xc43) [0x2aaaaac9f5e3]
[ 2] /usr/lib64/liblapack.so.3(dsytrf_+0x407) [0x2aaaaaca0567]
[ 3] /opt/espresso-4.0.1/bin/pw.x(mix_rho_+0x828) [0x5044b8]
[ 4] /opt/espresso-4.0.1/bin/pw.x(electrons_+0xb37) [0x4eae47]
[ 5] /opt/espresso-4.0.1/bin/pw.x(MAIN__+0xbf) [0x42b3af]
[ 6] /opt/espresso-4.0.1/bin/pw.x(main+0xe) [0x6aad5e]
[ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x352361d8a4]
[ 8] /opt/espresso-4.0.1/bin/pw.x [0x42b239]
*** End of error message ***
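To make the backtrace easier to read, the frames can be grouped by the library or binary they belong to. The following is a minimal Python sketch that does this. The frame lines are copied from the error message above; the regular expression and the function name `frames_by_library` are my own assumptions about this frame format, not part of Open MPI or any other tool.

```python
import re
from collections import OrderedDict

# Frames copied verbatim from the error message above.
BACKTRACE = """\
[ 0] /lib64/libpthread.so.0 [0x352420de70]
[ 1] /usr/lib64/liblapack.so.3(dsytf2_+0xc43) [0x2aaaaac9f5e3]
[ 2] /usr/lib64/liblapack.so.3(dsytrf_+0x407) [0x2aaaaaca0567]
[ 3] /opt/espresso-4.0.1/bin/pw.x(mix_rho_+0x828) [0x5044b8]
[ 4] /opt/espresso-4.0.1/bin/pw.x(electrons_+0xb37) [0x4eae47]
[ 5] /opt/espresso-4.0.1/bin/pw.x(MAIN__+0xbf) [0x42b3af]
[ 6] /opt/espresso-4.0.1/bin/pw.x(main+0xe) [0x6aad5e]
[ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x352361d8a4]
[ 8] /opt/espresso-4.0.1/bin/pw.x [0x42b239]
"""

# Assumed frame layout: "[ N] path(symbol+0xOFFSET) [0xADDRESS]",
# where the "(symbol+0xOFFSET)" part may be missing.
FRAME_RE = re.compile(
    r"^\[\s*(\d+)\]\s+(\S+?)(?:\((\w+)\+0x[0-9a-fA-F]+\))?\s+\[0x[0-9a-fA-F]+\]$"
)


def frames_by_library(text):
    """Map each library/binary path to its (frame index, symbol) pairs."""
    grouped = OrderedDict()
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a frame line
        index = int(match.group(1))
        library = match.group(2)
        symbol = match.group(3)  # None when the frame has no symbol
        grouped.setdefault(library, []).append((index, symbol))
    return grouped


grouped = frames_by_library(BACKTRACE)
for library, frames in grouped.items():
    print(library)
    for index, symbol in frames:
        print(f"  frame {index}: {symbol or '(no symbol)'}")
```

Frames are listed innermost first, so the entries with the lowest indices are the ones closest to where the signal was raised.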
From what I read in the links above, it seems to be a bug
in openmpi.
Please share your thoughts on this. Thank you!
CY