Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Infiniband errors
From: Syed Ahsan Ali (ahsanshah01_at_[hidden])
Date: 2012-12-20 22:19:23


Dear Yann

Here is the output:

[root@compute-01-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
[root@compute-01-01 ~]# uname -a
Linux compute-01-01.private.dns.zone 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@compute-01-01 ~]# lsmod
Module Size Used by
blcr 118276 0
blcr_vmadump 56728 1 blcr
blcr_imports 47488 2 blcr,blcr_vmadump
autofs4 57033 2
hidp 83521 2
nfs 290189 5
lockd 99185 2 nfs
fscache 52385 1 nfs
nfs_acl 36673 1 nfs
rfcomm 104809 0
l2cap 89281 10 hidp,rfcomm
bluetooth 118597 5 hidp,rfcomm,l2cap
sunrpc 197897 17 nfs,lockd,nfs_acl
cpufreq_ondemand 42449 8
acpi_cpufreq 47937 1
freq_table 40889 2 cpufreq_ondemand,acpi_cpufreq
rdma_ucm 47872 8
qlgc_vnic 151168 0
ib_sdp 147176 0
rdma_cm 68500 2 rdma_ucm,ib_sdp
iw_cm 43656 1 rdma_cm
ib_addr 41992 1 rdma_cm
ib_ipoib 113240 0
ipoib_helper 35728 2 ib_ipoib
ib_cm 73000 3 qlgc_vnic,rdma_cm,ib_ipoib
ib_sa 75016 4 qlgc_vnic,rdma_cm,ib_ipoib,ib_cm
ipv6 424609 71 ib_ipoib
xfrm_nalgo 43333 1 ipv6
crypto_api 42945 1 xfrm_nalgo
ib_uverbs 75824 1 rdma_ucm
ib_umad 50472 0
iw_cxgb3 107476 0
cxgb3 155120 1 iw_cxgb3
ib_ipath 355456 0
mlx4_ib 99260 0
mlx4_core 136036 1 mlx4_ib
ib_mthca 157988 0
ib_mad 70948 5 ib_cm,ib_sa,ib_umad,mlx4_ib,ib_mthca
ib_core 108544 15 rdma_ucm,qlgc_vnic,ib_sdp,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,ib_ipath,mlx4_ib,ib_mthca,ib_mad
dm_mirror 53193 0
dm_multipath 55257 0
scsi_dh 41665 1 dm_multipath
video 53197 0
hwmon 36553 0
backlight 39873 1 video
sbs 49921 0
i2c_ec 38593 1 sbs
i2c_core 56129 1 i2c_ec
button 40545 0
battery 43849 0
asus_acpi 50917 0
acpi_memhotplug 40133 0
ac 38729 0
parport_pc 62312 0
lp 47121 0
parport 73165 2 parport_pc,lp
joydev 43969 0
sr_mod 50789 0
cdrom 68713 1 sr_mod
bnx2 210249 0
sg 69993 0
i5000_edac 43465 0
edac_mc 60193 1 i5000_edac
serio_raw 40517 0
pcspkr 36289 0
dm_raid45 99025 0
dm_message 36161 1 dm_raid45
dm_region_hash 46145 1 dm_raid45
dm_log 44865 3 dm_mirror,dm_raid45,dm_region_hash
dm_mod 100369 4 dm_mirror,dm_multipath,dm_raid45,dm_log
dm_mem_cache 38977 1 dm_raid45
usb_storage 116129 0
qla2xxx 1107173 0
scsi_transport_fc 73801 1 qla2xxx
shpchp 70637 0
mptsas 69201 5
mptscsih 69697 1 mptsas
mptbase 113637 2 mptsas,mptscsih
scsi_transport_sas 66753 1 mptsas
sd_mod 56385 6
scsi_mod 196569 10 scsi_dh,sr_mod,sg,usb_storage,qla2xxx,scsi_transport_fc,mptsas,mptscsih,scsi_transport_sas,sd_mod
ext3 168017 4
jbd 94257 1 ext3
uhci_hcd 57433 0
ohci_hcd 55925 0
ehci_hcd 65741 0
[root@compute-01-01 ~]# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,data=ordered 0 0
/dev /dev tmpfs rw 0 0
/proc /proc proc rw 0 0
/sys /sys sysfs rw 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
devpts /dev/pts devpts rw 0 0
/dev/sda5 /var ext3 rw,data=ordered 0 0
/dev/sda6 /data ext3 rw,data=ordered 0 0
/dev/sda1 /boot ext3 rw,data=ordered 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
10.0.0.1:/depot/shared /shared nfs rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.0.0.1 0 0
10.0.0.1:/home /home nfs rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.0.0.1 0 0
10.0.0.1:/opt/intel /opt/intel nfs rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.0.0.1 0 0
10.0.0.1:/SATA-Backup /SATA-Backup nfs rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.0.0.1 0 0
10.0.0.1:/FC-data /FC-data nfs rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.0.0.1 0 0
/etc/auto.misc /misc autofs rw,fd=6,pgrp=3876,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net autofs rw,fd=12,pgrp=3876,timeout=300,minproto=5,maxproto=5,indirect 0 0
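
For anyone reading this thread later: two of the checks requested in the quoted message (is /sys mounted, which modules are loaded) can be read off this kind of output mechanically. What follows is only a rough Python sketch, assuming the `lsmod` and `/proc/mounts` text has been captured into strings. The names `RDMA_HW_DRIVERS`, `loaded_modules` and `sysfs_mounted` are made up for illustration and do not come from any existing tool; the driver set is simply the hardware driver modules that appear in the listing above.

```python
# Hypothetical helpers for illustration; not part of any existing tool.

# Hardware driver modules that appear in the lsmod listing above.
RDMA_HW_DRIVERS = {"ib_mthca", "mlx4_ib", "ib_ipath", "iw_cxgb3"}


def loaded_modules(lsmod_text):
    """Return the set of module names found in `lsmod` output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and wrapped continuation
        # lines: a real module row has a numeric size in column two.
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        names.add(fields[0])
    return names


def sysfs_mounted(mounts_text, mount_point="/sys"):
    """Return True if /proc/mounts text shows sysfs at mount_point."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "sysfs":
            return True
    return False
```

Applied to the output above, `sysfs_mounted` would return True (the `/sys /sys sysfs rw 0 0` line) and all four names in `RDMA_HW_DRIVERS` would be in the set returned by `loaded_modules`. Note that this output is from compute-01-01, while the ibstatus error quoted below was seen on compute-01-08, so the same checks would need to be repeated on that node.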

Thanks and Regards

On Wed, Dec 19, 2012 at 8:38 PM, Yann Droneaud <ydroneaud_at_[hidden]> wrote:

> On Wednesday, 19 December 2012 at 12:12 +0500, Syed Ahsan Ali wrote:
> > Dear John
> >
> > I found this output of ibstatus on some nodes (most probably the
> > cause of the problem):
> > [root@compute-01-08 ~]# ibstatus
> >
> > Fatal error: device '*': sys files not found (/sys/class/infiniband/*/ports)
> >
> > Does this indicate a hardware or a software issue?
> >
>
> This is a software issue.
>
> Which Linux (lsb_release --all or cat /etc/redhat-release) and kernel
> (uname -a) version are you using?
>
> Which modules are loaded (lsmod)?
>
> Is /sys mounted (mount and/or cat /proc/mounts)?
>
> Regards.
>
> --
> Yann Droneaud
> OPTEYA
>
>
>
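
The path in the error message quoted above suggests that ibstatus looks for per-device `ports` directories under `/sys/class/infiniband`. Here is a minimal Python sketch of the same lookup, which can be run on an affected node to see whether any device is registered there. The function name `infiniband_ports` and the `sys_root` parameter are hypothetical, chosen for this example; this is not how ibstatus itself is implemented.

```python
import glob
import os


def infiniband_ports(sys_root="/sys"):
    """Map each device under <sys_root>/class/infiniband to its port names."""
    result = {}
    # Same glob as in the error message: /sys/class/infiniband/*/ports
    pattern = os.path.join(sys_root, "class", "infiniband", "*", "ports")
    for ports_dir in sorted(glob.glob(pattern)):
        device = os.path.basename(os.path.dirname(ports_dir))
        result[device] = sorted(os.listdir(ports_dir))
    return result


if __name__ == "__main__":
    ports = infiniband_ports()
    if not ports:
        print("no devices found under /sys/class/infiniband")
    for device, names in ports.items():
        print(device, names)
```

An empty result on a node where /sys is mounted would point toward the driver side (no device registered by the kernel) rather than toward the ibstatus script.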

-- 
Syed Ahsan Ali Bokhari
Electronic Engineer (EE)
Research & Development Division
Pakistan Meteorological Department H-8/4, Islamabad.
Phone # off  +92518358714
Cell # +923155145014