
Open MPI User's Mailing List Archives


From: Mostyn Lewis (Mostyn.Lewis_at_[hidden])
Date: 2007-11-06 13:52:32


Andrew,

Failure looks like:

> > + mpirun --prefix /tools/openmpi/1.3a1r16632_svn/infinicon/gcc64/4.1.2/udapl/suse_sles_10/x86_64/opteron -np 8 -machinefile H ./a.out
> > Process 0 of 8 on s1470
> > Process 1 of 8 on s1470
> > Process 4 of 8 on s1469
> > Process 2 of 8 on s1470
> > Process 7 of 8 on s1469
> > Process 5 of 8 on s1469
> > Process 6 of 8 on s1469
> > Process 3 of 8 on s1470
> > 30989:a.out *->0 (f=noaffinity,0,1,2,3)
> > 30988:a.out *->0 (f=noaffinity,0,1,2,3)
> > 30990:a.out *->0 (f=noaffinity,0,1,2,3)
> > 30372:a.out *->0 (f=noaffinity,0,1,2,3)
> > 30991:a.out *->0 (f=noaffinity,0,1,2,3)
> > 30370:a.out *->0 (f=noaffinity,0,1,2,3)
> > 30369:a.out *->0 (f=noaffinity,0,1,2,3)
> > 30371:a.out *->0 (f=noaffinity,0,1,2,3)
> > get ASYNC ERROR = 6
> > [s1469:30369] *** Process received signal ***
> > [s1469:30369] Signal: Segmentation fault (11)
> > [s1469:30369] Signal code: Address not mapped (1)
> > [s1469:30369] Failing at address: 0x110
> > [s1469:30369] [ 0] /lib64/libpthread.so.0 [0x2b528ceefc10]
> > [s1469:30369] [ 1] /lib64/libdapl.so(dapl_llist_next_entry+0x25) [0x2b528fba5df5]
> > [s1469:30369] *** End of error message ***

> > and in /var/log/messages I see:
> >
> > Nov 5 14:46:00 s1469 sshd[30363]: Accepted publickey for mostyn from 10.173.132.37 port 36211 ssh2
> > Nov 5 14:46:25 s1469 kernel: TVpd: !ERROR! Async Event:TAVOR_EQE_TYPE_CQ_ERR: (CQ Access Error) cqn:641
> > Nov 5 14:46:25 s1469 kernel: a.out[30374]: segfault at 0000000000000110 rip 00002b528fba5df5 rsp 00000000410010b0 error 4
> >
> > This is reproducible.
> >
> > Is this Open MPI or your libdapl that's causing this, do you think?
> >
> > + ompi_info
> > Open MPI: 1.3a1svn11022007
> > Open MPI SVN revision: svn11022007
> > Open RTE: 1.3a1svn11022007
> > Open RTE SVN revision: svn11022007
> > OPAL: 1.3a1svn11022007
> > OPAL SVN revision: svn11022007
> > Prefix: /tools/openmpi/1.3a1r16632_svn/infinicon/gcc64/4.1.2/udapl/suse_sles_10/x86_64/opteron
> > Configured architecture: x86_64-unknown-linux-gnu
> > Configure host: s1471
> > Configured by: root
> > Configured on: Fri Nov 2 16:20:29 PDT 2007
> > Configure host: s1471
> > Built by: mostyn
> > Built on: Fri Nov 2 16:30:07 PDT 2007
> > Built host: s1471
> > C bindings: yes
> > C++ bindings: yes
> > Fortran77 bindings: yes (all)
> > Fortran90 bindings: yes
> > Fortran90 bindings size: small
> > C compiler: gcc
> > C compiler absolute: /usr/bin/gcc
> > C++ compiler: g++
> > C++ compiler absolute: /usr/bin/g++
> > Fortran77 compiler: gfortran
> > Fortran77 compiler abs: /usr/bin/gfortran
> > Fortran90 compiler: gfortran
> > Fortran90 compiler abs: /usr/bin/gfortran
> > C profiling: yes
> > C++ profiling: yes
> > Fortran77 profiling: yes
> > Fortran90 profiling: yes
> > C++ exceptions: no
> > Thread support: posix (mpi: no, progress: no)
> > Sparse Groups: no
> > Internal debug support: no
> > MPI parameter check: runtime
> > Memory profiling support: no
> > Memory debugging support: no
> > libltdl support: yes
> > Heterogeneous support: yes
> > mpirun default --prefix: no
> > MPI I/O support: yes
> > MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.3)
> > MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.3)
> > MCA paffinity: linux (MCA v1.0, API v1.1, Component v1.3)
> > MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.3)
> > MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.3)
> > MCA timer: linux (MCA v1.0, API v1.0, Component v1.3)
> > MCA installdirs: env (MCA v1.0, API v1.0, Component v1.3)
> > MCA installdirs: config (MCA v1.0, API v1.0, Component v1.3)
> > MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> > MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> > MCA coll: basic (MCA v1.0, API v1.1, Component v1.3)
> > MCA coll: inter (MCA v1.0, API v1.1, Component v1.3)
> > MCA coll: self (MCA v1.0, API v1.1, Component v1.3)
> > MCA coll: sm (MCA v1.0, API v1.1, Component v1.3)
> > MCA coll: tuned (MCA v1.0, API v1.1, Component v1.3)
> > MCA io: romio (MCA v1.0, API v1.0, Component v1.3)
> > MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.3)
> > MCA mpool: sm (MCA v1.0, API v1.0, Component v1.3)
> > MCA pml: cm (MCA v1.0, API v1.0, Component v1.3)
> > MCA pml: dr (MCA v1.0, API v1.0, Component v1.3)
> > MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.3)
> > MCA bml: r2 (MCA v1.0, API v1.0, Component v1.3)
> > MCA rcache: vma (MCA v1.0, API v1.0, Component v1.3)
> > MCA btl: self (MCA v1.0, API v1.0.1, Component v1.3)
> > MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.3)
> > MCA btl: udapl (MCA v1.0, API v1.0, Component v1.3)
> > MCA topo: unity (MCA v1.0, API v1.0, Component v1.3)
> > MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.3)
> > MCA osc: rdma (MCA v1.0, API v1.0, Component v1.3)
> > MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.3)
> > MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.3)
> > MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.3)
> > MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.3)
> > MCA gpr: replica (MCA v1.0, API v1.0, Component v1.3)
> > MCA grpcomm: basic (MCA v1.0, API v2.0, Component v1.3)
> > MCA iof: proxy (MCA v1.0, API v1.0, Component v1.3)
> > MCA iof: svc (MCA v1.0, API v1.0, Component v1.3)
> > MCA ns: proxy (MCA v1.0, API v2.0, Component v1.3)
> > MCA ns: replica (MCA v1.0, API v2.0, Component v1.3)
> > MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> > MCA odls: default (MCA v1.0, API v1.3, Component v1.3)
> > MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.3)
> > MCA ras: localhost (MCA v1.0, API v1.3, Component v1.3)
> > MCA ras: slurm (MCA v1.0, API v1.3, Component v1.3)
> > MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.3)
> > MCA rds: proxy (MCA v1.0, API v1.3, Component v1.3)
> > MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.3)
> > MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.3)
> > MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.3)
> > MCA rml: oob (MCA v1.0, API v1.0, Component v1.3)
> > MCA routed: tree (MCA v1.0, API v1.0, Component v1.3)
> > MCA routed: unity (MCA v1.0, API v1.0, Component v1.3)
> > MCA pls: proxy (MCA v1.0, API v1.3, Component v1.3)
> > MCA pls: rsh (MCA v1.0, API v1.3, Component v1.3)
> > MCA pls: slurm (MCA v1.0, API v1.3, Component v1.3)
> > MCA sds: env (MCA v1.0, API v1.0, Component v1.3)
> > MCA sds: pipe (MCA v1.0, API v1.0, Component v1.3)
> > MCA sds: seed (MCA v1.0, API v1.0, Component v1.3)
> > MCA sds: singleton (MCA v1.0, API v1.0, Component v1.3)
> > MCA sds: slurm (MCA v1.0, API v1.0, Component v1.3)
> > MCA filem: rsh (MCA v1.0, API v1.0, Component v1.3)
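
If it helps to narrow down whether the uDAPL BTL itself is implicated, I can
rerun the same job with the udapl BTL excluded, forcing TCP between the two
nodes (the --mca btl selection below assumes the usual Open MPI BTL syntax):

    mpirun --prefix /tools/openmpi/1.3a1r16632_svn/infinicon/gcc64/4.1.2/udapl/suse_sles_10/x86_64/opteron \
           --mca btl self,sm,tcp \
           -np 8 -machinefile H ./a.out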

Regards,
Mostyn

On Tue, 6 Nov 2007, Andrew Friedley wrote:

> All thread support is disabled by default in Open MPI; the uDAPL BTL is
> not thread safe, nor does it make use of a threaded uDAPL implementation.
> For completeness, thread support is controlled by the
> --enable-mpi-threads and --enable-progress-threads options to the
> configure script.
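
For reference, a rebuild that leaves both of those options off might look
roughly like this (the --disable-* spellings assume the usual Autoconf
convention for these configure flags; substitute the real install prefix):

    ./configure --prefix=<install dir> \
                --disable-mpi-threads \
                --disable-progress-threads
    make all install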
>
> The reference you're seeing to libpthread.so.0 is a side effect of the
> way we print backtraces when crashes occur and can be ignored.
>
> How exactly does your MPI program fail? Make sure you take a look at
> http://www.open-mpi.org/community/help/ and provide all relevant
> information.
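
(The sort of information that page asks for can be gathered roughly like
this; the file names here are only illustrative:

    ompi_info --all > ompi_info_all.txt
    gzip -c config.log > config_log.gz   # config.log from the Open MPI build tree

together with the exact mpirun command line and the error output, which are
included above.)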
>
> Andrew
>
> Mostyn Lewis wrote:
>> I'm trying to build a uDAPL-enabled Open MPI from last Friday's SVN,
>> using Qlogic/QuickSilver/SilverStorm 4.1.0.0.1 software. I can get it
>> built, and it works within a single machine. Over IB between 2 machines
>> it fails near the end of a job. Qlogic says they don't have a threaded
>> uDAPL (libpthread is in the traceback).
>>
>> How do you (can you?) configure pthreads away altogether?
>>
>> Mostyn
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>