Open MPI User's Mailing List Archives

From: Andrew Friedley (afriedle_at_[hidden])
Date: 2007-11-06 14:29:03


Mostyn Lewis wrote:
> Andrew,
>
> Failure looks like:
>
>>> + mpirun --prefix /tools/openmpi/1.3a1r16632_svn/infinicon/gcc64/4.1.2/udapl/suse_sles_10/x86_64/opteron -np 8 -machinefile H ./a.out
>>> Process 0 of 8 on s1470
>>> Process 1 of 8 on s1470
>>> Process 4 of 8 on s1469
>>> Process 2 of 8 on s1470
>>> Process 7 of 8 on s1469
>>> Process 5 of 8 on s1469
>>> Process 6 of 8 on s1469
>>> Process 3 of 8 on s1470
>>> 30989:a.out *->0 (f=noaffinity,0,1,2,3)
>>> 30988:a.out *->0 (f=noaffinity,0,1,2,3)
>>> 30990:a.out *->0 (f=noaffinity,0,1,2,3)
>>> 30372:a.out *->0 (f=noaffinity,0,1,2,3)
>>> 30991:a.out *->0 (f=noaffinity,0,1,2,3)
>>> 30370:a.out *->0 (f=noaffinity,0,1,2,3)
>>> 30369:a.out *->0 (f=noaffinity,0,1,2,3)
>>> 30371:a.out *->0 (f=noaffinity,0,1,2,3)
>>> get ASYNC ERROR = 6

I thought this might be coming from the uDAPL BTL, but I don't see where
in the code this could possibly be printed from.

>>> [s1469:30369] *** Process received signal ***
>>> [s1469:30369] Signal: Segmentation fault (11)
>>> [s1469:30369] Signal code: Address not mapped (1)
>>> [s1469:30369] Failing at address: 0x110
>>> [s1469:30369] [ 0] /lib64/libpthread.so.0 [0x2b528ceefc10]
>>> [s1469:30369] [ 1] /lib64/libdapl.so(dapl_llist_next_entry+0x25) [0x2b528fba5df5]
>>> [s1469:30369] *** End of error message ***
>
>>> and in /var/log/messages I see:
>>>
>>> Nov 5 14:46:00 s1469 sshd[30363]: Accepted publickey for mostyn from 10.173.132.37 port 36211 ssh2
>>> Nov 5 14:46:25 s1469 kernel: TVpd: !ERROR! Async Event:TAVOR_EQE_TYPE_CQ_ERR: (CQ Access Error) cqn:641
>>> Nov 5 14:46:25 s1469 kernel: a.out[30374]: segfault at 0000000000000110 rip 00002b528fba5df5 rsp 00000000410010b0 error 4
>>>

This makes me wonder if you're using the right DAT libraries. Take a
look at your dat.conf (it's usually found in /etc) and make sure that it
is configured properly for the Qlogic stack, and does NOT contain
entries for any other interfaces (like OFED-based ones). Usually each
line contains a path to the specific library to use for a particular
interface; make sure it's the library you want. You might have to
contact your uDAPL vendor for help with that.
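
For reference, each entry in the DAT static registry (dat.conf) normally
follows a layout like the sketch below; the interface name, library path
and device string here are only placeholders, not values from your system:

  # <ia-name>    <api-ver> <thread-safety> <default> <library>              <provider-ver> <ia-params> <platform-params>
  MyQlogicIA0    u1.2      nonthreadsafe   default   /usr/lib64/libdapl.so  dapl.1.2       "ib0 0"     ""

The column that matters most here is the library path: it should point at
the QLogic libdapl, not at an OFED provider library.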

>>> This is reproducible.
>>>
>>> Is this OpenMPI or your libdapl that's doing this, you think?

I can't be sure -- every uDAPL implementation seems to interpret the
spec differently (or completely change or leave out some functionality),
making it hell to provide portable uDAPL support. And currently the
uDAPL BTL has seen little/no testing outside of Sun's and OFED's uDAPL.

What kind of interface adapters are you using? Sounds like some kind of
IB hardware; if possible I recommend using the OFED (openib BTL) or PSM
(PSM MTL) interfaces instead of uDAPL.
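
For what it's worth, transport selection in Open MPI is done with MCA
parameters on the mpirun command line; a rough sketch, reusing the
arguments from your run above (note that the ompi_info output below only
shows the self, sm and udapl BTLs and no MTLs, so you'd need a build
against the OFED verbs or PSM libraries first):

  # OFED verbs transport (openib BTL)
  mpirun --mca btl openib,sm,self -np 8 -machinefile H ./a.out

  # QLogic PSM (cm PML with the psm MTL)
  mpirun --mca pml cm --mca mtl psm -np 8 -machinefile H ./a.out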

Andrew

>>>
>>> + ompi_info
>>> Open MPI: 1.3a1svn11022007
>>> Open MPI SVN revision: svn11022007
>>> Open RTE: 1.3a1svn11022007
>>> Open RTE SVN revision: svn11022007
>>> OPAL: 1.3a1svn11022007
>>> OPAL SVN revision: svn11022007
>>> Prefix: /tools/openmpi/1.3a1r16632_svn/infinicon/gcc64/4.1.2/udapl/suse_sles_10/x86_64/opteron
>>> Configured architecture: x86_64-unknown-linux-gnu
>>> Configure host: s1471
>>> Configured by: root
>>> Configured on: Fri Nov 2 16:20:29 PDT 2007
>>> Configure host: s1471
>>> Built by: mostyn
>>> Built on: Fri Nov 2 16:30:07 PDT 2007
>>> Built host: s1471
>>> C bindings: yes
>>> C++ bindings: yes
>>> Fortran77 bindings: yes (all)
>>> Fortran90 bindings: yes
>>> Fortran90 bindings size: small
>>> C compiler: gcc
>>> C compiler absolute: /usr/bin/gcc
>>> C++ compiler: g++
>>> C++ compiler absolute: /usr/bin/g++
>>> Fortran77 compiler: gfortran
>>> Fortran77 compiler abs: /usr/bin/gfortran
>>> Fortran90 compiler: gfortran
>>> Fortran90 compiler abs: /usr/bin/gfortran
>>> C profiling: yes
>>> C++ profiling: yes
>>> Fortran77 profiling: yes
>>> Fortran90 profiling: yes
>>> C++ exceptions: no
>>> Thread support: posix (mpi: no, progress: no)
>>> Sparse Groups: no
>>> Internal debug support: no
>>> MPI parameter check: runtime
>>> Memory profiling support: no
>>> Memory debugging support: no
>>> libltdl support: yes
>>> Heterogeneous support: yes
>>> mpirun default --prefix: no
>>> MPI I/O support: yes
>>> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.3)
>>> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.3)
>>> MCA paffinity: linux (MCA v1.0, API v1.1, Component v1.3)
>>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.3)
>>> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.3)
>>> MCA timer: linux (MCA v1.0, API v1.0, Component v1.3)
>>> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.3)
>>> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.3)
>>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>>> MCA coll: basic (MCA v1.0, API v1.1, Component v1.3)
>>> MCA coll: inter (MCA v1.0, API v1.1, Component v1.3)
>>> MCA coll: self (MCA v1.0, API v1.1, Component v1.3)
>>> MCA coll: sm (MCA v1.0, API v1.1, Component v1.3)
>>> MCA coll: tuned (MCA v1.0, API v1.1, Component v1.3)
>>> MCA io: romio (MCA v1.0, API v1.0, Component v1.3)
>>> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.3)
>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.3)
>>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.3)
>>> MCA pml: dr (MCA v1.0, API v1.0, Component v1.3)
>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.3)
>>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.3)
>>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.3)
>>> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.3)
>>> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.3)
>>> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.3)
>>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.3)
>>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.3)
>>> MCA osc: rdma (MCA v1.0, API v1.0, Component v1.3)
>>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.3)
>>> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.3)
>>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.3)
>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.3)
>>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.3)
>>> MCA grpcomm: basic (MCA v1.0, API v2.0, Component v1.3)
>>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.3)
>>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.3)
>>> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.3)
>>> MCA ns: replica (MCA v1.0, API v2.0, Component v1.3)
>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>> MCA odls: default (MCA v1.0, API v1.3, Component v1.3)
>>> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.3)
>>> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.3)
>>> MCA ras: slurm (MCA v1.0, API v1.3, Component v1.3)
>>> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.3)
>>> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.3)
>>> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.3)
>>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.3)
>>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.3)
>>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.3)
>>> MCA routed: tree (MCA v1.0, API v1.0, Component v1.3)
>>> MCA routed: unity (MCA v1.0, API v1.0, Component v1.3)
>>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.3)
>>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.3)
>>> MCA pls: slurm (MCA v1.0, API v1.3, Component v1.3)
>>> MCA sds: env (MCA v1.0, API v1.0, Component v1.3)
>>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.3)
>>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.3)
>>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.3)
>>> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.3)
>>> MCA filem: rsh (MCA v1.0, API v1.0, Component v1.3)
>
>
> Regards,
> Mostyn
>
>
> On Tue, 6 Nov 2007, Andrew Friedley wrote:
>
>> All thread support is disabled by default in Open MPI; the uDAPL BTL is
>> neither thread-safe nor does it make use of a threaded uDAPL
>> implementation. For completeness, thread support is controlled by the
>> --enable-mpi-threads and --enable-progress-threads options to the
>> configure script.
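>>
>> For instance, a build that leaves both kinds of thread support off
>> (which, as noted, is the default anyway) could be configured roughly
>> like this (the install prefix is just a placeholder):
>>
>>   ./configure --prefix=/opt/openmpi \
>>       --disable-mpi-threads --disable-progress-threads
>>
>> (passing the --disable forms explicitly is equivalent to leaving the
>> options out entirely)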
>>
>> The reference you're seeing to libpthread.so.0 is a side effect of the
>> way we print backtraces when crashes occur, and can be ignored.
>>
>> How exactly does your MPI program fail? Make sure you take a look at
>> http://www.open-mpi.org/community/help/ and provide all relevant
>> information.
>>
>> Andrew
>>
>> Mostyn Lewis wrote:
>>> I'm trying to build a udapl OpenMPI from last Friday's SVN using
>>> Qlogic/QuickSilver/SilverStorm 4.1.0.0.1 software. I can get it
>>> built, and it works within a single machine. Over IB between 2
>>> machines it fails near termination of a job. Qlogic says they don't
>>> have a threaded udapl (libpthread is in the traceback).
>>>
>>> How do you (can you?) configure pthreads away altogether?
>>>
>>> Mostyn