
Open MPI User's Mailing List Archives


From: Jelena Pjesivac-Grbovic (pjesa_at_[hidden])
Date: 2007-09-19 20:54:51


The suggestion will probably work, but it is not a real solution:
"choosing barrier synchronization" is not recommended by the SKaMPI
team because it reduces the accuracy of the benchmark.
The problem is either at the pml ob1 level or at the btl openib level,
and it has to do with many messages being sent at the same time. You
can reproduce this type of problem on 4-5 nodes over IB (on odin)
using bcast or reduce with small segment sizes (1 KB, less than the
eager size for IB); I do not think I saw it on 2 nodes. I haven't
tried the onesided operations, but if it happens there too, I am even
more likely to believe in my theory :)
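
For what it's worth, the traffic pattern I mean is easy to sketch:
many back-to-back collectives on small messages. Something along
these lines (an illustrative, untested sketch, not code from SKaMPI
or from our tree) should be enough to try it on 4-5 IB nodes:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char buf[1024];   /* 1 KB segments, below the IB eager threshold */
    int i, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 0, sizeof(buf));

    /* flood the wire with many small broadcasts in a tight loop */
    for (i = 0; i < 100000; i++)
        MPI_Bcast(buf, 1024, MPI_BYTE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("completed all broadcasts\n");

    MPI_Finalize();
    return 0;
}

(The eager threshold on a given build can be checked with something
like "ompi_info --param btl openib | grep eager_limit".)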

Thanks,
Jelena

Gleb Natapov wrote:
> On Wed, Sep 19, 2007 at 01:58:35PM -0600, Edmund Sumbar wrote:
>
>> I'm trying to run skampi-5.0.1-r0191 under PBS
>> over IB with the command line
>>
>> mpirun -np 2 ./skampi -i coll.ski -o coll_ib.sko
>>
> Can you add
> choose_barrier_synchronization()
> to coll.ski and try again? It looks like this one:
> https://svn.open-mpi.org/trac/ompi/ticket/1015
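>
> For example, a minimal sketch of the change (exactly where the call
> goes in coll.ski is my assumption; the existing measurement requests
> stay as they are):
>
>    choose_barrier_synchronization()
>    ... the rest of coll.ski, unchanged ...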
>
>
>> The pt2pt and mmisc tests run to completion.
>> The coll and onesided tests, on the other hand,
>> start to produce output but then seem to hang.
>> Actually, the CPUs appear to be busy doing
>> something (I don't know what), but output stops.
>> The tests should only take on the order of minutes,
>> but I end up deleting the job after about 15 min.
>>
>> All tests run to completion with --mca btl tcp,self
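>>
>> i.e., forcing the same run onto TCP, something like:
>>
>>    mpirun --mca btl tcp,self -np 2 ./skampi -i coll.ski -o coll_ib.sko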
>>
>> Any suggestions as to how to diagnose this problem?
>> Are there any known issues with Open MPI/IB and the
>> SKaMPI benchmark?
>>
>> (BTW, skampi works with mvapich2)
>>
>> System details follow...
>>
>> --
>> Ed[mund [Sumbar]]
>> AICT Research Support, Univ of Alberta
>>
>>
>> $ uname -a
>> Linux opteron-cluster.nic.ualberta.ca 2.6.21-smp #1 SMP Tue Aug 7 12:45:20 MDT 2007 x86_64 x86_64 x86_64 GNU/Linux
>>
>> $ ./configure --prefix=/usr/local/openmpi-1.2.3 --with-tm=/opt/torque --with-openib=/usr/lib --with-libnuma=/usr/lib64
>>
>> $ ompi_info
>> Open MPI: 1.2.3
>> Open MPI SVN revision: r15136
>> Open RTE: 1.2.3
>> Open RTE SVN revision: r15136
>> OPAL: 1.2.3
>> OPAL SVN revision: r15136
>> Prefix: /usr/local/openmpi-1.2.3
>> Configured architecture: x86_64-unknown-linux-gnu
>> Configured by: esumbar
>> Configured on: Mon Sep 17 10:00:35 MDT 2007
>> Configure host: opteron-cluster.nic.ualberta.ca
>> Built by: esumbar
>> Built on: Mon Sep 17 10:05:09 MDT 2007
>> Built host: opteron-cluster.nic.ualberta.ca
>> C bindings: yes
>> C++ bindings: yes
>> Fortran77 bindings: yes (all)
>> Fortran90 bindings: yes
>> Fortran90 bindings size: small
>> C compiler: gcc
>> C compiler absolute: /usr/bin/gcc
>> C++ compiler: g++
>> C++ compiler absolute: /usr/bin/g++
>> Fortran77 compiler: gfortran
>> Fortran77 compiler abs: /usr/bin/gfortran
>> Fortran90 compiler: gfortran
>> Fortran90 compiler abs: /usr/bin/gfortran
>> C profiling: yes
>> C++ profiling: yes
>> Fortran77 profiling: yes
>> Fortran90 profiling: yes
>> C++ exceptions: no
>> Thread support: posix (mpi: no, progress: no)
>> Internal debug support: no
>> MPI parameter check: runtime
>> Memory profiling support: no
>> Memory debugging support: no
>> libltdl support: yes
>> Heterogeneous support: yes
>> mpirun default --prefix: no
>> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA btl: openib (MCA v1.0, API v1.0.1, Component v1.2.3)
>> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.3)
>> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.3)
>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.3)
>> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.3)
>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.3)
>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.3)
>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.3)
>> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.3)
>> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.3)
>>
>
> --
> Gleb.