
Open MPI Development Mailing List Archives


From: Sridhar Chirravuri (sridhar_at_[hidden])
Date: 2005-08-09 06:24:07


Hi,

I have fixed the timing issue between the server and client, and I can
now build Open MPI successfully.

Here is the output of ompi_info....

[root_at_micrompi-2 ompi]# ompi_info
                Open MPI: 1.0a1r6760M
   Open MPI SVN revision: r6760M
                Open RTE: 1.0a1r6760M
   Open RTE SVN revision: r6760M
                    OPAL: 1.0a1r6760M
       OPAL SVN revision: r6760M
                  Prefix: /openmpi
 Configured architecture: x86_64-redhat-linux-gnu
           Configured by: root
           Configured on: Mon Aug 8 23:58:08 IST 2005
          Configure host: micrompi-2
                Built by: root
                Built on: Tue Aug 9 00:09:10 IST 2005
              Built host: micrompi-2
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: no
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: g77
  Fortran77 compiler abs: /usr/bin/g77
      Fortran90 compiler: none
  Fortran90 compiler abs: none
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: no
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: yes
     MPI parameter check: runtime
Memory profiling support: yes
Memory debugging support: yes
         libltdl support: 1
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.0)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.0)
               MCA mpool: mvapi (MCA v1.0, API v1.0, Component v1.0)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0)
                 MCA pml: teg (MCA v1.0, API v1.0, Component v1.0)
                 MCA pml: uniq (MCA v1.0, API v1.0, Component v1.0)
                 MCA ptl: self (MCA v1.0, API v1.0, Component v1.0)
                 MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0)
                 MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA btl: mvapi (MCA v1.0, API v1.0, Component v1.0)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.0)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.0)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.0)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0)
                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.0)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: host (MCA v1.0, API v1.0, Component v1.0)
                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0)
                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0)
                MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0)

This time, I can see that the btl mvapi component was built.

But I am still seeing the same problem while running the Pallas
benchmark: the data is still going over TCP/GigE and NOT over
InfiniBand.
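
In case it is useful for diagnosis, I assume the run can be pinned to
the mvapi transport with an MCA parameter along these lines (the
benchmark binary name is just a placeholder for the Pallas executable):

mpirun -np 2 -mca btl mvapi,sm,self ./PMB-MPI1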

I have disabled building the openib component by touching a
.ompi_ignore file in its directory; this should not be a problem for
MVAPI.
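
For reference, this is the sequence I mean (the exact component
directory path is from my tree and may differ in yours):

# autogen.sh skips any component directory containing .ompi_ignore
touch ompi/mca/btl/openib/.ompi_ignore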

I then ran autogen.sh, configure, and make all. The output of these
commands is gzip'ed in the ompi_out.tar.gz file attached to this mail;
it also contains the Pallas benchmark results. At the end of the
Pallas output, you can find the error

Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)

...and then Pallas just hung.

I have no clue about the above errors, which are coming from the Open
MPI source code.

The configure options that I used are

./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/

and I exported

export CFLAGS="-I/usr/local/topspin/include -I/usr/local/topspin/include/vapi"
export LDFLAGS="-lmosal -lvapi -L/usr/local/topspin/lib64"
export btl_mvapi_LIBS="-lvapi -lmosal -L/usr/local/topspin/lib64"
export btl_mvapi_LDFLAGS=$btl_mvapi_LIBS
export btl_mvapi_CFLAGS=$CFLAGS
export LD_LIBRARY_PATH=/usr/local/topspin/lib64
export PATH=/openmpi/bin:$PATH
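
To double-check that the component actually linked against the Topspin
libraries, I assume something like this should show libvapi and
libmosal in the output (the component filename follows the usual
mca_<type>_<name>.so convention, so the exact path is my guess):

ldd /openmpi/lib/openmpi/mca_btl_mvapi.so | grep -E 'vapi|mosal'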

We are using the Mellanox InfiniBand stack. We call it the MVAPICH
0.9.2 code; it is an MPI stack over VAPI, i.e., InfiniBand.

The header vapi.h is located in /usr/local/topspin/include/vapi, and
this path is included in CFLAGS.

The libmosal and libvapi libraries are located in the /usr/local/topspin/lib64 directory.

Info about machine:

model name : Intel(R) Xeon(TM) CPU 3.20GHz

Linux micrompi-2 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:29:47 EST 2005 x86_64
x86_64 x86_64 GNU/Linux

[root_at_micrompi-2 vapi]# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant)

Is there anything I am missing while building the mvapi btl? Also, has
anyone built this OMPI stack for mvapi and tested it? Please let me know.

Thanks
-Sridhar

-----Original Message-----
From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
Behalf Of Jeff Squyres
Sent: Monday, August 08, 2005 8:21 PM
To: Open MPI Developers
Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI

It looks like you are having timestamp issues, e.g.:

> make: Warning: File `Makefile.am' has modification time 3.6e+04 s in
> the future

We typically see this in environments where NFS clients are not
properly time-synchronized with the NFS server (e.g., via ntp, either
directly to the NFS server, to a common parent ntp server, or
something similar).

Automake-derived build systems are *extremely* sensitive to filesystem
timestamps because they are driven off Makefile dependencies. So if
you are working on a networked filesystem and do not have your time
tightly synchronized between the client and server, these kinds of
errors will occur.

Two fixes for this are:

1. Fix the time issues between network filesystem client and server
2. Build on a non-networked filesystem
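
For #1, a one-shot sync from the client is usually enough, and if the
skew has already poisoned the tree, re-touching every file clears the
"future" mtimes (the server hostname below is a placeholder):

ntpdate nfs-server.example.com        # sync the client clock once
find /ompi -print0 | xargs -0 touch   # reset all mtimes to "now"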

On Aug 8, 2005, at 6:19 AM, Sridhar Chirravuri wrote:

>
> Hi,
>
> I was trying to build the latest code but as I mentioned in one of my
> previous mails, build is getting into a loop.
>
> [root_at_micrompi-1 ompi]# make all | tee mymake.log
>
> make: Warning: File `Makefile.am' has modification time 3.6e+04 s in
> the future
>
> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9
>
> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of AM_PATH_LIBGCRYPT
>
> run info '(automake)Extending aclocal'
>
> or see
> http://sources.redhat.com/automake/automake.html#Extending-aclocal
>
> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of
> XIPH_PATH_AO
>
> cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign
>
> cd . && /bin/sh /ompi/config/missing --run autoconf
>
> /bin/sh ./config.status --recheck
>
> /bin/sh ./config.status
>
> Making all in config
>
> make[1]: make[1]: Entering directory `/ompi/config'
>
> Warning: File `Makefile.am' has modification time 3.6e+04 s in the
> future
>
> cd .. && make am--refresh
>
> make[2]: Entering directory `/ompi'
>
> make[2]: Warning: File `Makefile.am' has modification time 3.6e+04 s
> in the future
>
> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9
>
> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of AM_PATH_LIBGCRYPT
>
> run info '(automake)Extending aclocal'
>
> or see
> http://sources.redhat.com/automake/automake.html#Extending-aclocal
>
> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of
> XIPH_PATH_AO
>
> cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign
>
> cd . && /bin/sh /ompi/config/missing --run autoconf
>
> /bin/sh ./config.status --recheck
>
> /bin/sh ./config.status
>
> make[2]: warning: Clock skew detected. Your build may be incomplete.
>
> make[2]: Leaving directory `/ompi'
>
> make[2]: Entering directory `/ompi'
>
> make[2]: Warning: File `Makefile.am' has modification time 3.6e+04 s
> in the future
>
> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9
>
> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of AM_PATH_LIBGCRYPT
>
> run info '(automake)Extending aclocal'
>
> or see
> http://sources.redhat.com/automake/automake.html#Extending-aclocal
>
> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of
> XIPH_PATH_AO
>
> cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign
>
> cd . && /bin/sh /ompi/config/missing --run autoconf
>
> /bin/sh ./config.status --recheck
>
> /bin/sh ./config.status
>
> make[2]: warning: Clock skew detected. Your build may be incomplete.
>
> make[2]: Leaving directory `/ompi'
>
> cd .. && make am--refresh
>
> make[2]: make[2]: Entering directory `/ompi'
>
> Warning: File `Makefile.am' has modification time 3.6e+04 s in the
> future
>
> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9
>
> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of AM_PATH_LIBGCRYPT
>
> run info '(automake)Extending aclocal'
>
> or see
> http://sources.redhat.com/automake/automake.html#Extending-aclocal
>
> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of
> XIPH_PATH_AO
>
> cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign
>
> make[2]: *** [Makefile.in] Interrupt
>
> make[1]: *** [../configure] Interrupt
>
> make: *** [all-recursive] Interrupt
>
>
> The config.status --recheck is being issued from the Makefile. I
> moved config.status to config.status.old and touched config.status,
> but "make all" still goes into a loop.
>
> Has anyone tried building the latest code drop of Open MPI? Or has
> anyone seen this type of behavior?
>
> Please let me know.
>
> Thanks
>
> -Sridhar

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/