Hi,

I have fixed the timing issue between the server and the client, and I was then able to build Open MPI successfully.

Here is the output of ompi_info....

[root@micrompi-2 ompi]# ompi_info

                Open MPI: 1.0a1r6760M

   Open MPI SVN revision: r6760M

                Open RTE: 1.0a1r6760M

   Open RTE SVN revision: r6760M

                    OPAL: 1.0a1r6760M

       OPAL SVN revision: r6760M

                  Prefix: /openmpi

 Configured architecture: x86_64-redhat-linux-gnu

           Configured by: root

           Configured on: Mon Aug  8 23:58:08 IST 2005

          Configure host: micrompi-2

                Built by: root

                Built on: Tue Aug  9 00:09:10 IST 2005

              Built host: micrompi-2

              C bindings: yes

            C++ bindings: yes

      Fortran77 bindings: yes (all)

      Fortran90 bindings: no

              C compiler: gcc

     C compiler absolute: /usr/bin/gcc

            C++ compiler: g++

   C++ compiler absolute: /usr/bin/g++

      Fortran77 compiler: g77

  Fortran77 compiler abs: /usr/bin/g77

      Fortran90 compiler: none

  Fortran90 compiler abs: none

             C profiling: yes

           C++ profiling: yes

     Fortran77 profiling: yes

     Fortran90 profiling: no

          C++ exceptions: no

          Thread support: posix (mpi: no, progress: no)

  Internal debug support: yes

     MPI parameter check: runtime

Memory profiling support: yes

Memory debugging support: yes

         libltdl support: 1

           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)

           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)

                MCA coll: basic (MCA v1.0, API v1.0, Component v1.0)

                MCA coll: self (MCA v1.0, API v1.0, Component v1.0)

                  MCA io: romio (MCA v1.0, API v1.0, Component v1.0)

               MCA mpool: mvapi (MCA v1.0, API v1.0, Component v1.0)

               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0)

                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0)

                 MCA pml: teg (MCA v1.0, API v1.0, Component v1.0)

                 MCA pml: uniq (MCA v1.0, API v1.0, Component v1.0)

                 MCA ptl: self (MCA v1.0, API v1.0, Component v1.0)

                 MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0)

                 MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0)

                 MCA btl: mvapi (MCA v1.0, API v1.0, Component v1.0)

                 MCA btl: self (MCA v1.0, API v1.0, Component v1.0)

                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.0)

                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)

                MCA topo: unity (MCA v1.0, API v1.0, Component v1.0)

                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.0)

                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0)

                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0)

                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0)

                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.0)

                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0)

                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.0)

                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)

                 MCA ras: host (MCA v1.0, API v1.0, Component v1.0)

                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0)

                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0)

               MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0)

                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0)

                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0)

                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.0)

                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.0)

                 MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0)

                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0)

                 MCA sds: env (MCA v1.0, API v1.0, Component v1.0)

                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0)

                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.0)

                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0)


This time, I can see that the mvapi btl component was built.

But I am still seeing the same problem while running the Pallas benchmark: the data is still passing over TCP/GigE and NOT over InfiniBand.
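
From the component list above, my understanding is that the btl selection can be forced on the mpirun command line, so that the job fails outright instead of silently falling back to TCP; the exact parameter syntax below is my guess:

  # Restrict the run to the mvapi, shared-memory, and self btls
  # (PMB-MPI1 is the Pallas benchmark binary name from my setup):
  mpirun -np 2 --mca btl mvapi,sm,self ./PMB-MPI1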

I have disabled building openib by touching .ompi_ignore in that component's directory; this should not be a problem for mvapi. I then ran autogen.sh, configure, and make all. The output of the autogen.sh, configure, and make all commands is gzip'ed in the ompi_out.tar.gz file attached to this mail. This file also contains the output of the Pallas benchmark run. At the end of the Pallas output, you can find the following error:

Request for 0 bytes (coll_basic_reduce.c, 193)

Request for 0 bytes (coll_basic_reduce.c, 193)

Request for 0 bytes (coll_basic_reduce.c, 193)

Request for 0 bytes (coll_basic_reduce.c, 193)

Request for 0 bytes (coll_basic_reduce.c, 193)

Request for 0 bytes (coll_basic_reduce_scatter.c, 79)

Request for 0 bytes (coll_basic_reduce.c, 193)

Request for 0 bytes (coll_basic_reduce_scatter.c, 79)

Request for 0 bytes (coll_basic_reduce.c, 193)

...and then Pallas just hung.

I have no clue about the above errors, which are coming from the Open MPI source code.
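
Since ompi_info above shows "Memory debugging support: yes", my guess is that these messages come from the memory-debugging layer flagging zero-byte allocation requests in the basic coll component, rather than from the btl itself. The source of the warning should be easy to locate:

  # Find where the "Request for ... bytes" warning is printed
  # (/ompi is my source tree root; the size is likely a format argument,
  # so grep for the fixed part of the string):
  grep -rn "Request for" /ompi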

The configure line that I used is

./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/

and I exported the following environment variables:

export CFLAGS="-I/usr/local/topspin/include -I /usr/local/topspin/include/vapi"

export LDFLAGS="-lmosal -lvapi -L/usr/local/topspin/lib64"

export btl_mvapi_LIBS="-lvapi -lmosal -L/usr/local/topspin/lib64"

export btl_mvapi_LDFLAGS=$btl_mvapi_LIBS

export btl_mvapi_CFLAGS=$CFLAGS

export LD_LIBRARY_PATH=/usr/local/topspin/lib64

export PATH=/openmpi/bin:$PATH
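
If it is any cleaner, my understanding is that the same flags can be passed directly on the configure command line instead of being exported, which also records them in config.log; this is a sketch, not something I have re-run:

  ./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/ \
      CFLAGS="-I/usr/local/topspin/include -I/usr/local/topspin/include/vapi" \
      LDFLAGS="-L/usr/local/topspin/lib64" LIBS="-lmosal -lvapi"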

We are using the Mellanox InfiniBand stack; we call it the MVAPICH 092 code, which is an MPI stack over VAPI, i.e., InfiniBand.

vapi.h is located in /usr/local/topspin/include/vapi, and this path is listed in CFLAGS.

libmosal and libvapi are located in the /usr/local/topspin/lib64 directory.
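
As a sanity check, ldd on the installed component should show it resolving the Topspin libraries (the path below assumes the usual $prefix/lib/openmpi component layout under my /openmpi prefix):

  # The mvapi btl should pull libvapi/libmosal from /usr/local/topspin/lib64:
  ldd /openmpi/lib/openmpi/mca_btl_mvapi.so | grep -E "vapi|mosal"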

Info about machine:

model name      : Intel(R) Xeon(TM) CPU 3.20GHz

Linux micrompi-2 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:29:47 EST 2005 x86_64 x86_64 x86_64 GNU/Linux

[root@micrompi-2 vapi]# cat /etc/redhat-release

Red Hat Enterprise Linux AS release 4 (Nahant)


Is there anything that I am missing while building the mvapi btl? Also, has anyone built this OMPI stack for mvapi and tested it? Please let me know.

Thanks

-Sridhar




-----Original Message-----
From: devel-bounces@open-mpi.org [mailto:devel-bounces@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Monday, August 08, 2005 8:21 PM
To: Open MPI Developers
Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI

It looks like you are having timestamp issues, e.g.:

> make: Warning: File `Makefile.am' has modification time 3.6e+04 s in

> the future

We typically see this in environments where NFS clients are not properly

time-synchronized with the NFS server (e.g., via ntp, either to the NFS

server directly, to a common parent ntp server, or something similar).

Automake-derived build systems are *extremely* sensitive to filesystem

timestamps because they are driven off Makefile dependencies.  So if

you are working on a networked filesystem and do not have your time

tightly synchronized between the client and server, these kinds of

errors will occur.

Two fixes for this are:

1. Fix the time issues between network filesystem client and server

2. Build on a non-networked filesystem
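
A quick sanity check for the first option (a sketch; the hostnames are placeholders):

  # Compare the NFS client's clock against the server's; they should
  # agree to within a second or so:
  date; ssh <nfs-server> date

  # One-shot sync against the server (or a common ntp source):
  ntpdate <nfs-server-or-ntp-server>

  # After syncing, re-stamp any files dated in the future so make's
  # dependency checks behave (this forces a full rebuild):
  find /ompi -type f -exec touch {} \;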


On Aug 8, 2005, at 6:19 AM, Sridhar Chirravuri wrote:

>

> Hi,

>

> I was trying to build the latest code, but as I mentioned in one of my

> previous mails, the build is getting into a loop.

>

> [root@micrompi-1 ompi]# make all | tee mymake.log

>

> make: Warning: File `Makefile.am' has modification time 3.6e+04 s in

> the future

>

> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9

>

> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of

> AM_PATH_LIBGCRYPT

>

>   run info '(automake)Extending aclocal'

>

>   or see

> http://sources.redhat.com/automake/automake.html#Extending-aclocal

>

> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of

> XIPH_PATH_AO

>

>  cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign

>

> cd . && /bin/sh /ompi/config/missing --run autoconf

>

> /bin/sh ./config.status --recheck

>

>  /bin/sh ./config.status

>

> Making all in config

>

> make[1]: make[1]: Entering directory `/ompi/config'

>

> Warning: File `Makefile.am' has modification time 3.6e+04 s in the

> future

>

> cd .. && make  am--refresh

>

> make[2]: Entering directory `/ompi'

>

> make[2]: Warning: File `Makefile.am' has modification time 3.6e+04 s

> in the future

>

> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9

>

> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of

> AM_PATH_LIBGCRYPT

>

>   run info '(automake)Extending aclocal'

>

>   or see

> http://sources.redhat.com/automake/automake.html#Extending-aclocal

>

> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of

> XIPH_PATH_AO

>

>  cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign

>

> cd . && /bin/sh /ompi/config/missing --run autoconf

>

> /bin/sh ./config.status --recheck

>

>  /bin/sh ./config.status

>

> make[2]: warning:  Clock skew detected.  Your build may be incomplete.

>

> make[2]: Leaving directory `/ompi'

>

> make[2]: Entering directory `/ompi'

>

> make[2]: Warning: File `Makefile.am' has modification time 3.6e+04 s

> in the future

>

> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9

>

> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of

> AM_PATH_LIBGCRYPT

>

>   run info '(automake)Extending aclocal'

>

>   or see

> http://sources.redhat.com/automake/automake.html#Extending-aclocal

>

> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of

> XIPH_PATH_AO

>

>  cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign

>

> cd . && /bin/sh /ompi/config/missing --run autoconf

>

> /bin/sh ./config.status --recheck

>

>  /bin/sh ./config.status

>

> make[2]: warning:  Clock skew detected.  Your build may be incomplete.

>

> make[2]: Leaving directory `/ompi'

>

> cd .. && make  am--refresh

>

> make[2]: make[2]: Entering directory `/ompi'

>

> Warning: File `Makefile.am' has modification time 3.6e+04 s in the

> future

>

> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9

>

> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of

> AM_PATH_LIBGCRYPT

>

>   run info '(automake)Extending aclocal'

>

>   or see

> http://sources.redhat.com/automake/automake.html#Extending-aclocal

>

> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of

> XIPH_PATH_AO

>

>  cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign

>

> make[2]: *** [Makefile.in] Interrupt

>

> make[1]: *** [../configure] Interrupt

>

> make: *** [all-recursive] Interrupt

>

>

> The config.status --recheck is being issued from the Makefile. I have

> moved config.status to config.status.old and touched config.status, but

> "make all" still goes into a loop.

>

> Has anyone tried building the latest code drop of Open MPI? Or has

> anyone seen this type of behavior?

>

> Please let me know.

>

> Thanks

>

> -Sridhar

--

{+} Jeff Squyres

{+} The Open MPI Project

{+} http://www.open-mpi.org/


_______________________________________________

devel mailing list

devel@open-mpi.org

http://www.open-mpi.org/mailman/listinfo.cgi/devel