Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Beowulf cluster and openmpi
From: Daniel Gruner (dgruner_at_[hidden])
Date: 2008-11-05 14:08:52


Can your nodes see the Open MPI libraries and executables? I have
/usr/local and /opt from the master node mounted on the compute nodes,
in addition to having LD_LIBRARY_PATH defined correctly. In your
case the nodes must be able to see /home/rchaud/openmpi-1.2.6 in order
to get the libraries and executables, so this directory must be mounted
on the nodes. You don't want to copy all this stuff to the nodes in a
bproc environment, since it would eat away at your RAM.
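
As a rough sketch (the library path is the one from your mail; the NFS
mount source is an assumption about your particular setup):

  # on each compute node, or via the cluster's node startup scripts
  mount master:/home /home

  # on the master, in your shell startup, so it is forwarded to the daemons
  export LD_LIBRARY_PATH=/home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort/lib:$LD_LIBRARY_PATH
  export PATH=/home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort/bin:$PATH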

Daniel

On Wed, Nov 05, 2008 at 12:44:03PM -0600, Rima Chaudhuri wrote:
> Thanks for all your help Ralph and Sean!!
> I changed the machinefile to contain just the node numbers, and I added
> the env variable NODES in my .bash_profile and .bashrc.
> As per Sean's suggestion I added the $LD_LIBRARY_PATH (the shared
> library path, i.e. the Open MPI lib directory) and $AMBERHOME/lib as
> two of the "libraries" paths in the Beowulf config file. I also checked
> via bpsh from one of the compute nodes that it can see the executables
> in $AMBERHOME/exe and the Open MPI mpirun.
> I get the following error message:
>
> [rchaud_at_helios amber10]$ ./step1
> --------------------------------------------------------------------------
> A daemon (pid 25319) launched by the bproc PLS component on node 2 died
> unexpectedly on signal 13 so we are aborting.
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> [helios.structure.uic.edu:25317] [0,0,0] ORTE_ERROR_LOG: Error in file
> pls_bproc.c at line 717
> [helios.structure.uic.edu:25317] [0,0,0] ORTE_ERROR_LOG: Error in file
> pls_bproc.c at line 1164
> [helios.structure.uic.edu:25317] [0,0,0] ORTE_ERROR_LOG: Error in file
> rmgr_urm.c at line 462
> [helios.structure.uic.edu:25317] mpirun: spawn failed with errno=-1
>
>
> I tested whether the compute nodes could see the master's environment
> and files with the following commands:
>
> [rchaud_at_helios amber10]$ bpsh 2 echo $LD_LIBRARY_PATH
> /home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort/lib
> [rchaud_at_helios amber10]$ bpsh 2 echo $AMBERHOME
> /home/rchaud/Amber10_openmpi/amber10
> [rchaud_at_helios amber10]$ bpsh 2 ls -al
> total 11064
> drwxr-xr-x 11 rchaud 0 4096 Nov 5 11:33 .
> drwxr-xr-x 3 rchaud 100 4096 Oct 20 17:21 ..
> -rw-r--r-- 1 128 53 1201 Jul 10 17:08 Changelog_at
> -rw-rw-r-- 1 128 53 25975 Feb 28 2008 GNU_Lesser_Public_License
> -rw-rw---- 1 128 53 3232 Mar 30 2008 INSTALL
> -rw-rw-r-- 1 128 53 20072 Feb 11 2008 LICENSE_at
> -rw-r--r-- 1 0 0 1814241 Oct 31 13:32 PLP_617_xtal_nolig.crd
> -rw-r--r-- 1 0 0 8722770 Oct 31 13:31 PLP_617_xtal_nolig.top
> -rw-rw-r-- 1 128 53 1104 Mar 18 2008 README
> -rw-r--r-- 1 128 53 1783 Jun 23 19:43 README_at
> drwxrwxr-x 10 128 53 4096 Oct 20 17:23 benchmarks
> drwxr-xr-x 2 0 0 4096 Oct 20 18:21 bin
> -rw-r--r-- 1 0 0 642491 Oct 20 17:51 bugfix.all
> drwxr-xr-x 13 0 0 4096 Oct 20 17:37 dat
> drwxr-xr-x 3 0 0 4096 Oct 20 17:23 doc
> drwxrwxr-x 9 128 53 4096 Oct 20 17:23 examples
> lrwxrwxrwx 1 0 0 3 Oct 20 17:34 exe -> bin
> drwxr-xr-x 2 0 0 4096 Oct 20 17:35 include
> drwxr-xr-x 2 0 0 4096 Oct 20 17:36 lib
> -rw-r--r-- 1 rchaud 100 30 Nov 5 11:33 machinefile
> -rw-r--r-- 1 rchaud 100 161 Nov 5 12:11 min
> drwxrwxr-x 40 128 53 4096 Oct 20 17:50 src
> -rwxr-xr-x 1 rchaud 100 376 Nov 3 16:41 step1
> drwxrwxr-x 114 128 53 4096 Oct 20 17:23 test
>
> [rchaud_at_helios amber10]$ bpsh 2 which mpirun
> /home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort/bin/mpirun
>
> The $LD_LIBRARY_PATH seems to be defined correctly, but then why is it
> not being read?
>
> thanks
>
> On Wed, Nov 5, 2008 at 11:08 AM, <users-request_at_[hidden]> wrote:
> > Send users mailing list submissions to
> > users_at_[hidden]
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > or, via email, send a message with subject or body 'help' to
> > users-request_at_[hidden]
> >
> > You can reach the person managing the list at
> > users-owner_at_[hidden]
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of users digest..."
> >
> >
> > Today's Topics:
> >
> > 1. Re: Beowulf cluster and openmpi (Kelley, Sean)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Wed, 5 Nov 2008 12:08:13 -0500
> > From: "Kelley, Sean" <Sean.Kelley_at_[hidden]>
> > Subject: Re: [OMPI users] Beowulf cluster and openmpi
> > To: "Open MPI Users" <users_at_[hidden]>
> > Message-ID:
> > <A804E989DCC5234FBA6C4E62B939978F2EB3D5_at_ava-es5.solers.local>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > I would suggest making sure that the /etc/beowulf/config file has a
> > "libraries" line for every directory where required shared libraries
> > (application and mpi) are located.
> >
> > Also, make sure that the filesystems containing the executables and
> > shared libraries are accessible from the compute nodes.
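> >
> > Something along these lines, using the paths from this thread (treat
> > it as a sketch, not a verified config):
> >
> > # /etc/beowulf/config
> > libraries /lib /usr/lib
> > libraries /home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort/lib
> > libraries /home/rchaud/Amber10_openmpi/amber10/lib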
> >
> > Sean
> >
> > -----Original Message-----
> > From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> > Behalf Of Rima Chaudhuri
> > Sent: Monday, November 03, 2008 5:50 PM
> > To: users_at_[hidden]
> > Subject: Re: [OMPI users] Beowulf cluster and openmpi
> >
> > I added the option -hostfile machinefile, where the machinefile is a
> > file with the IPs of the nodes:
> > #host names
> > 192.168.0.100 slots=2
> > 192.168.0.101 slots=2
> > 192.168.0.102 slots=2
> > 192.168.0.103 slots=2
> > 192.168.0.104 slots=2
> > 192.168.0.105 slots=2
> > 192.168.0.106 slots=2
> > 192.168.0.107 slots=2
> > 192.168.0.108 slots=2
> > 192.168.0.109 slots=2
> >
> >
> > [rchaud_at_helios amber10]$ ./step1
> > --------------------------------------------------------------------------
> > A daemon (pid 29837) launched by the bproc PLS component on node 192
> > died unexpectedly so we are aborting.
> >
> > This may be because the daemon was unable to find all the needed shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> > the location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> > --------------------------------------------------------------------------
> > [helios.structure.uic.edu:29836] [0,0,0] ORTE_ERROR_LOG: Error in file
> > pls_bproc.c at line 717
> > [helios.structure.uic.edu:29836] [0,0,0] ORTE_ERROR_LOG: Error in file
> > pls_bproc.c at line 1164
> > [helios.structure.uic.edu:29836] [0,0,0] ORTE_ERROR_LOG: Error in file
> > rmgr_urm.c at line 462
> > [helios.structure.uic.edu:29836] mpirun: spawn failed with errno=-1
> >
> > I used bpsh to check whether the master and one of the nodes (n8)
> > could see $LD_LIBRARY_PATH, and they do:
> >
> > [rchaud_at_helios amber10]$ echo $LD_LIBRARY_PATH
> > /home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort/lib
> >
> > [rchaud_at_helios amber10]$ bpsh n8 echo $LD_LIBRARY_PATH
> > /home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort/lib
> >
> > thanks!
> >
> >
> > On Mon, Nov 3, 2008 at 3:14 PM, <users-request_at_[hidden]> wrote:
> >> Send users mailing list submissions to
> >> users_at_[hidden]
> >>
> >> To subscribe or unsubscribe via the World Wide Web, visit
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> or, via email, send a message with subject or body 'help' to
> >> users-request_at_[hidden]
> >>
> >> You can reach the person managing the list at
> >> users-owner_at_[hidden]
> >>
> >> When replying, please edit your Subject line so it is more specific
> >> than "Re: Contents of users digest..."
> >>
> >>
> >> Today's Topics:
> >>
> >> 1. Re: Problems installing in Cygwin - Problem with GCC 3.4.4
> >> (Jeff Squyres)
> >> 2. switch from mpich2 to openMPI <newbie question> (PattiMichelle)
> >> 3. Re: users Digest, Vol 1055, Issue 2 (Ralph Castain)
> >>
> >>
> >> ----------------------------------------------------------------------
> >>
> >> Message: 1
> >> Date: Mon, 3 Nov 2008 15:52:22 -0500
> >> From: Jeff Squyres <jsquyres_at_[hidden]>
> >> Subject: Re: [OMPI users] Problems installing in Cygwin - Problem with
> >> GCC 3.4.4
> >> To: "Gustavo Seabra" <gustavo.seabra_at_[hidden]>
> >> Cc: Open MPI Users <users_at_[hidden]>
> >> Message-ID: <A016B8C4-510B-4FD2-AD3B-A1B6440508F5_at_[hidden]>
> >> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> >>
> >> On Nov 3, 2008, at 3:36 PM, Gustavo Seabra wrote:
> >>
> >>>> For your fortran issue, the Fortran 90 interface needs the Fortran
> >>>> 77 interface. So you need to supply an F77 as well (the output from
> >>>> configure should indicate that the F90 interface was disabled
> >>>> because the F77 interface was disabled).
> >>>
> >>> Is that what you mean (see below)?
> >>
> >> Ah yes -- that's another reason the f90 interface could be disabled:
> >> if configure detects that the f77 and f90 compilers are not link-
> >> compatible.
> >>
> >>> I thought the g95 compiler could
> >>> deal with F77 as well as F95... If so, could I just pass F77='g95'?
> >>
> >> That would probably work (F77=g95). I don't know the g95 compiler at
> >> all, so I don't know if it also accepts Fortran-77-style codes. But
> >> if it does, then you're set. Otherwise, specify a different F77
> >> compiler that is link compatible with g95 and you should be good.
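> >>
> >> For example, just adding F77 to the configure line quoted elsewhere
> >> in this thread (an untested sketch):
> >>
> >> ./configure --prefix=/home/seabra/local/openmpi-1.3b1 \
> >>     F77=g95 FC=g95 'FFLAGS=-O0 -fno-second-underscore' CXX=g++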
> >>>>> I looked in some places in the OpenMPI code, but I couldn't find
> >>>>> "max" being redefined anywhere -- I may be looking in the wrong
> >>>>> places. Anyway, the only way I found of compiling OpenMPI was a
> >>>>> very ugly hack: I had to go into those files and remove the
> >>>>> "std::" before the "max". With that, it all compiled cleanly.
> >>>>
> >>>> I'm not sure I follow -- I don't see anywhere in OMPI where we use
> >>>> std::max.
> >>>> What areas did you find that you needed to change?
> >>>
> >>> These files are part of the standard C++ headers. In my case, they
> >>> sit in:
> >>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits
> >>
> >> Ah, I see.
> >>
> >>> In principle, the problems that comes from those files would mean
> >>> that the OpenMPI source has some macro redefining max, but that's
> >>> what I could not find :-(
> >>
> >> Gotcha. I don't think we are defining a "max" macro anywhere in the
> >> ompi_info source or related header files. :-(
> >>
> >>>> No. We don't really maintain the "make check" stuff too well.
> >>>
> >>> Oh well... What do you use for testing the implementation?
> >>
> >>
> >> We have a whole pile of MPI tests in a private SVN repository. The
> >> repository is only private because it contains a lot of other people's
> >> [public] MPI test suites and benchmarks, and we never looked into
> >> redistribution rights for their software. There's nothing really
> >> secret about it -- we just haven't bothered to look into the IP
> >> issues. :-)
> >>
> >> We use the MPI Testing Tool (MTT) for nightly regression across the
> >> community:
> >>
> >> http://www.open-mpi.org/mtt/
> >>
> >> We have weekday and weekend testing schedules. M-Th we do nightly
> >> tests; F-Mon morning, we do a long weekend schedule. This weekend,
> >> for example, we ran about 675k regression tests:
> >>
> >> http://www.open-mpi.org/mtt/index.php?do_redir=875
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems
> >>
> >>
> >>
> >> ------------------------------
> >>
> >> Message: 2
> >> Date: Mon, 03 Nov 2008 12:59:59 -0800
> >> From: PattiMichelle <miche1_at_[hidden]>
> >> Subject: [OMPI users] switch from mpich2 to openMPI <newbie question>
> >> To: users_at_[hidden], patti.sheaffer_at_[hidden]
> >> Message-ID: <490F664F.4000000_at_[hidden]>
> >> Content-Type: text/plain; charset="iso-8859-1"
> >>
> >> I just found out I need to switch from mpich2 to openMPI for some code
> >> I'm running. I noticed that it's available in an openSuSE repo (I'm
> >> using openSuSE 11.0 x86_64 on a TYAN 32-processor Opteron 8000
> >> system), but when I was using mpich2 I seemed to have better luck
> >> compiling it from code. This is the line I used:
> >>
> >> # $ F77=/path/to/g95 F90=/path/to/g95 ./configure
> >> --prefix=/some/place/mpich2-install
> >>
> >> But usually I left the "--prefix=" off and just let it install to its
> >> default... which is /usr/local/bin and that's nice because it's
> >> already in the PATH and very usable. I guess my question is whether
> >> or not the defaults and configuration syntax have stayed the same in
> >> openMPI. I also could use a "quickstart" guide for a non-programming
> >> user (e.g., I think I have to start a daemon before running
> >> parallelized programs).
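> >>
> >> (My guess at the analogous openMPI line -- the configure commands I
> >> have seen for openMPI use FC= rather than F90= for the Fortran 90
> >> compiler -- would be something like:
> >>
> >> # $ F77=/path/to/g95 FC=/path/to/g95 ./configure
> >> --prefix=/some/place/openmpi-install
> >>
> >> but I have not tested this.)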
> >>
> >> THANKS!!!
> >> PattiM.
> >> -------------- next part -------------- HTML attachment scrubbed and
> >> removed
> >>
> >> ------------------------------
> >>
> >> Message: 3
> >> Date: Mon, 3 Nov 2008 14:14:36 -0700
> >> From: Ralph Castain <rhc_at_[hidden]>
> >> Subject: Re: [OMPI users] users Digest, Vol 1055, Issue 2
> >> To: Open MPI Users <users_at_[hidden]>
> >> Message-ID: <2FBDF4DC-B2DF-4486-A644-0F18C96E8EB2_at_[hidden]>
> >> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> >>
> >> The problem is that you didn't specify or allocate any nodes for the
> >> job. At the least, you need to tell us what nodes to use via a
> >> hostfile.
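> >>
> >> A minimal sketch (host names are placeholders):
> >>
> >> $ cat hostfile
> >> node1 slots=2
> >> node2 slots=2
> >> $ mpirun --hostfile hostfile -np 4 ./a.out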
> >>
> >> Alternatively, are you using a resource manager to assign the nodes?
> >> OMPI didn't see anything from one, but it could be that we just didn't
> >> see the right envar.
> >>
> >> Ralph
> >>
> >> On Nov 3, 2008, at 1:39 PM, Rima Chaudhuri wrote:
> >>
> >>> Thanks a lot Ralph!
> >>> I corrected the no_local to nolocal, and now when I try to execute
> >>> the script step1 (please find it attached):
> >>>
> >>> [rchaud_at_helios amber10]$ ./step1
> >>> [helios.structure.uic.edu:16335] [0,0,0] ORTE_ERROR_LOG: Not
> >>> available in file ras_bjs.c at line 247
> >>> --------------------------------------------------------------------------
> >>> There are no available nodes allocated to this job. This could
> >>> be because no nodes were found or all the available nodes were
> >>> already used.
> >>>
> >>> Note that since the -nolocal option was given no processes can be
> >>> launched on the local node.
> >>> --------------------------------------------------------------------------
> >>> [helios.structure.uic.edu:16335] [0,0,0] ORTE_ERROR_LOG: Temporarily
> >>> out of resource in file base/rmaps_base_support_fns.c at line 168
> >>> [helios.structure.uic.edu:16335] [0,0,0] ORTE_ERROR_LOG: Temporarily
> >>> out of resource in file rmaps_rr.c at line 402
> >>> [helios.structure.uic.edu:16335] [0,0,0] ORTE_ERROR_LOG: Temporarily
> >>> out of resource in file base/rmaps_base_map_job.c at line 210
> >>> [helios.structure.uic.edu:16335] [0,0,0] ORTE_ERROR_LOG: Temporarily
> >>> out of resource in file rmgr_urm.c at line 372
> >>> [helios.structure.uic.edu:16335] mpirun: spawn failed with errno=-3
> >>>
> >>>
> >>>
> >>> If I use the script without the --nolocal option, I get the following
> >>> error:
> >>> [helios.structure.uic.edu:20708] [0,0,0] ORTE_ERROR_LOG: Not
> >>> available in file ras_bjs.c at line 247
> >>>
> >>>
> >>> thanks,
> >>>
> >>>
> >>> On Mon, Nov 3, 2008 at 2:04 PM, <users-request_at_[hidden]> wrote:
> >>>> Send users mailing list submissions to
> >>>> users_at_[hidden]
> >>>>
> >>>> To subscribe or unsubscribe via the World Wide Web, visit
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>> or, via email, send a message with subject or body 'help' to
> >>>> users-request_at_[hidden]
> >>>>
> >>>> You can reach the person managing the list at
> >>>> users-owner_at_[hidden]
> >>>>
> >>>> When replying, please edit your Subject line so it is more specific
> >>>> than "Re: Contents of users digest..."
> >>>>
> >>>>
> >>>> Today's Topics:
> >>>>
> >>>> 1. Scyld Beowulf and openmpi (Rima Chaudhuri)
> >>>> 2. Re: Scyld Beowulf and openmpi (Ralph Castain)
> >>>> 3. Problems installing in Cygwin - Problem with GCC 3.4.4
> >>>> (Gustavo Seabra)
> >>>> 4. Re: MPI + Mixed language coding(Fortran90 + C++) (Jeff Squyres)
> >>>> 5. Re: Problems installing in Cygwin - Problem with GCC 3.4.4
> >>>> (Jeff Squyres)
> >>>>
> >>>>
> >>>> ----------------------------------------------------------------------
> >>>>
> >>>> Message: 1
> >>>> Date: Mon, 3 Nov 2008 11:30:01 -0600
> >>>> From: "Rima Chaudhuri" <rima.chaudhuri_at_[hidden]>
> >>>> Subject: [OMPI users] Scyld Beowulf and openmpi
> >>>> To: users_at_[hidden]
> >>>> Message-ID:
> >>>> <7503b17d0811030930i13acb974kc627983a1d481192_at_[hidden]>
> >>>> Content-Type: text/plain; charset=ISO-8859-1
> >>>>
> >>>> Hello!
> >>>> I am a new user of openmpi -- I've installed openmpi 1.2.6 for our
> >>>> x86_64 Linux Scyld Beowulf cluster in order to make it run with the
> >>>> amber10 MD simulation package.
> >>>>
> >>>> The nodes can see the home directory, i.e. a bpsh to the nodes works
> >>>> fine and lists all the files in the home directory where I have both
> >>>> openmpi and amber10 installed.
> >>>> However if I try to run:
> >>>>
> >>>> $MPI_HOME/bin/mpirun -no_local=1 -np 4 $AMBERHOME/exe/sander.MPI
> >>>> ........
> >>>>
> >>>> I get the following error:
> >>>> [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 247
> >>>> --------------------------------------------------------------------------
> >>>> Failed to find the following executable:
> >>>>
> >>>> Host: helios.structure.uic.edu
> >>>> Executable: -o
> >>>>
> >>>> Cannot continue.
> >>>> --------------------------------------------------------------------------
> >>>> [helios.structure.uic.edu:23611] [0,0,0] ORTE_ERROR_LOG: Not
> >>>> found in file rmgr_urm.c at line 462
> >>>> [helios.structure.uic.edu:23611] mpirun: spawn failed with errno=-13
> >>>>
> >>>> any clues?
> >>>>
> >>>>
> >>>> --
> >>>> -Rima
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> Message: 2
> >>>> Date: Mon, 3 Nov 2008 12:08:36 -0700
> >>>> From: Ralph Castain <rhc_at_[hidden]>
> >>>> Subject: Re: [OMPI users] Scyld Beowulf and openmpi
> >>>> To: Open MPI Users <users_at_[hidden]>
> >>>> Message-ID: <91044A7E-ADA5-4B94-AA11-B3C1D9843606_at_[hidden]>
> >>>> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> >>>>
> >>>> For starters, there is no "-no_local" option to mpirun. You might
> >>>> want to look at mpirun --help, or man mpirun.
> >>>>
> >>>> I suspect the option you wanted was --nolocal. Note that --nolocal
> >>>> does not take an argument.
> >>>>
> >>>> Mpirun is confused by the incorrect option and is looking for an
> >>>> incorrectly named executable.
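> >>>>
> >>>> In other words, the corrected form of the command quoted above would
> >>>> be something like:
> >>>>
> >>>> $MPI_HOME/bin/mpirun --nolocal -np 4 $AMBERHOME/exe/sander.MPI ........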
> >>>> Ralph
> >>>>
> >>>>
> >>>> On Nov 3, 2008, at 10:30 AM, Rima Chaudhuri wrote:
> >>>>
> >>>>> Hello!
> >>>>> I am a new user of openmpi -- I've installed openmpi 1.2.6 for our
> >>>>> x86_64 Linux Scyld Beowulf cluster in order to make it run with the
> >>>>> amber10 MD simulation package.
> >>>>>
> >>>>> The nodes can see the home directory, i.e. a bpsh to the nodes
> >>>>> works fine and lists all the files in the home directory where I
> >>>>> have both openmpi and amber10 installed.
> >>>>> However if I try to run:
> >>>>>
> >>>>> $MPI_HOME/bin/mpirun -no_local=1 -np 4 $AMBERHOME/exe/sander.MPI
> >>>>> ........
> >>>>>
> >>>>> I get the following error:
> >>>>> [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 247
> >>>>> --------------------------------------------------------------------------
> >>>>> Failed to find the following executable:
> >>>>>
> >>>>> Host: helios.structure.uic.edu
> >>>>> Executable: -o
> >>>>>
> >>>>> Cannot continue.
> >>>>> --------------------------------------------------------------------------
> >>>>> [helios.structure.uic.edu:23611] [0,0,0] ORTE_ERROR_LOG:
> >>>>> Not found in file rmgr_urm.c at line 462
> >>>>> [helios.structure.uic.edu:23611] mpirun: spawn failed with
> >>>>> errno=-13
> >>>>>
> >>>>> any clues?
> >>>>>
> >>>>>
> >>>>> --
> >>>>> -Rima
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> users_at_[hidden]
> >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>>
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> Message: 3
> >>>> Date: Mon, 3 Nov 2008 14:53:55 -0500
> >>>> From: "Gustavo Seabra" <gustavo.seabra_at_[hidden]>
> >>>> Subject: [OMPI users] Problems installing in Cygwin - Problem with
> >>>> GCC 3.4.4
> >>>> To: "Open MPI Users" <users_at_[hidden]>
> >>>> Message-ID:
> >>>> <f79359b60811031153l5591e0f8j49a7e4d9fb02eea3_at_[hidden]>
> >>>> Content-Type: text/plain; charset=ISO-8859-1
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> Here's a "progress report"... more questions at the end :-)
> >>>>
> >>>> Finally, I was *almost* able to compile OpenMPI in Cygwin using the
> >>>> following configure command:
> >>>>
> >>>> ./configure --prefix=/home/seabra/local/openmpi-1.3b1 \
> >>>> --with-mpi-param_check=always --with-threads=posix \
> >>>> --enable-mpi-threads --disable-io-romio \
> >>>> --enable-mca-no-build=memory_mallopt,maffinity,paffinity \
> >>>> --enable-contrib-no-build=vt \
> >>>> FC=g95 'FFLAGS=-O0 -fno-second-underscore' CXX=g++
> >>>>
> >>>> I then had a very weird error during compilation of
> >>>> ompi/tools/ompi_info/params.cc. (See below).
> >>>>
> >>>> The lines causing the compilation errors are:
> >>>>
> >>>> vector.tcc:307: const size_type __len = __old_size +
> >>>> std::max(__old_size, __n);
> >>>> vector.tcc:384: const size_type __len = __old_size +
> >>>> std::max(__old_size, __n);
> >>>> stl_bvector.h:522: const size_type __len = size() +
> >>>> std::max(size(), __n);
> >>>> stl_bvector.h:823: const size_type __len = size() +
> >>>> std::max(size(), __n);
> >>>>
> >>>> (Notice that those are from the standard gcc libraries.)
> >>>>
> >>>> After googling it for a while, I found that this error is caused
> >>>> because, at some point, the source code being compiled redefines
> >>>> "max" as a macro; g++ then cannot recognize the "std::max" that
> >>>> appears on those lines and only "sees" a (...), thus printing that
> >>>> cryptic complaint.
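> >>>>
> >>>> A tiny self-contained illustration of the effect (the macro here is
> >>>> hypothetical -- I still don't know which header defines it on
> >>>> Cygwin):
> >>>>
> >>>> #include <algorithm>
> >>>> #include <cstddef>
> >>>> #include <cstdio>
> >>>> #define max(a,b) ((a) > (b) ? (a) : (b)) // suppose a header does this
> >>>>
> >>>> int main() {
> >>>>     // std::max(a, b) would expand the macro into "std::(...)",
> >>>>     // giving exactly the "expected unqualified-id before '(' token"
> >>>>     // error quoted above.
> >>>>     std::size_t a = 3, b = 7;
> >>>>     std::size_t c = (std::max)(a, b); // parens suppress macro expansion
> >>>>     std::printf("%zu\n", c); // prints 7
> >>>>     return 0;
> >>>> }
> >>>>
> >>>> (Another workaround is to #undef max before the offending includes.)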
> >>>>
> >>>> I looked in some places in the OpenMPI code, but I couldn't find
> >>>> "max" being redefined anywhere -- I may be looking in the wrong
> >>>> places. Anyway, the only way I found of compiling OpenMPI was a
> >>>> very ugly hack: I had to go into those files and remove the "std::"
> >>>> before the "max". With that, it all compiled cleanly.
> >>>>
> >>>> I did try running the tests in the 'tests' directory (with 'make
> >>>> check'), and I didn't get any alarming message, except that in some
> >>>> cases (class, threads, peruse) it printed "All 0 tests passed". I
> >>>> got an "All (n) tests passed" (n>0) for asm and datatype.
> >>>>
> >>>> Can anybody comment on the meaning of those test results? Should I
> >>>> be alarmed with the "All 0 tests passed" messages?
> >>>>
> >>>> Finally, in the absence of big red flags (that I noticed), I went
> >>>> ahead and tried to compile my program. However, as soon as
> >>>> compilation starts, I get the following:
> >>>>
> >>>> /local/openmpi/openmpi-1.3b1/bin/mpif90 -c -O3 -fno-second-underscore -ffree-form -o constants.o _constants.f
> >>>> --------------------------------------------------------------------------
> >>>> Unfortunately, this installation of Open MPI was not compiled
> >>>> with Fortran 90 support. As such, the mpif90 compiler is
> >>>> non-functional.
> >>>> --------------------------------------------------------------------------
> >>>> make[1]: *** [constants.o] Error 1
> >>>> make[1]: Leaving directory `/home/seabra/local/amber11/src/sander'
> >>>> make: *** [parallel] Error 2
> >>>>
> >>>> Notice that I compiled OpenMPI with g95, so there *should* be
> >>>> Fortran95 support... Any ideas on what could be going wrong?
> >>>>
> >>>> Thank you very much,
> >>>> Gustavo.
> >>>>
> >>>> ======================================
> >>>> Error in the compilation of params.cc
> >>>> ======================================
> >>>> $ g++ -DHAVE_CONFIG_H -I. -I../../../opal/include
> >>>> -I../../../orte/include -I../../../ompi/include
> >>>> -I../../../opal/mca/paffinity/linux/plpa/src/libplpa
> >>>> -DOMPI_CONFIGURE_USER="\"seabra\"" -DOMPI_CONFIGURE_HOST="\"ACS02\""
> >>>> -DOMPI_CONFIGURE_DATE="\"Sat Nov 1 20:44:32 EDT 2008\""
> >>>> -DOMPI_BUILD_USER="\"$USER\"" -DOMPI_BUILD_HOST="\"`hostname`\""
> >>>> -DOMPI_BUILD_DATE="\"`date`\"" -DOMPI_BUILD_CFLAGS="\"-O3 -DNDEBUG
> >>>> -finline-functions -fno-strict-aliasing \""
> >>>> -DOMPI_BUILD_CPPFLAGS="\"-I../../.. -D_REENTRANT\""
> >>>> -DOMPI_BUILD_CXXFLAGS="\"-O3 -DNDEBUG -finline-functions \""
> >>>> -DOMPI_BUILD_CXXCPPFLAGS="\"-I../../.. -D_REENTRANT\""
> >>>> -DOMPI_BUILD_FFLAGS="\"-O0 -fno-second-underscore\""
> >>>> -DOMPI_BUILD_FCFLAGS="\"\"" -DOMPI_BUILD_LDFLAGS="\"-export-dynamic
> >>>> \"" -DOMPI_BUILD_LIBS="\"-lutil \""
> >>>> -DOMPI_CC_ABSOLUTE="\"/usr/bin/gcc\""
> >>>> -DOMPI_CXX_ABSOLUTE="\"/usr/bin/g++\""
> >>>> -DOMPI_F77_ABSOLUTE="\"/usr/bin/g77\""
> >>>> -DOMPI_F90_ABSOLUTE="\"/usr/local/bin/g95\""
> >>>> -DOMPI_F90_BUILD_SIZE="\"small\"" -I../../.. -D_REENTRANT -O3
> >>>> -DNDEBUG -finline-functions -MT param.o -MD -MP -MF $depbase.Tpo -c
> >>>> -o param.o param.cc
> >>>>
> >>>> In file included from /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/vector:72,
> >>>> from ../../../ompi/tools/ompi_info/ompi_info.h:24,
> >>>> from param.cc:43:
> >>>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_bvector.h: In
> >>>> member function `void std::vector<bool,
> >>>> _Alloc>::_M_insert_range(std::_Bit_iterator, _ForwardIterator,
> >>>> _ForwardIterator, std::forward_iterator_tag)':
> >>>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_bvector.h:522:
> >>>> error: expected unqualified-id before '(' token
> >>>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_bvector.h: In
> >>>> member function `void std::vector<bool,
> >>>> _Alloc>::_M_fill_insert(std::_Bit_iterator, size_t, bool)':
> >>>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_bvector.h:823:
> >>>> error: expected unqualified-id before '(' token
> >>>>
> >>>> In file included from /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/vector:75,
> >>>> from ../../../ompi/tools/ompi_info/ompi_info.h:24,
> >>>> from param.cc:43:
> >>>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/vector.tcc: In
> >>>> member function `void std::vector<_Tp,
> >>>> _Alloc>::_M_fill_insert(__gnu_cxx::__normal_iterator<typename
> >>>> _Alloc::pointer, std::vector<_Tp, _Alloc> >, size_t, const _Tp&)':
> >>>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/vector.tcc:307:
> >>>> error: expected unqualified-id before '(' token
> >>>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/vector.tcc: In
> >>>> member function `void std::vector<_Tp,
> >>>> _Alloc>::_M_range_insert(__gnu_cxx::__normal_iterator<typename
> >>>> _Alloc::pointer, std::vector<_Tp, _Alloc> >, _ForwardIterator,
> >>>> _ForwardIterator, std::forward_iterator_tag)':
> >>>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/vector.tcc:384:
> >>>> error: expected unqualified-id before '(' token
> >>>>
> >>>>
> >>>> --
> >>>> Gustavo Seabra
> >>>> Postdoctoral Associate
> >>>> Quantum Theory Project - University of Florida, Gainesville, Florida, USA
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> Message: 4
> >>>> Date: Mon, 3 Nov 2008 14:54:25 -0500
> >>>> From: Jeff Squyres <jsquyres_at_[hidden]>
> >>>> Subject: Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)
> >>>> To: Open MPI Users <users_at_[hidden]>
> >>>> Message-ID: <45698801-0857-466F-A19D-C529F72D4A18_at_[hidden]>
> >>>> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> >>>>
> >>>> Can you replicate the scenario in smaller / different cases?
> >>>>
> >>>> - write a sample plugin in C instead of C++
> >>>> - write a non-MPI Fortran application that loads your C++
> >>>> application
> >>>> - ...?
> >>>>
> >>>> In short, *MPI* shouldn't be interfering with Fortran/C++ common
> >>>> blocks. Try taking MPI out of the picture and see if that makes the
> >>>> problem go away.
> >>>>
> >>>> Those are pretty much shots in the dark, but I don't know where to
> >>>> go, either -- try random things until you find what you want.
> >>>>
> >>>>
> >>>> On Nov 3, 2008, at 3:51 AM, Rajesh Ramaya wrote:
> >>>>
> >>>>> Hello Jeff, Gustavo, Mi,
> >>>>> Thanks for the advice. I am familiar with the difference in the
> >>>>> compiler code generation for C, C++ & FORTRAN. I even tried to look
> >>>>> at some of the common block symbols. The name of the symbol remains
> >>>>> the same. The only difference that I observe is that the
> >>>>> FORTRAN-compiled *.o has "0000000000515bc0 B aux7loc_" while the
> >>>>> C++-compiled code has "U aux7loc_"; the memory is not allocated
> >>>>> since it has been declared as extern in C++. When the executable
> >>>>> loads the shared library it finds all the undefined symbols. At
> >>>>> least, if it did not manage to find a single symbol it would print
> >>>>> an undefined symbol error.
> >>>>> I am completely stuck and do not know how to continue.
> >>>>>
> >>>>> Thanks,
> >>>>> Rajesh
> >>>>>
> >>>>> From: users-bounces_at_[hidden]
> >>>>> [mailto:users-bounces_at_[hidden]]
> >>>>> On Behalf Of Mi Yan
> >>>>> Sent: samedi 1 novembre 2008 23:26
> >>>>> To: Open MPI Users
> >>>>> Cc: 'Open MPI Users'; users-bounces_at_[hidden]
> >>>>> Subject: Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)
> >>>>>
> >>>>> So your tests show:
> >>>>> 1. "Shared library in FORTRAN + MPI executable in FORTRAN" works.
> >>>>> 2. "Shared library in C++ + MPI executable in FORTRAN " does not
> >>>>> work.
> >>>>>
> >>>>> It seems to me that the symbols in the C library are not really
> >>>>> recognized by the FORTRAN executable as you thought. What compilers
> >>>>> did you use to build OpenMPI?
> >>>>>
> >>>>> Different compilers have different conventions for handling
> >>>>> symbols. E.g. if there is a variable "var_foo" in your FORTRAN
> >>>>> code, some FORTRAN compilers will save "var_foo_" in the object
> >>>>> file by default; if you want to access "var_foo" in C code, you
> >>>>> actually need to refer to "var_foo_" in the C code. If you define
> >>>>> "var_foo" in a module, some FORTRAN compilers may append the module
> >>>>> name to "var_foo".
> >>>>> So I suggest checking the symbols in the object files generated by
> >>>>> your FORTRAN and C compilers to see the difference.
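> >>>>>
> >>>>> A quick way to compare, using the symbol from your mail (the object
> >>>>> file names are placeholders):
> >>>>>
> >>>>> nm fortran_part.o | grep -i aux7loc
> >>>>> nm cpp_part.o | grep -i aux7loc
> >>>>>
> >>>>> If one side shows "aux7loc_" and the other a differently mangled
> >>>>> name, that mismatch would explain what you see.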
> >>>>>
> >>>>> Mi
> >>>>> "Rajesh Ramaya" <rajesh.ramaya_at_[hidden]>
> >>>>> Sent by: users-bounces_at_[hidden]
> >>>>> 10/31/2008 03:07 PM
> >>>>>
> >>>>> Please respond to: Open MPI Users <users_at_[hidden]>
> >>>>> To: "'Open MPI Users'" <users_at_[hidden]>, "'Jeff Squyres'" <jsquyres_at_[hidden]>
> >>>>> Subject: Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)
> >>>>>
> >>>>> Hello Jeff Squyres,
> >>>>> Thank you very much for the immediate reply. I am able to
> >>>>> successfully access the data from the common block, but the values
> >>>>> are zero. In my algorithm I even update a common block, but the
> >>>>> update made by the shared library is not taken into account by the
> >>>>> executable. Can you please be very specific about how to make the
> >>>>> parallel algorithm aware of the data? Actually I am not writing any
> >>>>> MPI code myself; it's the executable (third-party software) that
> >>>>> does that part. All that I am doing is compiling my code with the
> >>>>> MPI C compiler and adding it to the LD_LIBRARY_PATH.
> >>>>> In fact I did a simple test by creating a shared library from
> >>>>> FORTRAN code, and the update made to the common block is taken into
> >>>>> account by the executable. Is there any flag or pragma that needs
> >>>>> to be activated for mixed-language MPI?
> >>>>> Thank you once again for the reply.
> >>>>>
> >>>>> Rajesh
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: users-bounces_at_[hidden]
> >>>>> [mailto:users-bounces_at_[hidden]]
> >>>>> On
> >>>>> Behalf Of Jeff Squyres
> >>>>> Sent: vendredi 31 octobre 2008 18:53
> >>>>> To: Open MPI Users
> >>>>> Subject: Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)
> >>>>>
> >>>>> On Oct 31, 2008, at 11:57 AM, Rajesh Ramaya wrote:
> >>>>>
> >>>>>> I am completely new to MPI. I have a basic question concerning
> >>>>>> MPI and mixed language coding. I hope any of you could help me
> >>>>>> out.
> >>>>>> Is it possible to access FORTRAN common blocks in C++ in an
> >>>>>> MPI-compiled code? It works without MPI, but as soon as I switch
> >>>>>> to MPI the access to the common block does not work anymore.
> >>>>>> I have a Linux MPI executable which loads a shared library at
> >>>>>> runtime and resolves all undefined symbols, etc. The shared
> >>>>>> library is written in C++ and the MPI executable is written in
> >>>>>> FORTRAN. Some of the input that the shared library is looking for
> >>>>>> is in the Fortran common blocks. When I access those common blocks
> >>>>>> at runtime the values are not initialized. I would like to know if
> >>>>>> what I am doing is possible? I hope that my problem is clear...
> >>>>>
> >>>>>
> >>>>> Generally, MPI should not get in the way of sharing common blocks
> >>>>> between Fortran and C/C++. Indeed, in Open MPI itself, we share a
> >>>>> few common blocks between Fortran and the main C Open MPI
> >>>>> implementation.
> >>>>>
> >>>>> What is the exact symptom that you are seeing? Is the application
> >>>>> failing to resolve symbols at run-time, possibly indicating that
> >>>>> something hasn't instantiated a common block? Or are you able to
> >>>>> successfully access the data from the common block, but it doesn't
> >>>>> have the values you expect (e.g., perhaps you're seeing all zeros)?
> >>>>>
> >>>>> If the former, you might want to check your build procedure. You
> >>>>> *should* be able to simply replace your C++ / F90 compilers with
> >>>>> mpicxx and mpif90, respectively, and be able to build an MPI
> >>>>> version of your app. If the latter, you might need to make your
> >>>>> parallel algorithm aware of what data is available in which MPI
> >>>>> process -- perhaps not all the data is filled in on each MPI
> >>>>> process...?
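> >>>>>
> >>>>> For the build-procedure case, a sketch of what I mean by swapping
> >>>>> in the wrapper compilers (file names are placeholders):
> >>>>>
> >>>>> mpicxx -fPIC -shared -o libplugin.so plugin.cc
> >>>>> mpif90 -o app main.f90 -L. -lplugin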
> >>>>>
> >>>>> --
> >>>>> Jeff Squyres
> >>>>> Cisco Systems
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> users_at_[hidden]
> >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>>>
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> users_at_[hidden]
> >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>>>
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> users_at_[hidden]
> >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>>
> >>>>
> >>>> --
> >>>> Jeff Squyres
> >>>> Cisco Systems
> >>>>
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> Message: 5
> >>>> Date: Mon, 3 Nov 2008 15:04:47 -0500
> >>>> From: Jeff Squyres <jsquyres_at_[hidden]>
> >>>> Subject: Re: [OMPI users] Problems installing in Cygwin - Problem
> >>>> with GCC 3.4.4
> >>>> To: Open MPI Users <users_at_[hidden]>
> >>>> Message-ID: <8E364B51-6726-4533-ADE2-AEA266380DCC_at_[hidden]>
> >>>> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> >>>>
> >>>> On Nov 3, 2008, at 2:53 PM, Gustavo Seabra wrote:
> >>>>
> >>>>> Finally, I was *almost* able to compile OpenMPI in Cygwin using the
> >>>>> following configure command:
> >>>>>
> >>>>> ./configure --prefix=/home/seabra/local/openmpi-1.3b1 \
> >>>>> --with-mpi-param_check=always --with-threads=posix \
> >>>>> --enable-mpi-threads --disable-io-romio \
> >>>>> --enable-mca-no-build=memory_mallopt,maffinity,paffinity \
> >>>>> --enable-contrib-no-build=vt \
> >>>>> FC=g95 'FFLAGS=-O0 -fno-second-underscore' CXX=g++
> >>>>
> >>>> For your fortran issue, the Fortran 90 interface needs the Fortran
> >>>> 77 interface. So you need to supply an F77 as well (the output from
> >>>> configure should indicate that the F90 interface was disabled
> >>>> because the F77 interface was disabled).
> >>>>
> >>>>> I then had a very weird error during compilation of
> >>>>> ompi/tools/ompi_info/params.cc. (See below).
> >>>>>
> >>>>> The lines causing the compilation errors are:
> >>>>>
> >>>>> vector.tcc:307: const size_type __len = __old_size +
> >>>>> std::max(__old_size, __n);
> >>>>> vector.tcc:384: const size_type __len = __old_size +
> >>>>> std::max(__old_size, __n);
> >>>>> stl_bvector.h:522: const size_type __len = size() +
> >>>>> std::max(size(), __n);
> >>>>> stl_bvector.h:823: const size_type __len = size() +
> >>>>> std::max(size(), __n);
> >>>>>
> >>>>> (Notice that those are from the standard gcc libraries.)
> >>>>>
> >>>>> After googling it for a while, I found that this error is caused
> >>>>> because, at some point, the source code being compiled redefines
> >>>>> "max" as a macro; g++ then cannot recognize the "std::max" that
> >>>>> appears on those lines and only "sees" a (...), thus printing that
> >>>>> cryptic complaint.
> >>>>>
> >>>>> I looked in some places in the OpenMPI code, but I couldn't find
> >>>>> "max" being redefined anywhere -- I may be looking in the wrong
> >>>>> places. Anyway, the only way I found of compiling OpenMPI was a
> >>>>> very ugly hack: I had to go into those files and remove the
> >>>>> "std::" before the "max". With that, it all compiled cleanly.
> >>>>
> >>>> I'm not sure I follow -- I don't see anywhere in OMPI where we use
> >>>> std::max. What areas did you find that you needed to change?
> >>>>
> >>>>> I did try running the tests in the 'tests' directory (with 'make
> >>>>> check'), and I didn't get any alarming message, except that in some
> >>>>> cases (class, threads, peruse) it printed "All 0 tests passed". I
> >>>>> got an "All (n) tests passed" (n>0) for asm and datatype.
> >>>>>
> >>>>> Can anybody comment on the meaning of those test results? Should I
> >>>>> be alarmed with the "All 0 tests passed" messages?
> >>>>
> >>>> No. We don't really maintain the "make check" stuff too well.
> >>>>
> >>>> --
> >>>> Jeff Squyres
> >>>> Cisco Systems
> >>>>
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> _______________________________________________
> >>>> users mailing list
> >>>> users_at_[hidden]
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>>
> >>>> End of users Digest, Vol 1055, Issue 2
> >>>> **************************************
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> -Rima
> >>> (attachment: step1)
> >>> _______________________________________________
> >>> users mailing list
> >>> users_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> >>
> >> ------------------------------
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >> End of users Digest, Vol 1055, Issue 4
> >> **************************************
> >>
> >
> >
> >
> > --
> > -Rima
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> > ------------------------------
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > End of users Digest, Vol 1057, Issue 3
> > **************************************
> >
>
>
>
> --
> -Rima
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Dr. Daniel Gruner                        dgruner_at_[hidden]
Dept. of Chemistry                       daniel.gruner_at_[hidden]
University of Toronto                    phone:  (416)-978-8689
80 St. George Street                     fax:    (416)-978-5325
Toronto, ON  M5S 3H6, Canada             finger for PGP public key