Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ssh MPi and program tests
From: Francesco Pietra (chiendarret_at_[hidden])
Date: 2009-04-06 15:40:54


Hi Gus:
Partial quick answers below. I have reestablished the ssh connection
so that tomorrow I'll run the tests. Everything that relates to
running amber is on the "parallel computer", where I have access to
everything.

On Mon, Apr 6, 2009 at 7:53 PM, Gus Correa <gus_at_[hidden]> wrote:
> Hi Francesco, list
>
> Francesco Pietra wrote:
>>
>> On Mon, Apr 6, 2009 at 5:21 PM, Gus Correa <gus_at_[hidden]> wrote:
>>>
>>> Hi Francesco
>>>
>>> Did you try to run examples/connectivity_c.c,
>>> or examples/hello_c.c before trying amber?
>>> They are in the directory where you untarred the OpenMPI tarball.
>>> It is easier to troubleshoot
>>> possible network and host problems
>>> with these simpler programs.
>>
>> I have found the "examples". Should they be compiled? How? This is my
>> only question here.
>
> cd examples/
> /full/path/to/openmpi/bin/mpicc -o connectivity_c connectivity_c.c
>
> Then run it with, say:
>
> /full/path/to/openmpi/bin/mpirun -host {whatever_hosts_you_want}
> -n {as_many_processes_you_want} connectivity_c
>
> Likewise for hello_c.c
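>
> For instance, a minimal session (assuming the install prefix is
> /usr/local, as your "which mpirun" below suggests, and the host is
> deb64) might look like:
>
>   cd examples/
>   /usr/local/bin/mpicc -o connectivity_c connectivity_c.c
>   /usr/local/bin/mpirun -host deb64 -n 4 ./connectivity_c
>
> If connectivity_c reports that all processes can talk to each other,
> the basic MPI plumbing is fine.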
>
>> What's below is info. Although parallel amber
>> would not have compiled with a faulty openmpi, I'll run the openmpi
>> tests as soon as I understand how.
>>
>>> Also, to avoid confusion,
>>> you may use a full path name to mpirun,
>>> in case you have other MPI flavors on your system.
>>> Oftentimes the mpirun your path is pointing to is not the one
>>> you think it is.
>>
>>
>> which mpirun
>> /usr/local/bin/mpirun
>
> Did you install OpenMPI on /usr/local ?
> When you do "mpirun -help", do you see "mpirun (Open MPI) 1.3"?

mpirun -help
mpirun (Open MPI) 1.3.1
on the first line, then the options follow

> How about the output of "orte_info" ?
orte_info was not installed. See below for what has been installed.

> Does it show your Intel compilers, etc?

I guess so, otherwise amber would not have been compiled, but I don't
know the commands to prove it. The intel compilers are on the path:
/opt/intel/cce/10.1.015/bin:/opt/intel/fce/10.1.015/bin and the MKL
is sourced in .bashrc.

>
> I ask because many Linux distributions come with one or more flavors
> of MPI (OpenMPI, MPICH, LAM, etc), some compilers also do (PGI for
> instance), some tools (Intel MKL?) may also have their MPI,
> and you end up with a bunch of MPI commands
> on your path that may produce a big mixup.
> This is a pretty common problem that affects new users on this list,
> on the MPICH list, on clustering lists, etc.
> The error messages often don't help find the source of the problem,
> and people spend a lot of time trying to troubleshoot the network,
> etc., when it is often just a path problem.
>
> So, this is why, when you begin, you may want to use full path
> names, to avoid confusion.
> After the basic MPI functionality is working,
> you can go and fix your path chain, and then rely on it.
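>
> For example, run the test with an explicit path (the /usr/local
> prefix is taken from your "which mpirun" output below):
>
>   /usr/local/bin/mpirun -n 4 ./connectivity_c
>
> so there is no doubt about which mpirun actually starts the job.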
>
>>
>> there is no other accessible MPI (one application, DOT2, has mpich, but
>> it is a static compilation; DOT2 parallelization requires that the
>> computer knows itself, i.e. "ssh hostname date" should return the date
>> passwordless). The reported issues in testing amber have destroyed this
>> situation: now deb64 has port 22 closed, even to itself.
>>
>
> Have you tried to reboot the master node, to see if it comes back
> to the original ssh setup?
> You need ssh to be functional to run OpenMPI code,
> including the tests above.
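>
> A quick sanity check is the same test you mentioned for DOT2:
>
>   ssh deb64 date
>
> If that doesn't print the date passwordless, OpenMPI won't be able
> to launch processes on deb64 either.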
>
>>
>>> I don't know if you want to run on amd64 alone (master node?)
>>> or on a cluster.
>>> In any case, you may use a list of hosts
>>> or a hostfile on the mpirun command line,
>>> to specify where you want to run.
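>>>
>>> For instance, a hostfile "myhosts" could contain just (hostname and
>>> slot count are placeholders):
>>>
>>>   deb64 slots=4
>>>
>>> and then you would run:
>>>
>>>   /full/path/to/openmpi/bin/mpirun -hostfile myhosts -n 4 ./connectivity_c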
>>
>> With amber I use the parallel computer directly, and the amber
>> installation is chowned to me. The ssh connection, in this case, only
>> serves to get files from, or send files to, my desktop.
>>
>
> It is unclear to me what you mean by "the parallel computer directly".
> Can you explain better which computers are in this game?
> Your desktop and a cluster perhaps?
> Are they both Debian 64 Linux?
> Where do you compile the programs?
> Where do you want to run the programs?
>
>> In my .bashrc:
>>
>> (for amber)
>> MPI_HOME=/usr/local
>> export MPI_HOME
>>
>> (for openmpi)
>> if [ "$LD_LIBRARY_PATH" ] ; then
>>  export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
>> else
>>  export LD_LIBRARY_PATH="/usr/local/lib"
>> fi
>>
>
> Is this on your desktop or on the "parallel computer"?

On both "parallel computers" (there is my desktop, ssh to two uma-type
dual-opteron "parallel computers". Only one was active when the "test"
problems arose. While the (ten years old) destop is i386, both other
machines are amd64, i.e., all debian lenny. I prepare the input files
on the i386 and use it also as storage for backups. The "parallel
computer" has only the X server and a minimal window for a
two-dimensional graphics of amber. The other parallel computer has a
GeForce 6600 card with GLSL support, which I use to elaborate
graphically the outputs from the numerical computations (using VMD,
Chimera and other 64 bit graphical programs).

>
>>
>> There is also
>>
>> MPICH_HOME=/usr/local
>> export MPICH_HOME
>>
>> this is for DOCK, which, with this env variable, accepts openmpi (at
>> least it was so with v 1.2.6)
>>
>
> Oh, well, it looks like there is MPICH already installed on /usr/local.
> So, this may be part of the confusion, the path confusion I referred to.

No, there is no MPICH installed. With the above export, DOCK (a
docking program from the same developers as Amber) happily uses the
openmpi executables. The export was suggested by the DOCK developers,
and it worked; I am unable to explain why.

As far as the parallel support is concerned, /usr/local/bin only
contains what openmpi 1.3.1 has installed (resulting from ./configure
CC=/path/icc CXX=/path/icpc F77=/path/ifort FC=/path/ifort
--with-libnuma=/usr/lib):
mpic++ mpicc mpiCC mpicc-vt mpiCC-vt mpic++-vt mpicxx mpicxx-vt
mpiexec mpif77 mpif77-vt mpif90 mpif90-vt mpirun ompi-clean ompi_info
ompi-ps ompi-server opal_wrapper opari orte-clean orted orte-iof
orte-ps orterun otfaux otfcompress otfconfig otfdecompress otfdump
otfmerge vtcc vtcxx vtf77 vtf90 vtfilter vtunify. There is no
orte_info.

>
> I would suggest installing OpenMPI on a different directory,
> using the --prefix option of the OpenMPI configure script.
> Do configure --help for details about all configuration options.
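>
> For example (the prefix below is just a suggestion; use your actual
> compiler paths):
>
>   ./configure --prefix=/opt/openmpi-1.3.1 \
>       CC=/opt/intel/cce/10.1.015/bin/icc \
>       CXX=/opt/intel/cce/10.1.015/bin/icpc \
>       F77=/opt/intel/fce/10.1.015/bin/ifort \
>       FC=/opt/intel/fce/10.1.015/bin/ifort \
>       --with-libnuma=/usr/lib
>   make all install
>
> Then call /opt/openmpi-1.3.1/bin/mpicc, mpirun, etc. explicitly,
> so they cannot be confused with whatever else is in /usr/local.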
>
>
>> The intel compilers (ifort and icc) are sourced in both my .bashrc
>> and root's .bashrc.
>>
>> Thanks, and apologies for my low level in these affairs. It is the
>> first time I have faced such problems; with amd64, the same intel
>> compilers, and openmpi 1.2.6, everything was in order.
>>
>
> To me it doesn't look like the problem is related to the new version
> of OpenMPI.

I asked about that because I am using the same commands, .bashrc, etc.
that worked with version 1.2.6. The computers are the same; the only
(not minor) difference is the upgrade from amd64 etch to amd64 lenny
(or I am making mistakes that I have not yet detected).
>
> Try the test programs with full path names first.
> It may not solve the problem, but it may clarify things a bit.
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>> francesco
>>
>>
>>
>>> Do "/full/path/to/openmpi/bin/mpirun --help" for details.
>>>
>>> I am not familiar with amber, but how does it find your openmpi
>>> libraries and compiler wrappers?
>>> Don't you need to give it the paths during configuration,
>>> say,
>>> ./configure_amber -openmpi=/full/path/to/openmpi
>>> or similar?
>>>
>>> I hope this helps.
>>> Gus Correa
>>> ---------------------------------------------------------------------
>>> Gustavo Correa
>>> Lamont-Doherty Earth Observatory - Columbia University
>>> Palisades, NY, 10964-8000 - USA
>>> ---------------------------------------------------------------------
>>>
>>>
>>> Francesco Pietra wrote:
>>>>
>>>> I have compiled openmpi 1.3.1 on debian amd64 lenny with icc/ifort
>>>> (10.1.015) and libnuma. Tests passed:
>>>>
>>>> ompi_info | grep libnuma
>>>>  MCA maffinity: libnuma (MCA v2.0, API v2.0)
>>>>
>>>> ompi_info | grep maffinity
>>>>  MCA maffinity: first_use (MCA as above)
>>>>  MCA maffinity: libnuma (as above)
>>>>
>>>> Then I compiled a molecular dynamics package, amber10, in parallel,
>>>> without error messages, but I am having problems testing the
>>>> parallel amber installation.
>>>>
>>>> amber10 configure was set as:
>>>>
>>>> ./configure_amber -openmpi -nobintray ifort
>>>>
>>>> just as I used before with openmpi 1.2.6. Could you say whether the
>>>> -openmpi flag should be changed?
>>>>
>>>> cd tests
>>>>
>>>> export DO_PARALLEL='mpirun -np 4'
>>>>
>>>> make test.parallel.MM  < /dev/null
>>>>
>>>> cd cytosine && ./Run.cytosine
>>>> The authenticity of host deb64 (which is the hostname) (127.0.1.1)
>>>> can't be established.
>>>> RSA fingerprint .....
>>>> Are you sure you want to continue connecting?
>>>>
>>>> I stopped the ssh daemon, whereby the tests were interrupted because
>>>> deb64 (i.e., itself) could no longer be accessed. Further attempts
>>>> under these conditions failed for the same reason. Now, sshing to
>>>> deb64 is no longer possible: port 22 is closed. In contrast, sshing
>>>> from deb64 to other computers works passwordless. No such problems
>>>> arose at the time of amd64 etch with the same configuration of ssh,
>>>> the same compilers, and openmpi 1.2.6.
>>>>
>>>> I am here because the warning from the amber site is that I should
>>>> learn how to use my installation of MPI. Therefore, if there is any
>>>> clue ..
>>>>
>>>> thanks
>>>> francesco pietra