Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ssh MPi and program tests
From: Gus Correa (gus_at_[hidden])
Date: 2009-04-06 17:03:11

Hi Francesco

See answers inline.

Francesco Pietra wrote:
> Hi Gus:
> Partial quick answers below. I have reestablished the ssh connection
> so that tomorrow I'll run the tests. Everything that relates to
> running amber is on the "parallel computer", where I have access to
> everything.
> On Mon, Apr 6, 2009 at 7:53 PM, Gus Correa <gus_at_[hidden]> wrote:
>> Hi Francesco, list
>> Francesco Pietra wrote:
>>> On Mon, Apr 6, 2009 at 5:21 PM, Gus Correa <gus_at_[hidden]> wrote:
>>>> Hi Francesco
>>>> Did you try to run examples/connectivity_c.c,
>>>> or examples/hello_c.c before trying amber?
>>>> They are in the directory where you untarred the OpenMPI tarball.
>>>> It is easier to troubleshoot
>>>> possible network and host problems
>>>> with these simpler programs.
>>> I have found the "examples". Should they be compiled? how? This is my
>>> only question here.
>> cd examples/
>> /full/path/to/openmpi/bin/mpicc -o connectivity_c connectivity_c.c
>> Then run it with, say:
>> /full/path/to/openmpi/bin/mpirun -host {whatever_hosts_you_want}
>> -n {as_many_processes_you_want} connectivity_c
>> Likewise for hello_c.c
>>> What's below is info. Although amber parallel
>>> would have not compiled with faulty openmpi, I'll run openmpi tests as
>>> soon as I understand how.
>>>> Also, to avoid confusion,
>>>> you may use a full path name to mpirun,
>>>> in case you have other MPI flavors in your system.
>>>> Often times the mpirun your path is pointing to is not what you
>>>> may think it is.
>>> which mpirun
>>> /usr/local/bin/mpirun
>> Did you install OpenMPI on /usr/local ?
>> When you do "mpirun -help", do you see "mpirun (Open MPI) 1.3"?
> mpirun -help
> mpirun (Open MPI) 1.3.1
> on the 1st line, then follow the options

Ok, it looks like you installed OpenMPI 1.3.1 with the default
"--prefix" which is /usr/local.

>> How about the output of "orte_info" ?
> orte_info was not installed. See below what has been installed.

Sorry, my fault.
I meant ompi_info (not orte_info).
Please try ompi_info or "ompi_info --config".
It will tell you the compilers used to build OpenMPI, etc.
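
A minimal sketch of that check, assuming ompi_info sits in /usr/local/bin next to the mpirun found above. The OMPI_BIN variable and the show_compilers function name are made up for illustration. It simply filters the ompi_info output for lines that mention a compiler:

```shell
#!/bin/sh
# Assumption: OpenMPI was installed under /usr/local, so its tools are in /usr/local/bin.
OMPI_BIN=${OMPI_BIN:-/usr/local/bin}

# show_compilers DIR: print the compiler-related lines reported by DIR/ompi_info.
# (Hypothetical helper, not part of OpenMPI.)
show_compilers() {
    dir=$1
    if [ -x "$dir/ompi_info" ]; then
        "$dir/ompi_info" | grep -i 'compiler'
    else
        echo "no ompi_info in $dir" >&2
        return 1
    fi
}

# Do not abort if ompi_info is missing; the message above is enough.
show_compilers "$OMPI_BIN" || true
```

If the lines printed mention icc and ifort, the Intel compilers were the ones used to build OpenMPI.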

I presume all of this is being done in the "parallel computer",
i.e., in one of the AMD64 Debian systems, right?

>> Does it show your Intel compilers, etc?
> I guess so, otherwise amber would have not been compiled, but I don't
> know the commands to prove it. The intel compilers are on the path:
> /opt/intel/cce/10.1.015/bin:/opt/intel/fce/10.1.015/bin and the mkl
> are sourced in .bashrc.

Again, all in the AMD64 system, right?

>> I ask because many Linux distributions come with one or more flavors
>> of MPI (OpenMPI, MPICH, LAM, etc), some compilers also do (PGI for
>> instance), some tools (Intel MKL?) may also have their MPI,
>> and you end up with a bunch of MPI commands
>> on your path that may produce a big mixup.
>> This is a pretty common problem that affects new users on this list,
>> on the MPICH list, on clustering lists, etc.
>> The error messages often don't help find the source of the problem,
>> and people spend a lot of time trying to troubleshoot network,
>> etc, when it is often just a path problem.
>> So, this is why when you begin, you may want to use full path
>> names, to avoid confusion.
>> After the basic MPI functionality is working,
>> then you can go and fix your path chain,
>> and rely on your path chain.
>>> there is no other accessible MPI (one application, DOT2, has mpich but
>>> it is a static compilation). DOT2 parallelization requires that the
>>> computer knows itself, i.e., "ssh hostname date" should afford the date
>>> passwordless. The reported issues in testing amber have destroyed this
>>> situation: now deb64 has port 22 closed, even to itself.
>> Have you tried to reboot the master node, to see if it comes back
>> to the original ssh setup?
>> You need ssh to be functional to run OpenMPI code,
>> including the tests above.
>>>> I don't know if you want to run on amd64 alone (master node?)
>>>> or on a cluster.
>>>> In any case, you may use a list of hosts
>>>> or a hostfile on the mpirun command line,
>>>> to specify where you want to run.
>>> With amber I use the parallel computer directly and the amber
>>> installation is chown to me. The ssh connection, in this case, only
>>> serves to get files from, or send files to, my desktop.
>> It is unclear to me what you mean by "the parallel computer directly".
>> Can you explain better which computers are in this game?
>> Your desktop and a cluster perhaps?
>> Are they both Debian 64 Linux?
>> Where do you compile the programs?
>> Where do you want to run the programs?
>>> In my .bashrc:
>>> (for amber)
>>> MPI_HOME=/usr/local
>>> export MPI_HOME
>>> (for openmpi)
>>> if [ "$LD_LIBRARY_PATH" ] ; then
>>> export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
>>> else
>>> export LD_LIBRARY_PATH="/usr/local/lib"
>>> fi
>> Is this on your desktop or on the "parallel computer"?
> On both "parallel computers" (there is my desktop, with ssh to two UMA-type
> dual-Opteron "parallel computers").
> Only one was active when the "test"
> problems arose. While the (ten-year-old) desktop is i386, both other
> machines are amd64; all of them run Debian lenny. I prepare the input files
> on the i386 and use it also as storage for backups.

So, you only use your i386 desktop to ssh to the AMD64 machine,
and to prepare input files, etc, right?
The OpenMPI installation, the compilations you do, and the job runs
all happen in the AMD64 system, right?

BTW, do you use each of these systems separately on your
MPI program runs,
or do you use them together?
If you use them together, are they connected through a network,
and did you set up passwordless ssh connections between them?
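
One way to test this without risking a hang at a password or host-key prompt is sketched below. It wraps the "ssh hostname date" check mentioned earlier in the thread and adds ssh's BatchMode option, which makes ssh fail instead of asking questions. The function name check_passwordless_ssh and the SSH override variable are made up for illustration; only ssh itself and its -o options are real.

```shell
#!/bin/sh
# check_passwordless_ssh HOST
# Succeeds only if "ssh HOST date" works with no password or host-key prompt.
# SSH is a hypothetical override (defaults to the real ssh) so the function
# can be exercised with a stand-in command.
check_passwordless_ssh() {
    host=$1
    if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" date >/dev/null 2>&1; then
        echo "$host: passwordless ssh OK"
    else
        echo "$host: passwordless ssh FAILED"
        return 1
    fi
}
```

Run it as "check_passwordless_ssh deb64" on the machine itself, and then once for every other host you intend to pass to mpirun.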

> The "parallel
> computer" has only the X server and a minimal window for a
> two-dimensional graphics of amber.

I don't know how amber works, so please tell me.
Do you somehow interact with amber while it is running in parallel mode,
using this "minimal window for a two dimensional graphics"?
Or is this only a data post-processing activity that happens after the
parallel run of amber finishes?

> The other parallel computer has a
> GeForce 6600 card with GLSL support, which I use to elaborate
> graphically the outputs from the numerical computations (using VMD,
> Chimera and other 64 bit graphical programs).
>>> There is also
>>> MPICH_HOME=/usr/local
>>> export MPICH_HOME
>>> this is for DOCK, which, with this env variable, accepts openmpi (at
>>> least it was so with v 1.2.6)
>> Oh, well, it looks like there is MPICH already installed on /usr/local.
>> So, this may be part of the confusion, the path confusion I referred to.
> No, there is no MPICH installed. With the above export, DOCK (a
> docking program from the same developers of Amber) is so kind to use
> the executables of openmpi. The export was suggested by the DOCK
> developers, and it worked. Unable to explain why.

OK, this may be a way the DOCK developers found to trick their own
software (DOCK) into thinking MPICH is installed in /usr/local,
and actually use the OpenMPI libraries instead of MPICH.
They may have hardwired in their build scripts the "MPICH_HOME"
environment variable as the location where the MPI libraries reside.
But which MPI libraries are there may not matter much, I would guess.
Just a guess anyway.
(I have no idea of what the heck DOCK is or how it works.)

> As far as the parallel support is concerned, /usr/local/bin only
> contains what openmpi 1.3.1 has installed (resulting from ./configure
> cc=/path/icc cxx=/path/icpc F77=path/ifort FC=path/ifort
> --with-libnuma=/usr/lib):
> mpic++ mpicc mpiCC mpicc-vt mpiCC-vt mpic++-vt mpicxx mpicxx-vt
> mpiexec mpif77 mpif77-vt mpif90 mpif90-vt mpirun ompi-clean ompi_info
> ompi-ps ompi-server opal_wrapper opari orte-clean orted orte-iof
> orte-ps orterun otfaux otfcompress otfconfig otfdecompress otfdump
> otfmerge vtcc vtcxx vtf77 vtf90 vtfilter vtunify. There is no
> orte_info.

Of course not.
Doh! I misspelled the name ... :(
It is ompi_info for sure.

>> I would suggest installing OpenMPI on a different directory,
>> using the --prefix option of the OpenMPI configure script.
>> Do configure --help for details about all configuration options.
>>> the intel compilers (compiled ifort and icc) are sourced in both my
>>> .bashrc and root home .bashrc.
>>> Thanks and apologies for my low level in these affairs. It is the
>>> first time I am faced by such problems, with amd64, same intel
>>> compilers, and openmpi 1.2.6 everything was in order.
>> To me it doesn't look like the problem is related to the new version
>> of OpenMPI.
> I asked about that because I am using the same commands, .bashrc, etc
> that worked with version 1.2.6. The computers are the same, the only
> (non-minor) difference is upgrading from amd64 etch to amd64 lenny (or
> I am making mistakes that I have not yet detected).

Yes, but I still don't think it is some problem in OpenMPI 1.3.1 that is
causing trouble here.
If it were, the programs would at least start running, but mpirun is
having trouble even starting them, right?

You also seem to have upgraded the Debian release,
so another part of the system changed as well.
But still, it may not be related to Debian either.
It may be just some confusion on paths, etc.
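
A quick way to look for that kind of path confusion is to list every mpirun reachable through the path, not just the first one that "which" reports. The sketch below does this with a plain loop over the path entries. The function name find_all_on_path is made up for illustration.

```shell
#!/bin/sh
# find_all_on_path NAME [PATH_STRING]
# Print every executable file called NAME found in the colon-separated
# PATH_STRING (default: the current $PATH), in search order.
find_all_on_path() {
    name=$1
    search_path=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        # An empty path entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# More than one line of output means more than one MPI launcher is on the path.
find_all_on_path mpirun
```

The same call with mpicc, mpif90 or mpiexec shows whether the compiler wrappers come from the same installation as mpirun.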

I really encourage you to try to compile and run the programs in the
examples directory.
They are very clear and simple (as opposed to amber, which hides behind
a few layers of software), and even if they fail, the failure will help
clarify the nature of the problem, and find a fix.
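
To keep the full path names straight, the commands quoted earlier in this thread can be generated for both example programs, as in the sketch below. It only prints the commands, so they can be checked before anything runs. The prefix /usr/local, the host name deb64 and the count of 4 processes are assumptions taken from this thread, and example_commands is a made-up helper name.

```shell
#!/bin/sh
# example_commands PREFIX HOSTS NPROCS PROGRAM
# Print (do not run) the compile and launch commands for one example program,
# always using full paths to the OpenMPI wrappers under PREFIX/bin.
example_commands() {
    prefix=$1 hosts=$2 nprocs=$3 prog=$4
    echo "cd examples/"
    echo "$prefix/bin/mpicc -o $prog $prog.c"
    echo "$prefix/bin/mpirun -host $hosts -n $nprocs ./$prog"
}

# Assumptions: prefix /usr/local, a single host named deb64, 4 processes.
example_commands /usr/local deb64 4 hello_c
example_commands /usr/local deb64 4 connectivity_c
```

Once the printed lines look right, type them in the directory where the OpenMPI tarball was untarred.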

Oh, well, I am afraid I am asking more questions than helping out,
but I am trying to understand what is going on.

Gus Correa

>> Try the test programs with full path names first.
>> It may not solve the problem, but it may clarify things a bit.
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>> francesco
>>>> Do "/full/path/to/openmpi/bin/mpirun --help" for details.
>>>> I am not familiar with amber, but how does it find your openmpi
>>>> libraries and compiler wrappers?
>>>> Don't you need to give it the paths during configuration,
>>>> say,
>>>> ./configure_amber -openmpi=/full/path/to/openmpi
>>>> or similar?
>>>> I hope this helps.
>>>> Gus Correa
>>>> ---------------------------------------------------------------------
>>>> Gustavo Correa
>>>> Lamont-Doherty Earth Observatory - Columbia University
>>>> Palisades, NY, 10964-8000 - USA
>>>> ---------------------------------------------------------------------
>>>> Francesco Pietra wrote:
>>>>> I have compiled openmpi 1.3.1 on debian amd64 lenny with icc/ifort
>>>>> (10.1.015) and libnuma. Tests passed:
>>>>> ompi_info | grep libnuma
>>>>> MCA affinity: libnuma (MCA v 2.0, API 2.0)
>>>>> ompi_info | grep maffinity
>>>>> MCA affinity: first use (MCA as above)
>>>>> MCA affinity: libnuma as above.
>>>>> Then, I have compiled parallel a molecular dynamics package, amber10,
>>>>> without error signals but I am having problems in testing the amber
>>>>> parallel installation.
>>>>> amber10 configure was set as:
>>>>> ./configure_amber -openmpi -nobintray ifort
>>>>> just as I used before with openmpi 1.2.6. Could you say if the
>>>>> -openmpi should be changed?
>>>>> cd tests
>>>>> export DO_PARALLEL='mpirun -np 4'
>>>>> make test.parallel.MM < /dev/null
>>>>> cd cytosine && ./Run.cytosine
>>>>> The authenticity of host deb64 (which is the hostname) (
>>>>> can't be established.
>>>>> RSA fingerprint .....
>>>>> connecting ?
>>>>> I stopped the ssh daemon, whereby tests were interrupted because deb64
>>>>> (i.e., itself) could no longer be accessed. Further attempts under these
>>>>> conditions failed for the same reason. Now, sshing to deb64 is no longer
>>>>> possible: port 22 closed. In contrast, sshing from deb64 to other
>>>>> computers occurs passwordless. No such problems arose at the time of
>>>>> amd64 etch with the same
>>>>> configuration of ssh, same compilers, and openmpi 1.2.6.
>>>>> I am here because the warning from the amber site is that I should
>>>>> learn how to use my installation of MPI. Therefore, if there is any
>>>>> clue ..
>>>>> thanks
>>>>> francesco pietra
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]