
Open MPI User's Mailing List Archives


From: Paul Fons (paul-fons_at_[hidden])
Date: 2006-09-05 03:42:42


  I have what is probably a simple question (I hope). I have built
openmpi-1.1.1 from source using gfortran on Mac OS X 10.4.7. I can
run parallel jobs on my own machine using mpiexec -np. My
machinefile contains the lines:

tachyon.a04.aist.go.jp
tachyon.a04.aist.go.jp
gehirn.local
gehirn.local

(The .local name uses Zeroconf to find the address of gehirn, and
that part works.) When I run a parallel job on my own machine (-np 2),
everything is fine: the job runs in parallel, it is faster, and the
output is correct. When I try running with -np 4 to also use an
additional dual-CPU G5 machine, the job hangs while consuming large
amounts of CPU (runaway processes). This continues without any output
until I break the process with ^C (which terminates the processes on
all machines). I am running the task via ssh using an ssh-agent.
Might anyone have an idea what could possibly be wrong? I have
attached my config.log and ompi_info output (bzip2'ed) to this mail,
as the mailing list instructions ask. I am guessing this should be a
simple thing, but it is taking too much time to figure out on my own
(i.e. I couldn't find a FAQ entry or a user question/reply that
answered this).

                                        Paul Fons

Script started on Tue Sep 5 16:01:18 2006
[tachyon:exafs/feff85/zno] paulfons% mpiexec -machinefile machinefile -np 2 hostname

tachyon.a04.aist.go.jp
tachyon.a04.aist.go.jp
[tachyon:exafs/feff85/zno] paulfons% mpiexec -machinefile machinefile -np 2 /opt/feff/feff85/rdinp

Number of processors = 2
Feff 8.40
   XANES:
name: zincite ZnO
formula: ZnO
sites: Zn1,O1
refer1: wyckoff, vol 1, ch III, p 111
refer2:
schoen:
notes1:
[tachyon:exafs/feff85/zno] paulfons% mpiexec -machinefile machinefile -np 2 hostname

tachyon.a04.aist.go.jp
tachyon.a04.aist.go.jp
dhcp054092.a04.aist.go.jp
dhcp054092.a04.aist.go.jp
[tachyon:exafs/feff85/zno] paulfons% mpiexec -machinefile machinefile -np 4 /opt/feff/feff85/rdinp

Number of processors = 4
Feff 8.40
   XANES:
name: zincite ZnO
formula: ZnO
sites: Zn1,O1
refer1: wyckoff, vol 1, ch III, p 111
refer2:
schoen:
notes1:

^Cmpiexec: killing job...
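
The placement seen in the second hostname run (two tachyon lines followed by two dhcp054092 lines, the address gehirn.local resolved to) is consistent with by-slot mapping, where each machinefile line counts as one slot and slots are filled in file order. A minimal Python sketch of that mapping, assuming the simple one-host-per-line format shown above; parse_machinefile and map_by_slot are hypothetical helpers for illustration, not part of Open MPI:

```python
def parse_machinefile(text):
    """Return a dict host -> slot count, preserving file order.

    Assumes one host name per line; a repeated line adds one more slot
    on that host. Blank lines and '#' comments are ignored.
    """
    slots = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host = line.split()[0]
        slots[host] = slots.get(host, 0) + 1
    return slots


def map_by_slot(slots, np):
    """Assign ranks 0..np-1 by filling each host's slots in file order.

    Simplification of this sketch: oversubscription is not modeled, so
    asking for more ranks than listed slots raises an error.
    """
    order = [host for host, count in slots.items() for _ in range(count)]
    if np > len(order):
        raise ValueError("more ranks requested than slots listed")
    return order[:np]


MACHINEFILE = """\
tachyon.a04.aist.go.jp
tachyon.a04.aist.go.jp
gehirn.local
gehirn.local
"""

if __name__ == "__main__":
    slots = parse_machinefile(MACHINEFILE)
    for rank, host in enumerate(map_by_slot(slots, 4)):
        print(rank, host)
```

Under this assumption, -np 2 keeps both ranks on tachyon, and only -np 4 places ranks 2 and 3 on gehirn. That would match the hang appearing only once processes span both machines.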
