
Open MPI User's Mailing List Archives


From: Mark Kosmowski (mark.kosmowski_at_[hidden])
Date: 2007-02-08 12:56:16


> Message: 1
> Date: Wed, 7 Feb 2007 17:37:41 -0500
> From: "Alex Tumanov" <atumanov_at_[hidden]>
> Subject: Re: [OMPI users] first time user - can run mpi job SMP but
> not over cluster
> To: "Open MPI Users" <users_at_[hidden]>
> Message-ID:
> <2453e2900702071437k20a13e97g5014253aa97ccaba_at_[hidden]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hello,
>
> > mpirun -np 2 myprogram inputfile >outputfile
> There can be a whole host of issues with the way you run your
> executable and/or the way you have the environment set up. First of
> all, when you ssh into a node, does the environment automatically
> get updated with the correct Open MPI paths? I.e., LD_LIBRARY_PATH
> should be set to the OMPI lib directory, PATH should contain OMPI's
> bin dir, etc. If this is not the case, you have two options:
> a. create small /etc/profile.d scripts to set up those env. variables
>    (a sketch follows below)
> b. use the --prefix option when you invoke mpirun on the headnode
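>
> For example, a minimal /etc/profile.d script might look like this
> (just a sketch; it assumes OMPI is installed under /opt/openmpi, so
> adjust the prefix to match your actual installation):
>
>   # /etc/profile.d/openmpi.sh : make Open MPI visible to all login shells
>   export PATH=/opt/openmpi/bin:$PATH
>   export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH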
>
> Generally, it would be much more helpful if you provided the actual
> output of running the commands you listed here.
>
> > mpirun --hostfile myhostfile -np 4 myprogram inputfile >outputfile
> Another issue I can think of is the path specification for
> 'myprogram'. Do you just cd into the directory where it resides and
> specify its name only? Try specifying either an absolute path to the
> executable or a path relative to your home directory:
> ~/appdir/bin/appexec, assuming this location is the same on all the
> nodes. If mpirun can't find your executable on one of the nodes, it
> should report that as an error.
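>
> For example, reusing the path above (adjust it to wherever your
> binary actually lives):
>
>   mpirun --hostfile myhostfile -np 4 ~/appdir/bin/appexec inputfile > outputfile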
>
> > which does not write to the output file.
> Does it write anything to stderr? You could also try invoking mpirun
> with '--mca pls_rsh_agent ssh'.
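>
> For example, adding the option to your original command:
>
>   mpirun --mca pls_rsh_agent ssh --hostfile myhostfile -np 4 myprogram inputfile > outputfile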
>
> > mpirun --hostfile myhostfile -np 4 `myprogram inputfile >outputfile`
> Are those backquotes?? I would recommend getting mpirun to invoke
> something basic on all the participating nodes successfully first; try
> mpirun --prefix /path/to/ompi/ --hostfile myhostfile -np 4 hostname
> for instance. Nothing else will work until this does.
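>
> Assuming myhostfile lists your two hosts with two slots each, e.g.:
>
>   host1 slots=2
>   host2 slots=2
>
> a successful run should print each host's name once per process
> launched on it (the ordering may vary):
>
>   host1
>   host1
>   host2
>   host2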
>
> These are just a few pointers to get you started. Hope this helps.
>
> Alex.
>
Thanks for the suggestions - the mpirun ... hostname test is helping
me narrow down the problem.

Both systems have PATH and LD_LIBRARY_PATH set up properly, as shown
by the fact that mpirun can launch an SMP job successfully on each
node.

Running mpirun --hostfile myhostfile -np 4 hostname (with or without
the --prefix option pointing at the Open MPI installation) gives the
following results:

MASTERNODE
MASTERNODE
(the run hangs here and I have to Ctrl-C to kill mpirun)

I copied myhostfile to a shared directory and attempted the same
command from the slave node and got:

SLAVENODE
SLAVENODE
an echo message from the master node's .bashrc
(the run hangs here and I have to Ctrl-C to kill mpirun)
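
The stray echo comes from my master node's .bashrc. I've seen the
advice that shell startup files should print nothing for
non-interactive logins, so I may add a guard along these lines (a
sketch; the echo stands in for whatever my .bashrc actually prints):

  if [ -n "$PS1" ]; then   # PS1 is normally only set for interactive shells
      echo "welcome message"
  fi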

I'm thinking that either my ssh is misbehaving somehow or there is an
issue with each node having two network connections (I haven't
unplugged the internet connection from my slave node yet, and my
master node will always have an internet connection in addition to the
gigabit cluster network).
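
If it turns out to be the dual-NIC issue, I gather Open MPI can be
told which interface to use for MPI traffic via an MCA parameter,
along these lines (untested on my end, and it assumes eth1 is the
gigabit cluster interface on both nodes):

  mpirun --mca btl_tcp_if_include eth1 --hostfile myhostfile -np 4 hostname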

I hope this is enough information to help troubleshoot my system.

Thanks!

Mark Kosmowski