Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [Wrf-users] WRF Problem running in Parallel on multiple nodes(cluster)
From: Ahsan Ali (ahsanshah01_at_[hidden])
Date: 2011-05-03 23:14:41


Dear Bart,

I don't think Open MPI needs to be installed on every machine, because the
compute nodes share the master node's filesystem over NFS. I don't know how to
check the output of "which orted"; it is running only on the master node. I
have another application that runs in the same way, but I am having a problem
with WRF.
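
One way to run that check from the master node is sketched below. It assumes passwordless ssh to every host in the hostfile and the usual Open MPI hostfile layout (one host per line, optional "slots=N", "#" comments). The function names hostfile_hosts and check_orted are made up for this example and are not part of Open MPI.

```shell
#!/bin/sh
# Print the unique host names listed in an Open MPI hostfile.
# Blank lines and '#' comments are skipped; per-host options such as
# "slots=4" are dropped because only the first field is printed.
hostfile_hosts() {
    awk '!/^[ \t]*(#|$)/ && !seen[$1]++ { print $1 }' "$1"
}

# Report where a NON-interactive shell on each host finds orted.
# mpirun starts orted through such a shell, so this is the PATH that matters.
check_orted() {
    for host in $(hostfile_hosts "$1"); do
        printf '%s: ' "$host"
        ssh -n "$host" 'command -v orted || echo "orted not found in PATH ($PATH)"'
    done
}

# Example call, using the hostfile path from the original message:
# check_orted /home/pmdtest/hostlist
```

A host that prints "orted not found" is one where the remote shell's PATH does not include the Open MPI bin directory, even if the files themselves are visible there over NFS.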

On Tue, May 3, 2011 at 9:06 PM, Bart Brashers <bbrashers_at_[hidden]> wrote:

> It looks like OpenMPI is not installed on all your execution machines. You
> need to install at least the libs on all machines, or on an NFS-shared
> location. Check the output of "which orted" on the machine that works.
>
>
>
> Bart
>
>
>
> *From:* wrf-users-bounces_at_[hidden] [mailto:wrf-users-bounces_at_[hidden]] *On
> Behalf Of *Ahsan Ali
> *Sent:* Tuesday, May 03, 2011 1:04 AM
> *To:* users_at_[hidden]
> *Subject:* [Wrf-users] WRF Problem running in Parallel on multiple
> nodes(cluster)
>
>
>
> Hello,
>
>
>
> I am able to run WRFV3.2.1 using mpirun on multiple cores of a single
> machine, but when I try to run it across multiple nodes in the cluster using
> a hostlist, I get an error. The compute nodes mount the master node's
> filesystem over NFS during boot. I get the following error. Please help.
>
>
>
> [root_at_pmd02 em_real]# mpirun -np 10 -hostfile /home/pmdtest/hostlist ./real.exe
> bash: orted: command not found
> bash: orted: command not found
> --------------------------------------------------------------------------
> A daemon (pid 22006) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
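
The "orted: command not found" lines above suggest that the remote shells cannot find Open MPI's bin directory. A minimal sketch of one possible fix follows. It assumes bash is the login shell on the compute nodes and that the home directory is shared over NFS, so a single ~/.bashrc applies to all nodes. The path /opt/openmpi is only a placeholder for wherever Open MPI was actually installed.

```shell
# Lines to add to ~/.bashrc (sketch).
# OMPI_PREFIX is a placeholder: replace /opt/openmpi with the real
# Open MPI installation directory on the shared filesystem.
OMPI_PREFIX=${OMPI_PREFIX:-/opt/openmpi}

# Prepend Open MPI's bin and lib directories so that non-interactive
# remote shells can find orted and the shared libraries it needs.
export PATH="$OMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

If the ~/.bashrc returns early for non-interactive shells, these lines need to come before that point. An alternative that leaves the shell startup files alone is mpirun's --prefix option, for example: mpirun --prefix /opt/openmpi -np 10 -hostfile /home/pmdtest/hostlist ./real.exe (again with the placeholder path replaced by the real one).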
>
>
>
>
> --
> Syed Ahsan Ali Bokhari
> Electronic Engineer (EE)
>
>
> Research & Development Division
> Pakistan Meteorological Department H-8/4, Islamabad.
> Phone # off +92518358714
>
> Cell # +923155145014
>
>
>

-- 
Syed Ahsan Ali Bokhari
Electronic Engineer (EE)
Research & Development Division
Pakistan Meteorological Department H-8/4, Islamabad.
Phone # off  +92518358714
Cell # +923155145014