
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Can't start program across network
From: Prentice Bisbal (prentice_at_[hidden])
Date: 2009-03-16 10:22:59


Raymond Wan wrote:
>
> Hi Jeff,
>
> Some "good" news (but still some bad news). Y and Z are part of a set
> of 8 machines and I found out that mpirun works for one of them. I
> didn't check a couple of them before -- sorry! However, I'm no closer
> to the solution since all 8 should be "identical", according to our
> sysadmin. He said the only difference (that he can think of) between
> the working one and all the others is that the working one has an NIS
> server installed. It is the NIS server for the cluster (presumably, the
> others run a client version). Could that be the reason? He can't think
> of anything else that distinguishes between them but he says it is
> possible that the NIS server is correctly configured for what we use it
> for, but not for what I'm doing with Open MPI -- he doesn't know what
> should be done, though.

In an earlier e-mail in this thread, I theorized that this might be a
problem with your name service. This latest information seems to support
that theory.

To test this, use the 'host' command on each of the three systems to
check whether it can resolve the hostnames of all three.

On host X, do this:

host X
host Y
host Z

Then do the same on hosts Y and Z.

If the 'host' command resolves the name correctly, you should see
something like this:

$ host foo
foo.example.com has address 192.168.1.1

If 'host' can't resolve a hostname properly, you should see something
like this:

$ host bar
Host bar not found: 3(NXDOMAIN)

OpenMPI should be using the same name service libraries that all the
other programs use, so I find it hard to believe that everything *but*
OpenMPI is working properly, but I suppose it's possible. I've seen weirder.
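One caveat: 'host' sends its queries straight to the DNS servers and
doesn't consult /etc/hosts or NIS, so on a cluster where hostnames come
from NIS it can disagree with what other programs see. 'getent hosts'
goes through the system's name service switch (/etc/nsswitch.conf)
instead, which is closer to what ordinary programs get. Below is a
minimal sketch, assuming Linux with glibc's getent available;
check_resolve is just a made-up name for illustration, not an existing
tool:

```shell
#!/bin/sh
# check_resolve: hypothetical helper, not part of any package.
# Resolves each name with getent, which uses the same NSS libraries
# (files, nis, dns, ... as configured in /etc/nsswitch.conf) that
# ordinary programs use, unlike 'host', which queries DNS directly.
check_resolve() {
    status=0
    for h in "$@"; do
        # getent exits non-zero (2) when the name cannot be found
        if entry=$(getent hosts "$h"); then
            echo "OK:   $entry"
        else
            echo "FAIL: $h does not resolve via NSS"
            status=1
        fi
    done
    # Non-zero if any name failed to resolve
    return "$status"
}

# Example (replace X Y Z with the real hostnames):
# check_resolve X Y Z
```

Run it on each machine with all three hostnames, e.g. check_resolve X Y Z.
Any FAIL line tells you which machine can't resolve which name, and a
case where 'host' succeeds but getent fails (or vice versa) would point
at the NIS/nsswitch setup rather than DNS.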

-- 
Prentice