Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] intermittent node file error running with torque/maui integration
From: Noam Bernstein (noam.bernstein_at_[hidden])
Date: 2013-09-20 12:48:50


On Sep 20, 2013, at 11:52 AM, Gus Correa <gus_at_[hidden]> wrote:

> Hi Noam
>
> Could it be that Torque, or probably more likely NFS,
> is too slow to create/make available the PBS_NODEFILE?
>
> What if you insert a "sleep 2",
> or whatever number of seconds you want,
> before the mpiexec command line?
> Or maybe better, a "ls -l $PBS_NODEFILE; cat $PBS_NODEFILE",
> just to make sure the file is available and
> filled with the node list, before mpiexec takes over?

I don't see how NFS could be involved, since the file is on a local filesystem.
As for adding a sleep, I already tried that: if the file doesn't exist, I sleep a few
seconds and check again. In every case where it's missing at the first check, it's
still missing at the second. And none of this explains the even rarer, very
mysterious situation where I can cat the file but mpirun can't find it.
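
The check-and-retry described above could be sketched roughly as follows. This is only an illustration, not the actual job script: the function name wait_for_nodefile, the default of 5 attempts, and the 2-second delay are all assumptions chosen for the example.

```shell
#!/bin/sh
# Hypothetical helper (not from the original job script): poll for the
# node file a few times before giving up.
# Arguments: path to the file, number of checks (default 5),
# seconds to sleep between checks (default 2). Defaults are arbitrary.
wait_for_nodefile() {
    file=$1
    tries=${2:-5}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s is true only if the file exists and is non-empty
        if [ -s "$file" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "node file '$file' missing or empty after $tries checks" >&2
    return 1
}
```

In a job script this would be called as wait_for_nodefile "$PBS_NODEFILE" || exit 1 just before the mpirun line. It would catch a file that appears late, but it would not help in the case where the file is readable from the script and mpirun still fails to find it.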

Noam