Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] " MPI can not open file?"
From: Bernhard Knapp (bernhard.knapp_at_[hidden])
Date: 2009-04-07 08:21:38


Dear Ralph and other users,

I tried both variants, the path relative to my home directory and the
-wdir option, but the error is the same in both cases. I also tried
simply starting the job from my home directory, but that did not help
either. Any other ideas?

thx
Bernhard

[bknapp_at_quoVadis04 testSet]$ mpirun -np 8 \
    -machinefile /home/bknapp/scripts/machinefile.txt \
    mdrun -np 8 -nice 0 \
    -s gromacsRuns/testSet/1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.tpr \
    -o gromacsRuns/testSet/1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.trr \
    -c gromacsRuns/testSet/1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.pdb \
    -g gromacsRuns/testSet/1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.log \
    -e gromacsRuns/testSet/1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.edr -v

[bknapp_at_quoVadis04 testSet]$ mpirun -np 8 \
    -machinefile /home/bknapp/scripts/machinefile.txt \
    mdrun -np 8 -nice 0 \
    -s 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.tpr \
    -o 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.trr \
    -c 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.pdb \
    -g 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.log \
    -e 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.edr -v \
    -wdir /home/bknapp/gromacsRuns/testSet/

Back Off! I just backed up 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.log
to ./#1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.log.15#
Getting Loaded...
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

-------------------------------------------------------
Program mdrun, VERSION 4.0.3
Source code file: gmxfio.c, line: 736

Can not open file:
1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.tpr
-------------------------------------------------------

"My Brothers are Protons (Protons!), My Sisters are Neurons (Neurons)"
(Gogol Bordello)

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 8

gcq#318: "My Brothers are Protons (Protons!), My Sisters are Neurons
(Neurons)" (Gogol Bordello)

--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 4313 on
node 192.168.0.103 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Ralph wrote:

I assume you are running in a non-managed environment and so are using
ssh for your launcher? Could you tell us what version of OMPI you are
using?

The problem is that ssh drops you in your home directory, not your
current working directory. Thus, the path to any file you specify must
be relative to your home directory. Alternatively, you can specify the
desired current working directory on the mpirun cmd line. Do a "man
mpirun" to find the specific option.
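
A minimal sketch of both workarounds, assuming a POSIX shell and that
the run directory is visible under the same path on every node (for
example via NFS). The helper name abs_path is hypothetical. Note that
mpirun only interprets options placed before the program name;
anything after mdrun is passed to mdrun itself, so -wdir has to come
before it.

```shell
# Hypothetical helper (not part of Open MPI or GROMACS): make a path
# absolute relative to the directory the command is typed in.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;        # already absolute, keep as-is
        *)  printf '%s\n' "$PWD/$1" ;;   # prefix the current directory
    esac
}

# Option 1: pass absolute file names so the remote cwd no longer matters.
tpr=$(abs_path 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.tpr)
echo "input file: $tpr"

# Option 2 (shown as a comment because it needs a working cluster):
# put -wdir before the program name so mpirun, not mdrun, consumes it.
#   mpirun -np 8 -machinefile /home/bknapp/scripts/machinefile.txt \
#       -wdir "$PWD" mdrun -np 8 -nice 0 -s "$tpr" -v
```

Either approach only helps if the file actually exists at that path on
the node where rank 0 runs.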

I'd have to check, but we may have corrected this in recent versions
(or a soon-to-be-released one) so that we automatically move you to
the cwd after the daemon is started. However, I know that we didn't do
that in some earlier versions - perhaps in the 1.2.x series as well.

Ralph

On Apr 7, 2009, at 5:05 AM, Bernhard Knapp wrote:

> Hi
>
> I am trying to get a parallel job of the gromacs software started.
> MPI seems to boot fine, but unfortunately it does not seem to be able
> to open a specified file, although it is definitely in the directory
> where the job is started. I also changed the file permissions to 777,
> but that does not affect the result. Any suggestions?
>
> cheers
> Bernhard
>
>
> [bknapp_at_quoVadis04 testSet]$ mpirun -np 8 \
>     -machinefile /home/bknapp/scripts/machinefile.txt \
>     mdrun -np 8 -nice 0 \
>     -s 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.tpr \
>     -o 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.trr \
>     -c 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.pdb \
>     -g 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.log \
>     -e 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.edr -v
> bknapp_at_192.168.0.103's password:
> NNODES=8, MYRANK=1, HOSTNAME=quoVadis04
> NNODES=8, MYRANK=3, HOSTNAME=quoVadis04
> NNODES=8, MYRANK=7, HOSTNAME=quoVadis04
> NNODES=8, MYRANK=0, HOSTNAME=quoVadis03
> NNODES=8, MYRANK=4, HOSTNAME=quoVadis03
> NNODES=8, MYRANK=6, HOSTNAME=quoVadis03
> NODEID=4 argc=16
> NNODES=8, MYRANK=2, HOSTNAME=quoVadis03
> NODEID=1 argc=16
> NODEID=3 argc=16
> NODEID=7 argc=16
> NODEID=2 argc=16
> NODEID=6 argc=16
> NODEID=0 argc=16
>
>
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode -1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
>
> -------------------------------------------------------
> Program mdrun, VERSION 4.0.3
> Source code file: gmxfio.c, line: 736
>
> Can not open file:
> 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.tpr
> -------------------------------------------------------
>
> "I Need a Little Poison" (Throwing Muses)
>
> Error on node 0, will try to stop all the nodes
> Halting parallel program mdrun on CPU 0 out of 8
>
> gcq#108: "I Need a Little Poison" (Throwing Muses)
>
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 3777 on
> node 192.168.0.103 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
> [bknapp_at_quoVadis04 testSet]$ ll
> 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.tpr
> -rwxrwxrwx 1 bknapp bknapp 6118424 2009-03-13 09:44
> 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.tpr