Hi,

You should consult the CPMD manual on how to run the program in parallel; this doesn’t look like a problem in Open MPI. The error comes from MPI_ABORT being called by rank 0. Since the rank 0 process is the one that reads all the input data and prepares the computation, I would say that the most probable reason for the crash is an inconsistency in the program input. It could be that some of the parameters specified there are not compatible with running the program with 4 processes. It can also happen (at least with some DFT codes) if you try to continue a previous simulation that was performed on a different number of processes. Quantum ESPRESSO also uses a similar technique to abort, but at least it prints a cryptic error message before the crash :)
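
To illustrate the pattern, here is a minimal Python sketch. It assumes a hypothetical rank-0 check that compares the number of processes recorded by a previous run with the current one and picks an abort code; the function names and the check itself are illustrative assumptions, not CPMD's actual logic. It also shows why the shell reports "Exit 231" for error code 999: POSIX only passes the low 8 bits of an exit status to the parent (999 = 3 * 256 + 231), so the two numbers in your output are most likely the same error.

```python
# Hypothetical sketch: a rank-0 style input check that chooses an abort code.
# The names and the restart-compatibility rule are illustrative assumptions,
# not CPMD's actual implementation.

ABORT_CODE = 999  # error code seen in the MPI_ABORT message


def check_input(saved_nprocs, current_nprocs):
    """Return 0 if the run may proceed, or ABORT_CODE if it must abort.

    saved_nprocs is the process count of a previous run (None for a fresh start).
    """
    if saved_nprocs is not None and saved_nprocs != current_nprocs:
        return ABORT_CODE
    return 0


def shell_exit_status(code):
    """POSIX keeps only the low 8 bits of an exit status for the parent."""
    return code & 0xFF


# A restart from a 2-process run launched with -np 4 would be rejected.
code = check_input(saved_nprocs=2, current_nprocs=4)
print(code, shell_exit_status(code))  # prints "999 231"
```

In a real MPI program the nonzero code would be passed to MPI_Abort on MPI_COMM_WORLD, which is what makes Open MPI kill all the other processes.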

Hope that helps!

Kind regards,

Hristo

--

Hristo Iliev, Ph.D. -- High Performance Computing

RWTH Aachen University, Center for Computing and Communication

Rechen- und Kommunikationszentrum der RWTH Aachen

Seffenter Weg 23,  D 52074  Aachen (Germany)

Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367

From: users-bounces@open-mpi.org [mailto:users-bounces@open-mpi.org] On Behalf Of Abhra Paul
Sent: Thursday, July 19, 2012 1:35 PM
To: users@open-mpi.org
Subject: [OMPI users] mpirun command gives ERROR

Respected developers and users

I am trying to run the parallel program CPMD with the command "/usr/local/bin/mpirun -np 4 ./cpmd.x 1-h2-wave.inp > 1-h2-wave.out &", but it gives the following error:

======================================================================================================

[testcpmd@slater CPMD_3_15_3]$ /usr/local/bin/mpirun -np 4 ./cpmd.x 1-h2-wave.inp > 1-h2-wave.out &
[1] 1769
[testcpmd@slater CPMD_3_15_3]$ --------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 999.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 1770 on
node slater.rcamos.iacs exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

[1]+  Exit 231                /usr/local/bin/mpirun -np 4 ./cpmd.x 1-h2-wave.inp > 1-h2-wave.out
======================================================================================================

I am unable to find the reason for this error. Please help. My Open MPI version is 1.6.

With regards

Abhra Paul