
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] mpirun command gives ERROR
From: Iliev, Hristo (iliev_at_[hidden])
Date: 2012-07-19 08:15:23


Hi,

 

You should consult the CPMD manual on how to run the program in parallel -
this doesn't look like a problem in Open MPI. The error comes from MPI_ABORT
being called by rank 0. As the rank 0 process is the one that reads all the
input data and prepares the computation, I would say that the most probable
reason for the crash is an inconsistency in the program input. It could be
that some of the parameters specified there are not compatible with running
the program with 4 processes. It can also happen (at least with some DFT
codes) if you try to continue a previous simulation that was performed on a
different number of processes. Quantum ESPRESSO also uses a similar technique
to abort, but at least it prints a cryptic error message before the crash :)
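
For what it's worth, here is a minimal sketch (not CPMD's actual code, just
the usual pattern in such programs) of how the master rank aborts the whole
job when it detects bad input; the file handling and the error code 999 are
only placeholders:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Rank 0 reads and validates the input.  The file name and the
           error code 999 are placeholders for this sketch. */
        FILE *inp = (argc > 1) ? fopen(argv[1], "r") : NULL;
        if (inp == NULL /* || some consistency check fails */) {
            fprintf(stderr, "Inconsistent input, aborting\n");
            /* MPI_Abort never returns; it kills the whole job and makes
               mpirun print the MPI_ABORT banner quoted below. */
            MPI_Abort(MPI_COMM_WORLD, 999);
        }
        fclose(inp);
    }

    /* ... actual computation would go here ... */

    MPI_Finalize();
    return 0;
}

The second part of the mpirun message (the process called "init" but exited
without calling "finalize") is simply a consequence of this: MPI_Abort never
returns, so rank 0 terminates without ever reaching MPI_Finalize.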

 

Hope that helps!

 

Kind regards,

Hristo

--
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367
 
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of Abhra Paul
Sent: Thursday, July 19, 2012 1:35 PM
To: users_at_[hidden]
Subject: [OMPI users] mpirun command gives ERROR
 
Respected developers and users
 
I am trying to run the parallel program CPMD with the command
"/usr/local/bin/mpirun -np 4 ./cpmd.x 1-h2-wave.inp > 1-h2-wave.out &",
but it gives the following error:
==============================================================================
 
[testcpmd_at_slater CPMD_3_15_3]$ /usr/local/bin/mpirun -np 4 ./cpmd.x
1-h2-wave.inp > 1-h2-wave.out &
[1] 1769
[testcpmd_at_slater CPMD_3_15_3]$
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 999.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 1770 on
node slater.rcamos.iacs exiting improperly. There are two reasons this could
occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[1]+  Exit 231                /usr/local/bin/mpirun -np 4 ./cpmd.x
1-h2-wave.inp > 1-h2-wave.out
==============================================================================
I am unable to find out the reason for that error. Please help. My Open MPI
version is 1.6.
 
With regards
Abhra Paul



