Open MPI User's Mailing List Archives

Subject: [OMPI users] Openmpi problem
From: Acero Fernandez Alicia (alicia.acero_at_[hidden])
Date: 2012-05-03 11:59:20


Hello,

I have a problem running an MPI program with the Open MPI library. This is what I did:

1.- I installed OFED 1.5.4 from RHEL. The hardware is QLogic 7340 InfiniBand cards.

2.- I am using Open MPI 1.4.3, the one that comes with OFED 1.5.4.

3.- I have checked the Open MPI website, and I meet all the requirements they ask for:

        passwordless ssh
        the same OFED/Open MPI version on all the cluster nodes
        InfiniBand connectivity between the nodes, etc.

4.- When I run an MPI program it runs properly on one node, but it does not run on more than one node. The error I see during execution is the following:
 
[dirac13.ciemat.es:06415] plm:tm: failed to poll for a spawned daemon, return status = 17002
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that the job aborted, but has no info as to the process that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec was unable to cleanly terminate the daemons on the nodes shown below. Additional manual cleanup may be required - please refer to the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        dirac12.ciemat.es - daemon did not report back when launched
 
 
The command I use to run the MPI program is the following:
 
 
        mpiexec -H dirac12,dirac13 ./cpi
 
I have also tried
 
        mpiexec -np 24 -H dirac12,dirac13 ./cpi
 
And submitting through the batch system:
 
        mpiexec -np 24 -hostfile $PBS_NODEFILE ./cpi
 
All of them give the same result.
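Since the "daemon did not report back" message suggests orted never started on the remote node, the remote environment can be probed with something like the following (a sketch: the hostname is one of our nodes, the install path under --prefix is an assumed example, not the actual location on this cluster):

```shell
# Check what a non-interactive remote shell sees: orted must be on PATH and
# the Open MPI libraries on LD_LIBRARY_PATH. Non-interactive ssh sessions may
# source different startup files than a login shell, so settings made in
# ~/.bash_profile can be invisible here.
ssh dirac12 'which orted; echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'

# Alternatively, tell mpiexec where Open MPI lives on the remote nodes with
# the standard --prefix option (the path below is a placeholder; use the real
# install prefix of the OFED-provided Open MPI):
mpiexec --prefix /usr/mpi/openmpi-1.4.3 -H dirac12,dirac13 ./cpi
```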
 
 
The MPI libraries are the same on all the nodes of the cluster.
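Since the failing component is plm:tm (the Torque/PBS launcher), one way to narrow the problem down is to raise the launcher's verbosity, or to fall back to the rsh/ssh launcher and see whether the failure is specific to the tm launch path (a sketch; the MCA parameter names are standard Open MPI, the rest of the command line is the one above):

```shell
# Show what the process launch module (plm) is doing while starting the
# remote daemons:
mpiexec --mca plm_base_verbose 5 -H dirac12,dirac13 ./cpi

# Bypass the tm launcher and start the daemons over ssh instead; if this
# works, the problem is in the Torque/PBS launch path rather than in the
# remote environment:
mpiexec --mca plm rsh -H dirac12,dirac13 ./cpi
```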
 
Please, could anyone help me?
    
Thanks,
Alicia
----------------------------
Disclaimer: 
This message and its attached files are intended exclusively for their recipients and may contain confidential information. If you received this e-mail in error, you are hereby notified that any dissemination, copying, or disclosure of this communication is strictly prohibited and may be unlawful. In this case, please notify us by reply and delete this email and its contents immediately.
----------------------------