
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Openmpi problem
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-05-03 12:09:04


You apparently are running on a cluster that uses Torque, yes? If so, Open MPI won't use ssh to do the launch - it uses the Torque (TM) interface instead, so the passwordless ssh setup is irrelevant.
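A minimal sketch of how to double-check which launcher is being selected, assuming the MCA parameter names of the Open MPI 1.4 series (the hostname test program is only an example):

   # show which process-launch module (plm) is chosen and what it does
   mpiexec -mca plm_base_verbose 10 -H dirac12,dirac13 hostname

   # for comparison, force the ssh/rsh launcher instead of the Torque (tm) one
   mpiexec -mca plm rsh -H dirac12,dirac13 hostname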

Did you ensure that your LD_LIBRARY_PATH includes the OMPI install lib location?
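A minimal sketch of the usual ways to make the Open MPI libraries visible on the remote nodes; the install prefix /opt/openmpi-1.4.3 below is only an assumed example path, not the actual install location:

   # export the library path and ask mpiexec to forward it to the remote daemons
   export LD_LIBRARY_PATH=/opt/openmpi-1.4.3/lib:$LD_LIBRARY_PATH
   mpiexec -x LD_LIBRARY_PATH -H dirac12,dirac13 ./cpi

   # or let Open MPI derive its own paths from the install prefix
   mpiexec --prefix /opt/openmpi-1.4.3 -H dirac12,dirac13 ./cpi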

On May 3, 2012, at 9:59 AM, Acero Fernandez Alicia wrote:

>
>
> Hello,
>
> I have a problem when running an MPI program with the Open MPI library. I did the following:
>
>
> 1.- I installed OFED 1.5.4 from RHEL. The hardware is QLogic 7340 IB cards.
>
> 2.- I am using Open MPI 1.4.3, the one that comes with OFED 1.5.4.
>
> 3.- I have checked the Open MPI website, and I meet all the requirements they list:
>
> passwordless ssh
> the same OFED/Open MPI version on all the cluster nodes
> InfiniBand connectivity between the nodes, etc.
>
> 4.- When I run an MPI program it runs properly on one node, but it doesn't run on more than one node. The error I see during execution is the following:
>
> [dirac13.ciemat.es:06415] plm:tm: failed to poll for a spawned daemon, return status = 17002
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec noticed that the job aborted, but has no info as to the process that caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec was unable to cleanly terminate the daemons on the nodes shown below. Additional manual cleanup may be required - please refer to the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> dirac12.ciemat.es - daemon did not report back when launched
>
>
>
> The command I use to run the MPI program is the following:
>
>
> mpiexec -H dirac12,dirac13 ./cpi
>
> I have also tried
>
> mpiexec -np 24 -H dirac12,dirac13 ./cpi
>
> And submitting to the batch system:
>
> mpiexec -np 24 -hostfile $PBS_NODEFILE ./cpi
>
> All of them with the same result.
>
>
> All the MPI libraries are the same on all the nodes of the cluster.
>
> Please, could anyone help me?
>
> Thanks,
> Alicia
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users