Open MPI User's Mailing List Archives


From: Brock Palen (brockp_at_[hidden])
Date: 2006-06-15 10:08:12


Jeez, I really can't read this morning: you are using Torque, and the
mpiexec is the one from Open MPI. I can't help you then; someone else
is going to have to. Sorry.

Brock Palen
Center for Advanced Computing
brockp_at_[hidden]
(734)936-1985

On Jun 15, 2006, at 9:42 AM, Martin Schafföner wrote:

> Hi,
>
> I have been trying to set up Open MPI 1.0.3a1r10374 on our cluster and was
> partly successful: installation worked, and compiling a simple example and
> running it through the rsh pls worked as well.
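>
> For reference, the simple example is essentially the standard MPI
> hello-world; a minimal sketch (equivalent in spirit to my openmpitest,
> though not its exact source) is:
>
>     #include <mpi.h>
>     #include <stdio.h>
>
>     int main(int argc, char **argv)
>     {
>         int rank, size;
>
>         /* Start the MPI runtime, then query this process's rank and
>          * the total number of processes in MPI_COMM_WORLD. */
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>         printf("Hello from rank %d of %d\n", rank, size);
>
>         MPI_Finalize();
>         return 0;
>     }
>
> Compiled with mpicc and launched with mpiexec, it prints one line per
> process whenever the launcher works.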
>
> However, I'm the only user with rsh access to the nodes; all other users
> must go through Torque and launch MPI applications via Torque's TM
> subsystem. That's where my problem starts: I have not been able to launch
> applications through TM. The TM pls is configured correctly as far as I
> can tell -- I can see it making connections to the Torque MOM in the MOM's
> logfile -- but the application never runs. Even if I request only one
> processor, mpiexec spawns several orteds in a row. Here is my session log
> (I kill mpiexec with Ctrl-C because it would otherwise run forever):
>
> schaffoe_at_node16:~/tmp/mpitest> mpiexec -np 1 --mca pls_tm_debug 1 --mca pls tm `pwd`/openmpitest
> [node16:03113] pls:tm: final top-level argv:
> [node16:03113] pls:tm: orted --no-daemonize --bootproxy 1 --name --num_procs 2 --vpid_start 0 --nodename --universe schaffoe_at_node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
> [node16:03113] pls:tm: launching on node node16
> [node16:03113] pls:tm: found /opt/openmpi/bin/orted
> [node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
> [node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename node16 --universe schaffoe_at_node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
> [node16:03113] pls:tm: final top-level argv:
> [node16:03113] pls:tm: orted --no-daemonize --bootproxy 1 --name --num_procs 3 --vpid_start 0 --nodename --universe schaffoe_at_node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
> [node16:03113] pls:tm: launching on node node16
> [node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
> [node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename node16 --universe schaffoe_at_node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
> [node16:03113] pls:tm: final top-level argv:
> [node16:03113] pls:tm: orted --no-daemonize --bootproxy 1 --name --num_procs 4 --vpid_start 0 --nodename --universe schaffoe_at_node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
> [node16:03113] pls:tm: launching on node node16
> [node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
> [node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.3 --num_procs 4 --vpid_start 0 --nodename node16 --universe schaffoe_at_node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
> mpiexec: killing job...
> [node16:03113] pls:tm: final top-level argv:
> [node16:03113] pls:tm: orted --no-daemonize --bootproxy 1 --name --num_procs 5 --vpid_start 0 --nodename --universe schaffoe_at_node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
> [node16:03113] pls:tm: launching on node node16
> [node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
> [node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0 --nodename node16 --universe schaffoe_at_node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
> [node16:03113] pls:tm: final top-level argv:
> [node16:03113] pls:tm: orted --no-daemonize --bootproxy 1 --name --num_procs 6 --vpid_start 0 --nodename --universe schaffoe_at_node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
> [node16:03113] pls:tm: launching on node node16
> [node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
> [node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.5 --num_procs 6 --vpid_start 0 --nodename node16 --universe schaffoe_at_node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
> --------------------------------------------------------------------------
> WARNING: mpiexec encountered an abnormal exit.
>
> This means that mpiexec exited before it received notification that all
> started processes had terminated. You should double check and ensure
> that there are no runaway processes still executing.
> --------------------------------------------------------------------------
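>
> For reference, as I understand it, the TM pls does not use rsh at all,
> but Torque's Task Management API (tm.h). A minimal standalone sketch of
> that kind of launcher -- my own illustration, not Open MPI's actual
> code -- would spawn one process on the first node of the job roughly
> like this (it must run inside a Torque job, and the library to link
> against, e.g. -ltorque, varies by installation):
>
>     #include <stdio.h>
>     #include <tm.h>   /* Torque's Task Management API */
>
>     int main(void)
>     {
>         struct tm_roots roots;
>         tm_node_id *nodes;
>         tm_task_id task;
>         tm_event_t event, result;
>         int nnodes, err, tm_errno;
>
>         /* Connect to the mother-superior MOM of the current job. */
>         err = tm_init(NULL, &roots);
>         if (err != TM_SUCCESS) {
>             fprintf(stderr, "tm_init failed: %d\n", err);
>             return 1;
>         }
>
>         /* List the node IDs allocated to this job. */
>         tm_nodeinfo(&nodes, &nnodes);
>
>         /* Ask the MOM on the first node to start one process;
>          * pls:tm does essentially this to start each orted. */
>         char *spawn_argv[] = { "/bin/hostname", NULL };
>         char *spawn_env[]  = { NULL };
>         err = tm_spawn(1, spawn_argv, spawn_env, nodes[0], &task, &event);
>         if (err != TM_SUCCESS) {
>             fprintf(stderr, "tm_spawn failed: %d\n", err);
>             tm_finalize();
>             return 1;
>         }
>
>         /* Wait for the spawn event; tm_errno carries MOM-side errors. */
>         tm_poll(TM_NULL_EVENT, &result, 1, &tm_errno);
>         if (tm_errno != TM_SUCCESS)
>             fprintf(stderr, "spawn reported error: %d\n", tm_errno);
>
>         tm_finalize();
>         return 0;
>     }
>
> If a sketch like this works inside a job but Open MPI's pls:tm still
> loops, the problem is presumably on the Open MPI side rather than in
> the Torque installation.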
>
>
> I read in the README that the TM pls is working, whereas the LaTeX user's
> guide says that only rsh and BProc are supported. I am confused...
>
> Can anybody shed some light on this?
>
> Regards,
> --
> Martin Schafföner
>
> Cognitive Systems Group, Institute of Electronics, Signal Processing and
> Communication Technologies, Department of Electrical Engineering,
> Otto-von-Guericke University Magdeburg
> Phone: +49 391 6720063
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users