Open MPI User's Mailing List Archives

From: Grobe, Gary L. (JSC-EV)[ESCG] (gary.l.grobe_at_[hidden])
Date: 2007-01-08 14:52:42


I was wondering if someone could send me the HACKING file so I can do a
bit more with debugging on the snapshots. Our web proxy has WebDAV
methods turned off (those request methods fail), so I can't get the
latest from the svn repos.

> Second thing. From one of your previous emails, I see that MX
> is configured with 4 instances per node. You're running with
> exactly 4 processes on the first 2 nodes. Weird things might
> happen ...

Just curious about this comment. Are you referring to oversubscribing?
We run 4 processes on each node because we have two dual-core CPUs in
each node. Am I not understanding processor counts correctly?
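
For reference, the way I read Open MPI hostfiles, declaring 4 slots per
node (one per core) would look roughly like the sketch below; the node
names are just placeholders, not necessarily what is in ./h1-3:

  node1 slots=4
  node2 slots=4
  node3 slots=4

With 4 slots declared per node, 4 processes on a node shouldn't count as
oversubscribed.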
 
> PS: Is there any way you can attach to the processes with gdb?
> I would like to see the backtrace as shown by gdb in order to
> figure out what's wrong there.

When I can get more detailed debug output, I'll send it. A sketch of
the attach approach I have in mind follows; after that is the run where
I'm not clear on what executable is being searched for.
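
The usual trick I know of for attaching gdb to MPI processes is to have
each rank print its pid and spin until the debugger clears a flag.
Something along these lines (just a sketch; "holding" and the printout
are arbitrary and not part of cpi as it stands):

/* Sketch: each rank prints its pid/host and waits until gdb attaches
 * and clears "holding" (attach with: gdb --pid <pid>, then
 * "set var holding = 0" and "continue"). */
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    volatile int holding = 1;
    char host[256];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    gethostname(host, sizeof(host));
    printf("rank %d: pid %d on %s waiting for gdb\n",
           rank, (int) getpid(), host);
    fflush(stdout);

    while (holding)
        sleep(5);          /* spin until gdb sets holding = 0 */

    /* ... rest of cpi would go here ... */

    MPI_Finalize();
    return 0;
}

Once attached to each stalled rank and past the loop, "bt" at the point
of failure should give the backtrace you're after.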

$ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 \
    -x LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 \
    --mca pml cm --mca mtl mx ./cpi

[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] [0,0,0] setting up session dir with
[juggernaut:14949] universe default-universe-14949
[juggernaut:14949] user ggrobe
[juggernaut:14949] host juggernaut
[juggernaut:14949] jobid 0
[juggernaut:14949] procid 0
[juggernaut:14949] procdir:
/tmp/openmpi-sessions-ggrobe_at_juggernaut_0/default-universe-14949/0/0
[juggernaut:14949] jobdir:
/tmp/openmpi-sessions-ggrobe_at_juggernaut_0/default-universe-14949/0
[juggernaut:14949] unidir:
/tmp/openmpi-sessions-ggrobe_at_juggernaut_0/default-universe-14949
[juggernaut:14949] top: openmpi-sessions-ggrobe_at_juggernaut_0
[juggernaut:14949] tmp: /tmp
[juggernaut:14949] [0,0,0] contact_file
/tmp/openmpi-sessions-ggrobe_at_juggernaut_0/default-universe-14949/universe-setup.txt
[juggernaut:14949] [0,0,0] wrote setup file
[juggernaut:14949] pls:rsh: local csh: 0, local sh: 1
[juggernaut:14949] pls:rsh: assuming same remote shell as local shell
[juggernaut:14949] pls:rsh: remote csh: 0, remote sh: 1
[juggernaut:14949] pls:rsh: final template argv:
[juggernaut:14949] pls:rsh: /usr/bin/ssh <template> orted --debug
--bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename
<template> --universe ggrobe_at_juggernaut:default-universe-14949
--nsreplica "0.0.0;tcp://192.168.2.10:43121" --gprreplica
"0.0.0;tcp://192.168.2.10:43121"
[juggernaut:14949] pls:rsh: launching on node juggernaut
[juggernaut:14949] pls:rsh: juggernaut is a LOCAL node
[juggernaut:14949] pls:rsh: changing to directory /home/ggrobe
[juggernaut:14949] pls:rsh: executing: orted --debug --bootproxy 1
--name 0.0.1 --num_procs 2 --vpid_start 0 --nodename juggernaut
--universe ggrobe_at_juggernaut:default-universe-14949 --nsreplica
"0.0.0;tcp://192.168.2.10:43121" --gprreplica
"0.0.0;tcp://192.168.2.10:43121"
[juggernaut:14950] [0,0,1] setting up session dir with
[juggernaut:14950] universe default-universe-14949
[juggernaut:14950] user ggrobe
[juggernaut:14950] host juggernaut
[juggernaut:14950] jobid 0
[juggernaut:14950] procid 1
[juggernaut:14950] procdir:
/tmp/openmpi-sessions-ggrobe_at_juggernaut_0/default-universe-14949/0/1
[juggernaut:14950] jobdir:
/tmp/openmpi-sessions-ggrobe_at_juggernaut_0/default-universe-14949/0
[juggernaut:14950] unidir:
/tmp/openmpi-sessions-ggrobe_at_juggernaut_0/default-universe-14949
[juggernaut:14950] top: openmpi-sessions-ggrobe_at_juggernaut_0
[juggernaut:14950] tmp: /tmp
--------------------------------------------------------------------------
Failed to find the following executable:
Host:       juggernaut
Executable: -b
Cannot continue.
--------------------------------------------------------------------------
[juggernaut:14950] [0,0,1] ORTE_ERROR_LOG: Fatal in file
odls_default_module.c at line 1193
[juggernaut:14949] spawn: in job_state_callback(jobid = 1, state = 0x80)
[juggernaut:14950] [0,0,1] ORTE_ERROR_LOG: Fatal in file orted.c at line
575
[juggernaut:14950] sess_dir_finalize: job session dir not empty -
leaving
[juggernaut:14950] sess_dir_finalize: proc session dir not empty -
leaving
[juggernaut:14949] sess_dir_finalize: proc session dir not empty -
leaving
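
One thing I notice after the fact: -dbg=gdb doesn't look like an option
this mpirun recognizes, and the "Executable: -b" line above makes me
suspect part of that switch is being taken as the program to launch. If
so, dropping it and starting each rank under gdb in its own xterm,
roughly like the sketch below (untested here, and it needs X forwarding
to the nodes), might be a cleaner way to get the backtraces:

$ mpirun --prefix /usr/local/openmpi-1.2b3r13030 \
    -x LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 \
    --mca pml cm --mca mtl mx xterm -e gdb ./cpi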