Open MPI User's Mailing List Archives

From: Frank (openmpi-user_at_[hidden])
Date: 2006-03-19 07:10:04


Hi Brian,

This is all I get when submitting the job with the -d option to mpirun:

[powerbook:00682] procdir: (null)
[powerbook:00682] jobdir: (null)
[powerbook:00682] unidir:
/tmp/openmpi-sessions-admin_at_powerbook_0/default-universe
[powerbook:00682] top: openmpi-sessions-admin_at_powerbook_0
[powerbook:00682] tmp: /tmp
[powerbook:00682] connect_uni: contact info read
[powerbook:00682] connect_uni: connection not allowed
[powerbook:00682] [0,0,0] setting up session dir with
[powerbook:00682] tmpdir /tmp
[powerbook:00682] universe default-universe-682
[powerbook:00682] user admin
[powerbook:00682] host powerbook
[powerbook:00682] jobid 0
[powerbook:00682] procid 0
[powerbook:00682] procdir:
/tmp/openmpi-sessions-admin_at_powerbook_0/default-universe-682/0/0
[powerbook:00682] jobdir:
/tmp/openmpi-sessions-admin_at_powerbook_0/default-universe-682/0
[powerbook:00682] unidir:
/tmp/openmpi-sessions-admin_at_powerbook_0/default-universe-682
[powerbook:00682] top: openmpi-sessions-admin_at_powerbook_0
[powerbook:00682] tmp: /tmp
[powerbook:00682] [0,0,0] contact_file
/tmp/openmpi-sessions-admin_at_powerbook_0/default-universe-682/universe-setup.txt
[powerbook:00682] [0,0,0] wrote setup file
[powerbook:00682] spawn: in job_state_callback(jobid = 1, state = 0x1)
[g4d003.local:19326] [0,1,26] setting up session dir with
[g4d003.local:19327] [0,1,33] setting up session dir with
[g4d003.local:19326] universe default-universe
[g4d003.local:19327] universe default-universe
[powerbook:00690] [0,1,17] setting up session dir with
[g4d003.local:19326] user nobody
[g4d003.local:19327] user nobody
[powerbook:00690] universe default-universe
[g4d003.local:19326] host xgrid-node-26
[g4d003.local:19327] host xgrid-node-33
[powerbook:00690] user nobody
[g4d003.local:19326] jobid 1
[g4d003.local:19327] jobid 1
[powerbook:00690] host xgrid-node-17
[ibook-g4:14666] [0,1,7] setting up session dir with
[g4d003.local:19326] procid 26
[g4d003.local:19327] procid 33
[powerbook:00690] jobid 1
[ibook-g4:14666] universe default-universe
[g4d003.local:19326] procdir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-26_0/default-universe/1/26
[g4d003.local:19327] procdir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-33_0/default-universe/1/33
[powerbook:00690] procid 17
[ibook-g4:14666] user nobody
[g4d003.local:19326] jobdir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-26_0/default-universe/1
[g4d003.local:19327] jobdir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-33_0/default-universe/1
[powerbook:00690] procdir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-17_0/default-universe/1/17
[ibook-g4:14666] host xgrid-node-7
[g4d003.local:19326] unidir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-26_0/default-universe
[g4d003.local:19327] unidir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-33_0/default-universe
[powerbook:00690] jobdir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-17_0/default-universe/1
[ibook-g4:14666] jobid 1
[g4d003.local:19326] top: openmpi-sessions-nobody_at_xgrid-node-26_0
[g4d003.local:19327] top: openmpi-sessions-nobody_at_xgrid-node-33_0
[powerbook:00690] unidir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-17_0/default-universe
[ibook-g4:14666] procid 7
[g4d003.local:19326] tmp: /tmp
[g4d003.local:19327] tmp: /tmp
[powerbook:00690] top: openmpi-sessions-nobody_at_xgrid-node-17_0
[ibook-g4:14666] procdir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-7_0/default-universe/1/7
[powerbook:00690] tmp: /tmp
[ibook-g4:14666] jobdir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-7_0/default-universe/1
[ibook-g4:14666] unidir:
/tmp/openmpi-sessions-nobody_at_xgrid-node-7_0/default-universe
[ibook-g4:14666] top: openmpi-sessions-nobody_at_xgrid-node-7_0
[ibook-g4:14666] tmp: /tmp
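
Because several processes write to the same stream, the output above is interleaved and hard to follow. Below is a minimal sketch, not part of Open MPI, that regroups such lines per process. It assumes every entry starts with a "[hostname:pid]" prefix, as in the log above, and that a line without that prefix (such as a path on its own line) is a wrapped continuation of the previous entry. The names PREFIX and group_by_process are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the "[hostname:pid]" prefix seen on each entry of the -d output.
PREFIX = re.compile(r"^\[([^:\]]+):(\d+)\]\s*(.*)$")


def group_by_process(lines):
    """Group interleaved debug lines by (hostname, pid), keeping their order."""
    groups = defaultdict(list)
    current = None  # key of the most recent prefixed entry
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = PREFIX.match(line)
        if match:
            current = (match.group(1), match.group(2))
            groups[current].append(match.group(3))
        elif current is not None:
            # Assumed to be a wrapped continuation of the previous entry.
            groups[current][-1] = (groups[current][-1] + " " + line).strip()
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "[powerbook:00682] unidir:",
        "/tmp/openmpi-sessions-admin_at_powerbook_0/default-universe",
        "[g4d003.local:19326] user nobody",
        "[powerbook:00682] tmp: /tmp",
    ]
    for (host, pid), entries in group_by_process(sample).items():
        print(f"{host} (pid {pid})")
        for entry in entries:
            print("   ", entry)
```

Run over the full log, this puts each process's session directory setup in one place, which makes it easier to compare, for example, the user and host values reported by mpirun with those reported by the processes started through XGrid.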

Is this of any help to you?

Thanks,
Frank

On Mar 18, 2006, at 5:40 AM, Frank wrote:

> XGRID_CONTROLLER_HOSTNAME and XGRID_CONTROLLER_PASSWORD are properly
> set up, and Open-MPI 1.0.1 is installed on all machines (with the same
> configure options). When configured with --prefix=/usr/local/openmpi,
> my app is supplied to the xgrid controller and I can see that copies
> of my app are "supplied" to the other machines, too, but the jobs
> hang and nothing happens (user nobody has full access to the folder
> /usr/local/myapp where my app is run). /usr/local/openmpi/bin and
> /usr/local/openmpi/lib are added to the variables PATH and
> DYLD_LIBRARY_PATH on every machine, too. I run into this situation
> no matter which machine my app is started from. To those who have
> openmpi and xgrid working correctly: which configure options did you
> use? The firewall is told not to block any internal traffic on the
> subnet. When not using xgrid, my app performs correctly.
>
> Does anyone have any idea concerning this matter?

My first guess was going to be the firewall issue, but if you can run
without XGrid, that probably isn't the case. Could you try an XGrid
run with the -d option to mpirun? That will enable some debugging
output that should help determine what is going wrong.

Thanks,

Brian