Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] 1.3 hangs running 2 exes with different names (Ralph Castain)
From: Geoffroy Pignot (geopignot_at_[hidden])
Date: 2009-01-23 02:57:46


Hello

I redid a few tests with my hello world program. Here are my results.

First, my configuration:
configure --prefix=/tmp/openmpi-1.3 --libdir=/tmp/openmpi-1.3/lib64 --enable-heterogeneous
You will find the output of ompi_info -param all all attached.
compil02 and compil03 are identical Rh43 64-bit nodes.

*Test 1:*
compil02% ls /tmp
a.out openmpi-1.3

compil03% ls /tmp
a.out openmpi-1.3

/tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out
WORKS

*Test 2:*
compil02% mv a.out a.out_64 ; ls /tmp
a.out_64 openmpi-1.3

compil03% ls /tmp
a.out openmpi-1.3

compil03% /tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out_64
[compil03:03774] procdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20717/0/0
[compil03:03774] jobdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20717/0
[compil03:03774] top: openmpi-sessions-gpignot_at_compil03_0
[compil03:03774] tmp: /tmp
[compil03:03774] mpirun: reset PATH:
/tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
[compil03:03774] mpirun: reset LD_LIBRARY_PATH:
/tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
[compil02:10684] procdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20717/0/1
[compil02:10684] jobdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20717/0
[compil02:10684] top: openmpi-sessions-gpignot_at_compil02_0
[compil02:10684] tmp: /tmp
[compil03:03774] [[20717,0],0] node[0].name compil03 daemon 0 arch ffc91200
[compil03:03774] [[20717,0],0] node[1].name compil02 daemon 1 arch ffc91200
[compil02:10684] [[20717,0],1] node[0].name compil03 daemon 0 arch ffc91200
[compil02:10684] [[20717,0],1] node[1].name compil02 daemon 1 arch ffc91200
[compil03:03774] Info: Setting up debugger process table for applications
  MPIR_being_debugged = 0
  MPIR_debug_state = 1
  MPIR_partial_attach_ok = 1
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 2
  MPIR_proctable:
    (i, host, exe, pid) = (0, compil03, /tmp/a.out, 0)
    (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 0)

HANGS: both executables have pid 0.
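In the -d output the hang shows up in the MPIR_proctable listing: an entry with pid 0 appears to be a process that mpirun never recorded as launched. A minimal Python sketch for spotting those entries in saved mpirun output, assuming the proctable lines keep exactly the "(i, host, exe, pid) = (...)" format shown above; find_unlaunched is a hypothetical helper name, not part of Open MPI:

```python
import re

# Matches proctable lines such as:
#     (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 0)
# Assumes host and exe never contain a comma.
PROCTABLE_RE = re.compile(
    r"\(i, host, exe, pid\) = \((\d+), ([^,]+), ([^,]+), (\d+)\)"
)


def find_unlaunched(log_text):
    """Return (rank, host, exe) for every proctable entry whose pid is 0."""
    missing = []
    for match in PROCTABLE_RE.finditer(log_text):
        rank, host, exe, pid = match.groups()
        if int(pid) == 0:
            missing.append((int(rank), host.strip(), exe.strip()))
    return missing


if __name__ == "__main__":
    test2 = """
    (i, host, exe, pid) = (0, compil03, /tmp/a.out, 0)
    (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 0)
    """
    print(find_unlaunched(test2))
```

For the Test 2 proctable this reports both ranks; for the Test 3 proctable further down it would report only rank 0 on compil03.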
*Test 3:*

compil02% cp a.out_64 a.out ; ls /tmp
a.out_64 a.out openmpi-1.3

compil03% ls /tmp
a.out openmpi-1.3

[compil03:03777] procdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20626/0/0
[compil03:03777] jobdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20626/0
[compil03:03777] top: openmpi-sessions-gpignot_at_compil03_0
[compil03:03777] tmp: /tmp
[compil03:03777] mpirun: reset PATH:
/tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
[compil03:03777] mpirun: reset LD_LIBRARY_PATH:
/tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
[compil02:10786] procdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20626/0/1
[compil02:10786] jobdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20626/0
[compil02:10786] top: openmpi-sessions-gpignot_at_compil02_0
[compil02:10786] tmp: /tmp
[compil03:03777] [[20626,0],0] node[0].name compil03 daemon 0 arch ffc91200
[compil03:03777] [[20626,0],0] node[1].name compil02 daemon 1 arch ffc91200
[compil02:10786] [[20626,0],1] node[0].name compil03 daemon 0 arch ffc91200
[compil02:10786] [[20626,0],1] node[1].name compil02 daemon 1 arch ffc91200
[compil03:03777] Info: Setting up debugger process table for applications
  MPIR_being_debugged = 0
  MPIR_debug_state = 1
  MPIR_partial_attach_ok = 1
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 2
  MPIR_proctable:
    (i, host, exe, pid) = (0, compil03, /tmp/a.out, 0)
    (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 10787)
[compil02:10787] procdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20626/1/1
[compil02:10787] jobdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20626/1
[compil02:10787] top: openmpi-sessions-gpignot_at_compil02_0
[compil02:10787] tmp: /tmp
[compil02:10787] [[20626,1],1] node[0].name compil03 daemon 0 arch ffc91200
[compil02:10787] [[20626,1],1] node[1].name compil02 daemon 1 arch ffc91200

HANGS: it gets a little further, but one process still has pid 0.

*Test 4:*

compil02% ls /tmp
a.out_64 a.out openmpi-1.3

compil03% cp a.out a.out_64 ; ls /tmp
a.out_64 a.out openmpi-1.3

compil03% /tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out_64
[compil03:03789] procdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20638/0/0
[compil03:03789] jobdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20638/0
[compil03:03789] top: openmpi-sessions-gpignot_at_compil03_0
[compil03:03789] tmp: /tmp
[compil03:03789] mpirun: reset PATH:
/tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
[compil03:03789] mpirun: reset LD_LIBRARY_PATH:
/tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
[compil02:10937] procdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20638/0/1
[compil02:10937] jobdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20638/0
[compil02:10937] top: openmpi-sessions-gpignot_at_compil02_0
[compil02:10937] tmp: /tmp
[compil03:03789] [[20638,0],0] node[0].name compil03 daemon 0 arch ffc91200
[compil03:03789] [[20638,0],0] node[1].name compil02 daemon 1 arch ffc91200
[compil02:10937] [[20638,0],1] node[0].name compil03 daemon 0 arch ffc91200
[compil02:10937] [[20638,0],1] node[1].name compil02 daemon 1 arch ffc91200
[compil03:03789] Info: Setting up debugger process table for applications
  MPIR_being_debugged = 0
  MPIR_debug_state = 1
  MPIR_partial_attach_ok = 1
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 2
  MPIR_proctable:
    (i, host, exe, pid) = (0, compil03, /tmp/a.out, 3792)
    (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 10938)
[compil03:03792] procdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20638/1/0
[compil03:03792] jobdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20638/1
[compil03:03792] top: openmpi-sessions-gpignot_at_compil03_0
[compil03:03792] tmp: /tmp
[compil03:03792] [[20638,1],0] node[0].name compil03 daemon 0 arch ffc91200
[compil03:03792] [[20638,1],0] node[1].name compil02 daemon 1 arch ffc91200
[compil02:10938] procdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20638/1/1
[compil02:10938] jobdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20638/1
[compil02:10938] top: openmpi-sessions-gpignot_at_compil02_0
[compil02:10938] tmp: /tmp
[compil02:10938] [[20638,1],1] node[0].name compil03 daemon 0 arch ffc91200
[compil02:10938] [[20638,1],1] node[1].name compil02 daemon 1 arch ffc91200
Hello world from process 0 of 2
Hello world from process 1 of 2
[compil03:03792] sess_dir_finalize: proc session dir not empty - leaving
[compil02:10938] sess_dir_finalize: proc session dir not empty - leaving
[compil03:03789] sess_dir_finalize: proc session dir not empty - leaving
[compil02:10937] sess_dir_finalize: proc session dir not empty - leaving
[compil03:03789] sess_dir_finalize: job session dir not empty - leaving
[compil02:10937] sess_dir_finalize: job session dir not empty - leaving
[compil03:03789] sess_dir_finalize: proc session dir not empty - leaving
orterun: exiting with status 0

WORKS PERFECTLY
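Taken together, the four tests seem to fit one pattern: a node's process gets a pid only when every executable named on the mpirun command line exists on that node, not just the one that node is asked to run. This is only an inference from these four runs, not a confirmed description of how Open MPI 1.3 launches processes. A minimal Python sketch that encodes the hypothesis and checks it against the directory listings above; predict_launch and the data layout are hypothetical, and Test 3's command line (not shown above) is assumed to match Tests 2 and 4, as its proctable suggests:

```python
def predict_launch(app_contexts, files_on_host):
    """Predict which ranks get a pid under the 'every executable on every node' hypothesis.

    app_contexts:  list of (host, exe) pairs in command-line order, so rank == index.
    files_on_host: dict mapping each host to the set of executable paths present on it.
    Returns one boolean per rank: True if that rank is expected to be launched.
    """
    # Every executable named anywhere on the mpirun command line.
    all_exes = {exe for _, exe in app_contexts}
    # Hypothesis: a node launches its rank only if it holds all of them.
    return [all_exes <= files_on_host[host] for host, _ in app_contexts]


if __name__ == "__main__":
    same_name = [("compil03", "/tmp/a.out"), ("compil02", "/tmp/a.out")]
    diff_name = [("compil03", "/tmp/a.out"), ("compil02", "/tmp/a.out_64")]
    both = {"/tmp/a.out", "/tmp/a.out_64"}
    runs = {
        "Test 1": (same_name, {"compil02": {"/tmp/a.out"}, "compil03": {"/tmp/a.out"}}),
        "Test 2": (diff_name, {"compil02": {"/tmp/a.out_64"}, "compil03": {"/tmp/a.out"}}),
        "Test 3": (diff_name, {"compil02": both, "compil03": {"/tmp/a.out"}}),
        "Test 4": (diff_name, {"compil02": both, "compil03": both}),
    }
    for name, (cmd, layout) in runs.items():
        print(name, predict_launch(cmd, layout))
```

This prints [True, True], [False, False], [False, True] and [True, True] for Tests 1 to 4, which matches the pids in the proctables and the WORKS/HANGS outcomes, though four runs on two nodes cannot rule out other explanations.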

I don't understand exactly what is going on, but I am not sure this
behaviour should be considered normal.

Thanks in advance for your comments.

Geoffroy