Hi Geoffroy

Hmmm... well, I redid my tests to mirror yours and still cannot replicate this problem. I tried it under both slurm and ssh environments, with no difference in the results.

% make hello

% cp hello hello2

% ls
hello hello2

% mpirun -n 1 -host odin038 ./hello : -n 1 -host odin039 ./hello2
Hello World, I am 0 of 2
Hello World, I am 1 of 2

I have tried a variety of combinations, including giving a fake executable as one of the apps, and have not been able to replicate your observed behavior. In all cases, it works correctly.

It looks like you are using rsh/ssh as your launch environment. All I can advise at this stage is to check again that the .login/.cshrc (or whatever) on your remote nodes isn't setting your path to point at another OMPI installation. The fact that you can run at all would seem to indicate that things are okay, but I honestly have no idea at this stage why you are seeing this behavior.
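
One way to make that path check concrete is to take the PATH reported by a non-interactive shell on the remote node (for example, the output of `ssh <node> 'echo $PATH'`) and list every directory in it that holds an `mpirun`. The following is only a minimal Python sketch under that assumption; the helper name `find_all_on_path` is hypothetical and is not part of Open MPI.

```python
import os


def find_all_on_path(program, path):
    """Return, in search order, every directory in a PATH-style string
    that contains an executable file named `program`.

    Hypothetical helper for illustration; not an Open MPI API.
    """
    hits = []
    for directory in path.split(os.pathsep):
        if not directory or directory in hits:
            # Skip empty PATH entries and directories already reported.
            continue
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(directory)
    return hits
```

Run on the remote node against the PATH it actually uses, more than one directory in the result, or a first directory other than the installation you built against, would suggest that a second OMPI installation is being picked up.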

Sorry I can't be of more help...
Ralph

On Jan 23, 2009, at 12:57 AM, Geoffroy Pignot wrote:

Hello

I redid a few tests with my hello world; here are my results.

First of all, my config:
configure --prefix=/tmp/openmpi-1.3 --libdir=/tmp/openmpi-1.3/lib64 --enable-heterogeneous
You will find the output of my ompi_info -param all all attached.
compil02 and compil03 are identical Rh43 64-bit nodes.

Test 1 :
compil02% ls /tmp 
a.out  openmpi-1.3

compil03% ls /tmp
a.out  openmpi-1.3

/tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out
WORKS

Test 2 :
compil02% mv a.out a.out_64 ; ls /tmp 
a.out_64  openmpi-1.3

compil03% ls /tmp
a.out  openmpi-1.3

compil03% /tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out_64
[compil03:03774] procdir: /tmp/openmpi-sessions-gpignot@compil03_0/20717/0/0
[compil03:03774] jobdir: /tmp/openmpi-sessions-gpignot@compil03_0/20717/0
[compil03:03774] top: openmpi-sessions-gpignot@compil03_0
[compil03:03774] tmp: /tmp
[compil03:03774] mpirun: reset PATH: /tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
[compil03:03774] mpirun: reset LD_LIBRARY_PATH: /tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
[compil02:10684] procdir: /tmp/openmpi-sessions-gpignot@compil02_0/20717/0/1
[compil02:10684] jobdir: /tmp/openmpi-sessions-gpignot@compil02_0/20717/0
[compil02:10684] top: openmpi-sessions-gpignot@compil02_0
[compil02:10684] tmp: /tmp
[compil03:03774] [[20717,0],0] node[0].name compil03 daemon 0 arch ffc91200
[compil03:03774] [[20717,0],0] node[1].name compil02 daemon 1 arch ffc91200
[compil02:10684] [[20717,0],1] node[0].name compil03 daemon 0 arch ffc91200
[compil02:10684] [[20717,0],1] node[1].name compil02 daemon 1 arch ffc91200
[compil03:03774] Info: Setting up debugger process table for applications
  MPIR_being_debugged = 0
  MPIR_debug_state = 1
  MPIR_partial_attach_ok = 1
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 2
  MPIR_proctable:
    (i, host, exe, pid) = (0, compil03, /tmp/a.out, 0)
    (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 0)

HANGS : both executables have pid 0

Test 3 :


compil02% cp a.out_64 a.out ; ls /tmp 
a.out_64  a.out  openmpi-1.3

compil03% ls /tmp
a.out  openmpi-1.3

[compil03:03777] procdir: /tmp/openmpi-sessions-gpignot@compil03_0/20626/0/0
[compil03:03777] jobdir: /tmp/openmpi-sessions-gpignot@compil03_0/20626/0
[compil03:03777] top: openmpi-sessions-gpignot@compil03_0
[compil03:03777] tmp: /tmp
[compil03:03777] mpirun: reset PATH: /tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
[compil03:03777] mpirun: reset LD_LIBRARY_PATH: /tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
[compil02:10786] procdir: /tmp/openmpi-sessions-gpignot@compil02_0/20626/0/1
[compil02:10786] jobdir: /tmp/openmpi-sessions-gpignot@compil02_0/20626/0
[compil02:10786] top: openmpi-sessions-gpignot@compil02_0
[compil02:10786] tmp: /tmp
[compil03:03777] [[20626,0],0] node[0].name compil03 daemon 0 arch ffc91200
[compil03:03777] [[20626,0],0] node[1].name compil02 daemon 1 arch ffc91200
[compil02:10786] [[20626,0],1] node[0].name compil03 daemon 0 arch ffc91200
[compil02:10786] [[20626,0],1] node[1].name compil02 daemon 1 arch ffc91200
[compil03:03777] Info: Setting up debugger process table for applications
  MPIR_being_debugged = 0
  MPIR_debug_state = 1
  MPIR_partial_attach_ok = 1
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 2
  MPIR_proctable:
    (i, host, exe, pid) = (0, compil03, /tmp/a.out, 0)
    (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 10787)
[compil02:10787] procdir: /tmp/openmpi-sessions-gpignot@compil02_0/20626/1/1
[compil02:10787] jobdir: /tmp/openmpi-sessions-gpignot@compil02_0/20626/1
[compil02:10787] top: openmpi-sessions-gpignot@compil02_0
[compil02:10787] tmp: /tmp
[compil02:10787] [[20626,1],1] node[0].name compil03 daemon 0 arch ffc91200
[compil02:10787] [[20626,1],1] node[1].name compil02 daemon 1 arch ffc91200

HANGS : it goes a little bit further, but one pid is still 0

Test 4 :

compil02% ls /tmp 
a.out_64  a.out  openmpi-1.3

compil03% cp a.out a.out_64 ; ls /tmp
a.out_64  a.out  openmpi-1.3

compil03% /tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out_64
[compil03:03789] procdir: /tmp/openmpi-sessions-gpignot@compil03_0/20638/0/0
[compil03:03789] jobdir: /tmp/openmpi-sessions-gpignot@compil03_0/20638/0
[compil03:03789] top: openmpi-sessions-gpignot@compil03_0
[compil03:03789] tmp: /tmp
[compil03:03789] mpirun: reset PATH: /tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
[compil03:03789] mpirun: reset LD_LIBRARY_PATH: /tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
[compil02:10937] procdir: /tmp/openmpi-sessions-gpignot@compil02_0/20638/0/1
[compil02:10937] jobdir: /tmp/openmpi-sessions-gpignot@compil02_0/20638/0
[compil02:10937] top: openmpi-sessions-gpignot@compil02_0
[compil02:10937] tmp: /tmp
[compil03:03789] [[20638,0],0] node[0].name compil03 daemon 0 arch ffc91200
[compil03:03789] [[20638,0],0] node[1].name compil02 daemon 1 arch ffc91200
[compil02:10937] [[20638,0],1] node[0].name compil03 daemon 0 arch ffc91200
[compil02:10937] [[20638,0],1] node[1].name compil02 daemon 1 arch ffc91200
[compil03:03789] Info: Setting up debugger process table for applications
  MPIR_being_debugged = 0
  MPIR_debug_state = 1
  MPIR_partial_attach_ok = 1
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 2
  MPIR_proctable:
    (i, host, exe, pid) = (0, compil03, /tmp/a.out, 3792)
    (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 10938)
[compil03:03792] procdir: /tmp/openmpi-sessions-gpignot@compil03_0/20638/1/0
[compil03:03792] jobdir: /tmp/openmpi-sessions-gpignot@compil03_0/20638/1
[compil03:03792] top: openmpi-sessions-gpignot@compil03_0
[compil03:03792] tmp: /tmp
[compil03:03792] [[20638,1],0] node[0].name compil03 daemon 0 arch ffc91200
[compil03:03792] [[20638,1],0] node[1].name compil02 daemon 1 arch ffc91200
[compil02:10938] procdir: /tmp/openmpi-sessions-gpignot@compil02_0/20638/1/1
[compil02:10938] jobdir: /tmp/openmpi-sessions-gpignot@compil02_0/20638/1
[compil02:10938] top: openmpi-sessions-gpignot@compil02_0
[compil02:10938] tmp: /tmp
[compil02:10938] [[20638,1],1] node[0].name compil03 daemon 0 arch ffc91200
[compil02:10938] [[20638,1],1] node[1].name compil02 daemon 1 arch ffc91200
Hello world from process 0 of 2
Hello world from process 1 of 2
[compil03:03792] sess_dir_finalize: proc session dir not empty - leaving
[compil02:10938] sess_dir_finalize: proc session dir not empty - leaving
[compil03:03789] sess_dir_finalize: proc session dir not empty - leaving
[compil02:10937] sess_dir_finalize: proc session dir not empty - leaving
[compil03:03789] sess_dir_finalize: job session dir not empty - leaving
[compil02:10937] sess_dir_finalize: job session dir not empty - leaving
[compil03:03789] sess_dir_finalize: proc session dir not empty - leaving
orterun: exiting with status 0

WORKS PERFECTLY


I don't understand exactly what is going on, but I doubt that this behaviour is considered normal.
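
The pattern across the four tests can be summarized in a short Python sketch. It only encodes what was observed above: the run completed when every executable named on the command line existed on every host in the job, and hung otherwise. It is not a statement of how Open MPI resolves executables. The function name `missing_executables` and the data layout are hypothetical, and the app contexts for Test 3 are taken from its MPIR_proctable output.

```python
def missing_executables(files_on_host, apps):
    """Report (host, exe) pairs where an executable named in any app
    context is absent from a host taking part in the job.

    files_on_host: dict mapping host name -> set of paths present there
    apps: list of (host, exe) pairs, one per app context
    Hypothetical helper that mirrors the observed pattern only.
    """
    hosts = sorted({host for host, _ in apps})
    exes = sorted({exe for _, exe in apps})
    return [
        (host, exe)
        for host in hosts
        for exe in exes
        if exe not in files_on_host.get(host, set())
    ]


# App contexts used in Tests 2 to 4 (compil03 runs a.out, compil02 runs a.out_64).
apps = [("compil03", "/tmp/a.out"), ("compil02", "/tmp/a.out_64")]

# Test 2 (hung): each host holds only its own executable.
test2 = {"compil02": {"/tmp/a.out_64"}, "compil03": {"/tmp/a.out"}}
# Test 3 (hung): compil02 holds both, compil03 only a.out.
test3 = {"compil02": {"/tmp/a.out", "/tmp/a.out_64"}, "compil03": {"/tmp/a.out"}}
# Test 4 (worked): both hosts hold both executables.
test4 = {"compil02": {"/tmp/a.out", "/tmp/a.out_64"},
         "compil03": {"/tmp/a.out", "/tmp/a.out_64"}}

for name, listing in (("Test 2", test2), ("Test 3", test3), ("Test 4", test4)):
    print(name, missing_executables(listing, apps))
```

In this summary, the two runs that hung are exactly the ones with a non-empty list, and the run that worked has an empty one.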

Thanks in advance for your comments

Geoffroy



<geoffroy_ompi_info>