Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] 1.3 hangs running 2 exes with different names (Ralph Castain)
From: Geoffroy Pignot (geopignot_at_[hidden])
Date: 2009-01-23 12:45:01


Hi Ralph,

Thanks for taking the time to look into my problem. As you can see, it happens
when I don't have both executables available on both nodes.
When both are present on both nodes (Test 4), it works. I don't know whether my
particular libdir causes the problem or not, but I'll try on Monday with a more
classical setup.
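
A quick way to confirm this before launching is to check that every executable named on the mpirun line exists on every host. The Python sketch below is only an illustration: `parse_app_contexts`, `remote_exists` and `missing_executables` are hypothetical helper names, it assumes passwordless ssh and that `-n`, `-np` and `-host` are the only options taking a value, and it encodes the observation above rather than any documented Open MPI requirement.

```python
import shlex
import subprocess

# Assumption: these are the only mpirun options on the line that take a value.
VALUE_OPTIONS = {"-n", "-np", "-host"}


def split_contexts(tokens):
    """Split a token list into app contexts separated by ':'."""
    contexts, current = [], []
    for tok in tokens:
        if tok == ":":
            contexts.append(current)
            current = []
        else:
            current.append(tok)
    contexts.append(current)
    return contexts


def parse_app_contexts(command):
    """Return one (host, executable) pair per app context of an MPMD command."""
    pairs = []
    # tokens[0] is the mpirun path itself, so skip it.
    for context in split_contexts(shlex.split(command)[1:]):
        host, exe = None, None
        it = iter(context)
        for tok in it:
            if tok in VALUE_OPTIONS:
                value = next(it, None)
                if tok == "-host":
                    host = value
            elif tok.startswith("-"):
                continue  # flag without a value, such as -d
            else:
                exe = tok
                break  # anything after the executable is its own argv
        if exe is not None:
            pairs.append((host, exe))
    return pairs


def remote_exists(host, path):
    """Assumed check: `test -x` run over ssh on the remote host."""
    return subprocess.run(["ssh", host, "test", "-x", path]).returncode == 0


def missing_executables(command, exists=remote_exists):
    """List (host, executable) pairs where an executable is absent on a host."""
    pairs = parse_app_contexts(command)
    hosts = sorted({h for h, _ in pairs if h})
    exes = sorted({e for _, e in pairs})
    return [(h, e) for h in hosts for e in exes if not exists(h, e)]
```

For the Test 2 layout quoted below, this would flag /tmp/a.out on compil02 and /tmp/a.out_64 on compil03; for the Test 4 layout it would return an empty list.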

I'll keep you informed.

Geoffroy

>
> Hi Geoffroy
>
> Hmmm....well, I redid my tests to mirror yours, and still cannot
> replicate this problem. I tried it with both slurm and ssh
> environments - no difference in the results.
>
> % make hello
>
> % cp hello hello2
>
> % ls
> hello hello2
>
> % mpirun -n 1 -host odin038 ./hello : -n 1 -host odin039 ./hello2
> Hello World, I am 0 of 2
> Hello World, I am 1 of 2
>
> I have tried a variety of combinations, including giving a fake
> executable as one of the apps, and have not been able to replicate
> your observed behavior. In all cases, it works correctly.
>
> It looks like you are using rsh/ssh as your launch environment. All I
> can advise at this stage is to again check to ensure that
> the .login/.cshrc (or whatever) on your remote nodes isn't setting
> your path to point at another OMPI installation. The fact that you can
> run at all would seem to indicate that things are okay, but I honestly
> have no ideas at this stage as to why you are seeing this behavior.
>
> Sorry I can't be of more help...
> Ralph
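
The check Ralph suggests can be scripted. The following Python sketch is one possible way to do it, under stated assumptions: `remote_which` and `hosts_with_other_install` are hypothetical names, passwordless ssh is assumed, and it relies on the non-interactive ssh shell reading the same startup files (.cshrc or similar) that the launched daemon would see. It asks each host where `orted` resolves and reports hosts where that is not under the expected install prefix.

```python
import subprocess


def remote_which(host, program="orted"):
    """Return the path `which` reports for program on host ('' if not found)."""
    result = subprocess.run(
        ["ssh", host, "which", program],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def hosts_with_other_install(hosts, prefix, which=remote_which):
    """Map each host to the path it resolves when that path is outside prefix/bin."""
    expected = prefix.rstrip("/") + "/bin/"
    suspicious = {}
    for host in hosts:
        path = which(host)
        # An empty path (program not found at all) is reported as well.
        if not path.startswith(expected):
            suspicious[host] = path
    return suspicious
```

With the configuration quoted below, the call would be `hosts_with_other_install(["compil02", "compil03"], "/tmp/openmpi-1.3")`; an empty result means both hosts resolve `orted` inside that prefix.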
>
> On Jan 23, 2009, at 12:57 AM, Geoffroy Pignot wrote:
>
> > Hello
> >
> > I redid a few tests with my hello world; here are my results.
> >
> > First of all, my config:
> > configure --prefix=/tmp/openmpi-1.3 --libdir=/tmp/openmpi-1.3/lib64 --enable-heterogeneous
> > You will find attached the output of my ompi_info -param all all.
> > compil02 and compil03 are identical Rh43 64-bit nodes.
> >
> > Test 1 :
> > compil02% ls /tmp
> > a.out openmpi-1.3
> >
> > compil03% ls /tmp
> > a.out openmpi-1.3
> >
> > /tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out
> > WORKS
> >
> > Test 2 :
> > compil02% mv a.out a.out_64 ; ls /tmp
> > a.out_64 openmpi-1.3
> >
> > compil03% ls /tmp
> > a.out openmpi-1.3
> >
> > compil03% /tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out_64
> > [compil03:03774] procdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20717/0/0
> > [compil03:03774] jobdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20717/0
> > [compil03:03774] top: openmpi-sessions-gpignot_at_compil03_0
> > [compil03:03774] tmp: /tmp
> > [compil03:03774] mpirun: reset PATH: /tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
> > [compil03:03774] mpirun: reset LD_LIBRARY_PATH: /tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
> > [compil02:10684] procdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20717/0/1
> > [compil02:10684] jobdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20717/0
> > [compil02:10684] top: openmpi-sessions-gpignot_at_compil02_0
> > [compil02:10684] tmp: /tmp
> > [compil03:03774] [[20717,0],0] node[0].name compil03 daemon 0 arch ffc91200
> > [compil03:03774] [[20717,0],0] node[1].name compil02 daemon 1 arch ffc91200
> > [compil02:10684] [[20717,0],1] node[0].name compil03 daemon 0 arch ffc91200
> > [compil02:10684] [[20717,0],1] node[1].name compil02 daemon 1 arch ffc91200
> > [compil03:03774] Info: Setting up debugger process table for applications
> > MPIR_being_debugged = 0
> > MPIR_debug_state = 1
> > MPIR_partial_attach_ok = 1
> > MPIR_i_am_starter = 0
> > MPIR_proctable_size = 2
> > MPIR_proctable:
> > (i, host, exe, pid) = (0, compil03, /tmp/a.out, 0)
> > (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 0)
> >
> > HANGS: both executables have pid 0
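
The pid column of the MPIR_proctable dump is the symptom being tracked across these tests, so it can help to extract it automatically from saved `mpirun -d` output. The Python sketch below is a minimal illustration: `unlaunched_entries` is a hypothetical helper name, the regular expression assumes the exact `(i, host, exe, pid) = (...)` layout shown in this thread, and treating pid 0 as "not launched when the table was printed" is this thread's reading, not a documented guarantee.

```python
import re

# Matches lines such as: (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 0)
ENTRY = re.compile(
    r"\(i, host, exe, pid\) = \((\d+), ([^,]+), ([^,]+), (\d+)\)"
)


def unlaunched_entries(log_text):
    """Return (rank, host, exe) for every proctable entry whose pid is 0."""
    return [
        (int(rank), host, exe)
        for rank, host, exe, pid in ENTRY.findall(log_text)
        if int(pid) == 0
    ]
```

Because the pattern is searched anywhere in the text, quoted lines that still carry a `> > ` prefix are matched too. An empty list corresponds to the case where every process received a pid.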
> >
> > Test 3 :
> >
> > compil02% cp a.out_64 a.out ; ls /tmp
> > a.out_64 a.out openmpi-1.3
> >
> > compil03% ls /tmp
> > a.out openmpi-1.3
> >
> > [compil03:03777] procdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20626/0/0
> > [compil03:03777] jobdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20626/0
> > [compil03:03777] top: openmpi-sessions-gpignot_at_compil03_0
> > [compil03:03777] tmp: /tmp
> > [compil03:03777] mpirun: reset PATH: /tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
> > [compil03:03777] mpirun: reset LD_LIBRARY_PATH: /tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
> > [compil02:10786] procdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20626/0/1
> > [compil02:10786] jobdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20626/0
> > [compil02:10786] top: openmpi-sessions-gpignot_at_compil02_0
> > [compil02:10786] tmp: /tmp
> > [compil03:03777] [[20626,0],0] node[0].name compil03 daemon 0 arch ffc91200
> > [compil03:03777] [[20626,0],0] node[1].name compil02 daemon 1 arch ffc91200
> > [compil02:10786] [[20626,0],1] node[0].name compil03 daemon 0 arch ffc91200
> > [compil02:10786] [[20626,0],1] node[1].name compil02 daemon 1 arch ffc91200
> > [compil03:03777] Info: Setting up debugger process table for applications
> > MPIR_being_debugged = 0
> > MPIR_debug_state = 1
> > MPIR_partial_attach_ok = 1
> > MPIR_i_am_starter = 0
> > MPIR_proctable_size = 2
> > MPIR_proctable:
> > (i, host, exe, pid) = (0, compil03, /tmp/a.out, 0)
> > (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 10787)
> > [compil02:10787] procdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20626/1/1
> > [compil02:10787] jobdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20626/1
> > [compil02:10787] top: openmpi-sessions-gpignot_at_compil02_0
> > [compil02:10787] tmp: /tmp
> > [compil02:10787] [[20626,1],1] node[0].name compil03 daemon 0 arch ffc91200
> > [compil02:10787] [[20626,1],1] node[1].name compil02 daemon 1 arch ffc91200
> >
> > HANGS: it gets a little further, but one pid is still 0
> >
> > Test 4 :
> >
> > compil02% ls /tmp
> > a.out_64 a.out openmpi-1.3
> >
> > compil03% cp a.out a.out_64 ; ls /tmp
> > a.out_64 a.out openmpi-1.3
> >
> > compil03% /tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out_64
> > [compil03:03789] procdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20638/0/0
> > [compil03:03789] jobdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20638/0
> > [compil03:03789] top: openmpi-sessions-gpignot_at_compil03_0
> > [compil03:03789] tmp: /tmp
> > [compil03:03789] mpirun: reset PATH: /tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
> > [compil03:03789] mpirun: reset LD_LIBRARY_PATH: /tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
> > [compil02:10937] procdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20638/0/1
> > [compil02:10937] jobdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20638/0
> > [compil02:10937] top: openmpi-sessions-gpignot_at_compil02_0
> > [compil02:10937] tmp: /tmp
> > [compil03:03789] [[20638,0],0] node[0].name compil03 daemon 0 arch ffc91200
> > [compil03:03789] [[20638,0],0] node[1].name compil02 daemon 1 arch ffc91200
> > [compil02:10937] [[20638,0],1] node[0].name compil03 daemon 0 arch ffc91200
> > [compil02:10937] [[20638,0],1] node[1].name compil02 daemon 1 arch ffc91200
> > [compil03:03789] Info: Setting up debugger process table for applications
> > MPIR_being_debugged = 0
> > MPIR_debug_state = 1
> > MPIR_partial_attach_ok = 1
> > MPIR_i_am_starter = 0
> > MPIR_proctable_size = 2
> > MPIR_proctable:
> > (i, host, exe, pid) = (0, compil03, /tmp/a.out, 3792)
> > (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 10938)
> > [compil03:03792] procdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20638/1/0
> > [compil03:03792] jobdir: /tmp/openmpi-sessions-gpignot_at_compil03_0/20638/1
> > [compil03:03792] top: openmpi-sessions-gpignot_at_compil03_0
> > [compil03:03792] tmp: /tmp
> > [compil03:03792] [[20638,1],0] node[0].name compil03 daemon 0 arch ffc91200
> > [compil03:03792] [[20638,1],0] node[1].name compil02 daemon 1 arch ffc91200
> > [compil02:10938] procdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20638/1/1
> > [compil02:10938] jobdir: /tmp/openmpi-sessions-gpignot_at_compil02_0/20638/1
> > [compil02:10938] top: openmpi-sessions-gpignot_at_compil02_0
> > [compil02:10938] tmp: /tmp
> > [compil02:10938] [[20638,1],1] node[0].name compil03 daemon 0 arch ffc91200
> > [compil02:10938] [[20638,1],1] node[1].name compil02 daemon 1 arch ffc91200
> > Hello world from process 0 of 2
> > Hello world from process 1 of 2
> > [compil03:03792] sess_dir_finalize: proc session dir not empty - leaving
> > [compil02:10938] sess_dir_finalize: proc session dir not empty - leaving
> > [compil03:03789] sess_dir_finalize: proc session dir not empty - leaving
> > [compil02:10937] sess_dir_finalize: proc session dir not empty - leaving
> > [compil03:03789] sess_dir_finalize: job session dir not empty - leaving
> > [compil02:10937] sess_dir_finalize: job session dir not empty - leaving
> > [compil03:03789] sess_dir_finalize: proc session dir not empty - leaving
> > orterun: exiting with status 0
> >
> > WORKS PERFECTLY
> >
> >
> > I don't understand exactly what is going on, but I am not sure that
> > this behaviour is considered normal.
> >
> > Thanks in advance for your comments
> >
> > Geoffroy
> >
> >
> >
> > <geoffroy_ompi_info>
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users