Oops - I should have looked at your output more closely. The component_find warnings clearly indicate some old libs lying around, but that isn't why your job is hanging.
The reason your job is hanging is sitting in the orte-ps output. You have multiple processes declaring themselves to be the same MPI rank. That definitely won't work.
The question is why is that happening? We use Torque all the time, so we know that the basic support is correct. It -could- be related to lib confusion, but I can't tell for sure.
Can you rebuild OMPI with --enable-debug, and rerun the job with the following added to your cmd line?
-mca plm_base_verbose 5 --debug-daemons -mca odls_base_verbose 5
I'm afraid the output will be a tad verbose, but I would appreciate seeing it. Might also tell us something about the lib issue.
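For reference, a minimal sketch of what the rebuilt-and-rerun sequence might look like. Only --enable-debug and the three debug options come from the message above; the install prefix, the rank count, the ./my_app executable and the log file name are placeholders, so substitute your own.

```shell
# Rebuild with debugging enabled (the prefix below is a placeholder):
#   ./configure --prefix="$HOME/ompi-debug" --enable-debug && make all install

# Extra options requested in the message above.
DEBUG_FLAGS="-mca plm_base_verbose 5 --debug-daemons -mca odls_base_verbose 5"

# Hypothetical job: 4 ranks of ./my_app; replace with your real command line.
CMD="mpirun -np 4 $DEBUG_FLAGS ./my_app"

# Print the assembled command so it can be checked before submitting.
echo "$CMD"

# Inside the Torque job, run it and capture the verbose output, e.g.:
#   $CMD > ompi_debug.log 2>&1
```

Redirecting both stdout and stderr to one file keeps the daemon and launcher messages together, which makes the output easier to send back to the list.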
Sorry, but Jeff is correct - that error message clearly indicates a version mismatch. Somewhere, one or more of your nodes is still picking up an old version.

On Tue, Aug 11, 2009 at 7:16 AM, Jeff Squyres <firstname.lastname@example.org> wrote:

On Aug 11, 2009, at 9:11 AM, Klymak Jody wrote:

I have removed all the OS-X -supplied libraries, recompiled and
installed openmpi 1.3.3, and I am *still* getting this warning when
[saturna.cluster:50307] mca: base: component_find: iof "mca_iof_proxy"
uses an MCA interface that is not recognized (component MCA v1.0.0 !=
supported MCA v2.0.0) -- ignored

This means that OMPI is finding an mca_iof_proxy.la file at run time from a prior version of Open MPI. You might want to use "find" or "locate" to search your nodes and find it. I suspect that you somehow have an OMPI 1.3.x install that overlaid an install of a prior OMPI version.
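One way to hunt for the stale mca_iof_proxy component with "find" might look like the sketch below. The helper name find_stale_components and the directories in the usage comment are assumptions for illustration only; point it at wherever Open MPI may have been installed on your nodes.

```shell
# Hypothetical helper: list files matching a component name pattern
# under one or more directory trees.
#   $1      = file name pattern, e.g. 'mca_iof_proxy.*'
#   $2 ...  = directories to search
find_stale_components() {
    pattern=$1
    shift
    # Ignore "permission denied" noise from unreadable directories.
    find "$@" -name "$pattern" -print 2>/dev/null
}

# Example usage (assumed install locations; adjust to your layout):
#   find_stale_components 'mca_iof_proxy.*' /usr/lib /usr/local/lib /opt
```

Since the mismatch may exist on only some nodes, the same search would need to be run on each of them. Any hit outside the 1.3.3 install tree is a candidate for the leftover component.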