Oops - I should have looked at your output more closely. The component_find
warnings clearly indicate that some old libs are still lying around, but that
isn't why your job is hanging.
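If it helps with tracking down the leftovers, here is a rough sketch of a search you could run on each node. The helper name find_stale_mca is made up for illustration, and the directories you pass in are guesses you should adjust to wherever Open MPI may have been installed. The file name pattern comes from the warning quoted below.

```shell
# Hypothetical helper: list files matching the component named in the
# component_find warning (mca_iof_proxy) under each directory given.
# The directories are assumptions; pass your own install prefixes.
find_stale_mca() {
    for prefix in "$@"; do
        # Skip prefixes that do not exist on this node.
        [ -d "$prefix" ] && find "$prefix" -name 'mca_iof_proxy*' 2>/dev/null
    done
    return 0
}
```

For example, `find_stale_mca /usr/local /opt` prints any matching files under those two trees. Anything it prints is worth checking against the install you actually intend to use.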
The reason your job is hanging is sitting in the orte-ps output. You have
multiple processes declaring themselves to be the same MPI rank. That
definitely won't work.
The question is why that is happening. We use Torque all the time, so we
know that the basic support is correct. It -could- be related to lib
confusion, but I can't tell for sure.
Can you rebuild OMPI with --enable-debug, and rerun the job with the
following added to your cmd line?
-mca plm_base_verbose 5 --debug-daemons -mca odls_base_verbose 5
I'm afraid the output will be a tad verbose, but I would appreciate seeing
it. It might also tell us something about the lib issue.
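To make the request concrete, here is a minimal sketch of how the flags could be added to an existing command line. The helper debug_cmd is hypothetical and only prints the command so you can review it before running. The process count and executable in the usage example are placeholders, not your actual job.

```shell
# Rebuild first with debugging enabled, e.g. ./configure --enable-debug
# plus whatever other configure options you normally use.

# Debug flags copied from the message above.
DEBUG_FLAGS="-mca plm_base_verbose 5 --debug-daemons -mca odls_base_verbose 5"

# Hypothetical helper: print the full mpirun command line, with the debug
# flags placed ahead of your usual mpirun arguments.
debug_cmd() {
    echo "mpirun $DEBUG_FLAGS $*"
}
```

Running `debug_cmd -np 4 ./a.out` prints the command with the three debug options in front of `-np 4 ./a.out`; substitute your own arguments and redirect the output of the real run to a file, since it will be long.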
On Tue, Aug 11, 2009 at 7:22 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> Sorry, but Jeff is correct - that error message clearly indicates a version
> mismatch. Somewhere, one or more of your nodes is still picking up an old
> On Tue, Aug 11, 2009 at 7:16 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> On Aug 11, 2009, at 9:11 AM, Klymak Jody wrote:
>>> I have removed all the OS X-supplied libraries, recompiled and
>>> installed openmpi 1.3.3, and I am *still* getting this warning when
>>> running ompi_info:
>>> [saturna.cluster:50307] mca: base: component_find: iof "mca_iof_proxy"
>>> uses an MCA interface that is not recognized (component MCA v1.0.0 !=
>>> supported MCA v2.0.0) -- ignored
>> This means that OMPI is finding an mca_iof_proxy.la file at run time from
>> a prior version of Open MPI. You might want to use "find" or "locate" to
>> search your nodes and find it. I suspect that you somehow have an OMPI
>> 1.3.x install that was laid over an installation of a prior OMPI version.
>> Jeff Squyres