
Subject: Re: [OMPI users] torque pbs behaviour...
From: Klymak Jody (jklymak_at_[hidden])
Date: 2009-08-10 23:48:56


On 10-Aug-09, at 8:03 PM, Ralph Castain wrote:

> Interesting! Well, I always make sure I have my personal OMPI build
> before any system stuff, and I work exclusively on Mac OS-X:

> Note that I always configure with --prefix=somewhere-in-my-own-dir,
> never to a system directory. Avoids this kind of confusion.

Yeah, I did configure --prefix=/usr/local/openmpi
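
For the record, the whole build-and-environment setup I'm using looks
roughly like this (the tarball directory name and the -j4 are just what
I happen to use, so treat it as a sketch rather than gospel):

  cd openmpi-1.3.3
  ./configure --prefix=/usr/local/openmpi
  make -j4 && make install

  # make sure this install is found ahead of anything Apple ships
  export PATH=/usr/local/openmpi/bin:$PATH
  which mpirun mpicc   # both should point into /usr/local/openmpi/bin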

> What the errors are saying is that we are picking up components from
> a very old version of OMPI that is distributed by Apple. It may or
> may not be causing confusion for the system - hard to tell. However,
> the fact that it is the IO forwarding subsystem that is picking them
> up, and the fact that you aren't seeing any output from your job,
> makes me a tad suspicious.

Me too!

> Can you run other jobs? In other words, do you get stdout/stderr
> from other programs you run, or does every MPI program hang (even
> simple ones)? If it is just your program, then it could just be that
> your application is hanging before any output is generated. Can you
> have it print something to stderr right when it starts?

No, the simple ones, like the examples I gave before, run fine, just
with the suspicious warnings.

I'm running a big general circulation model (MITgcm). Under normal
conditions it spits something out almost right away, and that is not
happening here. STDOUT.0001 etc. are all opened, but nothing is
written to them.
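
One sanity check I still want to try (just a sketch; the process count
is arbitrary and Torque will supply the actual node list) is whether
stdout/stderr forwarding works at all for a trivial command:

  mpirun -np 2 sh -c 'hostname; echo hello-on-stderr 1>&2'

If that prints from both ranks while the gcm stays silent, the problem
is more likely in the model start-up than in the IO forwarding itself.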

I'm pretty sure I'm compiling the gcm properly:

otool -L mitgcmuv
mitgcmuv:
        /usr/local/openmpi/lib/libmpi_f77.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/local/openmpi/lib/libmpi.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/local/openmpi/lib/libopen-rte.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/local/openmpi/lib/libopen-pal.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libutil.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/local/lib/libgfortran.3.dylib (compatibility version 4.0.0, current version 4.0.0)
        /usr/local/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 111.1.3)
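
Of course otool only shows the install names baked in at link time, so
to double-check what actually gets loaded at run time I can try
something like this (DYLD_PRINT_LIBRARIES is the standard dyld switch;
even if the model then hangs, the libraries are printed as they load,
so the run can simply be killed afterwards):

  which mpirun           # should be /usr/local/openmpi/bin/mpirun
  mpirun --version       # should report 1.3.3
  DYLD_PRINT_LIBRARIES=1 ./mitgcmuv 2>&1 | grep -i -e mpi -e open-

That last command should list libmpi, libopen-rte and libopen-pal from
/usr/local/openmpi/lib rather than anything under /usr/lib.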

Thanks, Jody

>
> On Aug 10, 2009, at 8:53 PM, Klymak Jody wrote:
>
>>
>> On 10-Aug-09, at 6:44 PM, Ralph Castain wrote:
>>
>>> Check your LD_LIBRARY_PATH - there is an earlier version of OMPI
>>> in your path that is interfering with operation (i.e., it comes
>>> before your 1.3.3 installation).
>>
>> Hmmmm, the OS X FAQ says not to do this:
>>
>> "Note that there is no need to add Open MPI's libdir to
>> LD_LIBRARY_PATH; Open MPI's shared library build process
>> automatically uses the "rpath" mechanism to automatically find the
>> correct shared libraries (i.e., the ones associated with this
>> build, vs., for example, the OS X-shipped OMPI shared libraries).
>> Also note that we specifically do not recommend adding Open MPI's
>> libdir to DYLD_LIBRARY_PATH."
>>
>> http://www.open-mpi.org/faq/?category=osx
>>
>> Regardless, if I set either, and run ompi_info I still get:
>>
>> [saturna.cluster:94981] mca: base: component_find: iof
>> "mca_iof_proxy" uses an MCA interface that is not recognized
>> (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>> [saturna.cluster:94981] mca: base: component_find: iof
>> "mca_iof_svc" uses an MCA interface that is not recognized
>> (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>>
>> echo $DYLD_LIBRARY_PATH $LD_LIBRARY_PATH
>> /usr/local/openmpi/lib: /usr/local/openmpi/lib:
>>
>> So I'm afraid I'm stumped again. I suppose I could go clean out
>> all the libraries in /usr/lib/...
>>
>> Thanks again, sorry to be a pain...
>>
>> Cheers, Jody
>>
>>
>>
>>
>>>
>>> On Aug 10, 2009, at 7:38 PM, Klymak Jody wrote:
>>>
>>>> So,
>>>>
>>>> mpirun --display-allocation -pernode --display-map hostname
>>>>
>>>> gives me the output below. Simple jobs seem to run, but the
>>>> MITgcm does not, either under ssh or torque. It hangs at some
>>>> early point in execution before anything is written, so it's hard
>>>> for me to tell what the error is. Could these MCA warnings have
>>>> anything to do with it?
>>>>
>>>> I've recompiled the gcm with -L /usr/local/openmpi/lib, so
>>>> hopefully that catches the right library.
>>>>
>>>> Thanks, Jody
>>>>
>>>>
>>>> [xserve02.local:38126] mca: base: component_find: ras
>>>> "mca_ras_dash_host" uses an MCA interface that is not recognized
>>>> (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>>>> [xserve02.local:38126] mca: base: component_find: ras
>>>> "mca_ras_hostfile" uses an MCA interface that is not recognized
>>>> (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>>>> [xserve02.local:38126] mca: base: component_find: ras
>>>> "mca_ras_localhost" uses an MCA interface that is not recognized
>>>> (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>>>> [xserve02.local:38126] mca: base: component_find: ras
>>>> "mca_ras_xgrid" uses an MCA interface that is not recognized
>>>> (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>>>> [xserve02.local:38126] mca: base: component_find: iof
>>>> "mca_iof_proxy" uses an MCA interface that is not recognized
>>>> (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>>>> [xserve02.local:38126] mca: base: component_find: iof
>>>> "mca_iof_svc" uses an MCA interface that is not recognized
>>>> (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>>>>
>>>> ====================== ALLOCATED NODES ======================
>>>>
>>>> Data for node: Name: xserve02.local Num slots: 8 Max slots: 0
>>>> Data for node: Name: xserve01.local Num slots: 8 Max slots: 0
>>>>
>>>> =================================================================
>>>>
>>>> ======================== JOB MAP ========================
>>>>
>>>> Data for node: Name: xserve02.local Num procs: 1
>>>> Process OMPI jobid: [20967,1] Process rank: 0
>>>>
>>>> Data for node: Name: xserve01.local Num procs: 1
>>>> Process OMPI jobid: [20967,1] Process rank: 1
>>>>
>>>> =============================================================
>>>> [xserve01.cluster:38518] mca: base: component_find: iof
>>>> "mca_iof_proxy" uses an MCA interface that is not recognized
>>>> (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>>>> [xserve01.cluster:38518] mca: base: component_find: iof
>>>> "mca_iof_svc" uses an MCA interface that is not recognized
>>>> (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>>>> xserve02.local
>>>> xserve01.cluster
>>>>
>>>>
>>>
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users