Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI 1.3RC2 job startup issue
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2008-12-23 02:25:34


To make sure you don't pick up any leftovers from another install
when upgrading, specify --enable-mpirun-prefix-by-default when
configuring the source tree for compilation. With that option, mpirun
always selects the binaries and libs that belong to its own
installation.
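
As a sketch (the install prefix here is an assumption; substitute whatever
location your cluster uses), a fresh 1.3 build configured that way would
look like:

```shell
# Hypothetical prefix -- use your site's actual install location.
./configure --prefix=/opt/openmpi-1.3 \
            --enable-mpirun-prefix-by-default
make all install
```

With the flag baked in, mpirun behaves as if --prefix /opt/openmpi-1.3 had
been given on every invocation, so the remote orteds are launched with the
matching PATH and LD_LIBRARY_PATH instead of whatever the users' shell
startup files happen to provide.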

Aurelien
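
Ralph's suggestion below -- making the backend nodes resolve the 1.3
install first -- can be sketched as follows. The /opt/openmpi-1.3 prefix is
an assumption; substitute your actual install location. These lines belong
in a startup file that non-interactive ssh shells actually read on the
backend nodes (e.g. ~/.bashrc), otherwise the remote orted will not see
them:

```shell
# Prepend the (assumed) 1.3 install prefix to both search paths.
export PATH=/opt/openmpi-1.3/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi-1.3/lib:$LD_LIBRARY_PATH

# Sanity check: the first orted on the PATH should now live under the
# new prefix (prints nothing if orted is not installed on this node).
command -v orted || echo "orted not on PATH on this node"
```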

On Dec 22, 2008, at 6:17 PM, Ralph Castain wrote:

> Your backend nodes are mistakenly picking up the OMPI 1.2 orted
> binary instead of the 1.3 orted. The two are not compatible.
>
> Check your LD_LIBRARY_PATH and PATH on the backend nodes and ensure
> they are pointing at the 1.3 installation. There are other ways as
> well of pointing to the correct installation - check the OMPI FAQ
> pages to find alternatives if this doesn't work for you.
>
> Ralph
>
>
> On Dec 22, 2008, at 2:58 PM, Ray Muno wrote:
>
>> We had been happily running OpenMPI 1.2 on our cluster until
>> recently. It has 2200 processors (8-way Opterons), connected with
>> QLogic InfiniBand.
>>
>> We have had issues starting larger jobs (600+ processors), and there
>> was some indication that OpenMPI 1.3 might solve our problems.
>>
>> It built with no problem and installed. Users can compile programs.
>>
>> When they tried to run, they got the attached output. Are we
>> missing something obvious?
>>
>> This is a Rocks cluster with jobs scheduled through SGE.
>>
>> =====================================================
>> $ mpirun -np 1024 program
>>
>> [compute-2-6.local:32580] Error: unknown option "--daemonize"
>> Usage: orted [OPTION]...
>> Start an Open RTE Daemon
>>
>> --bootproxy <arg0>       Run as boot proxy for <job-id>
>> -d|--debug               Debug the OpenRTE
>> -d|--spin                Have the orted spin until we can connect a debugger to it
>> --debug-daemons          Enable debugging of OpenRTE daemons
>> --debug-daemons-file     Enable debugging of OpenRTE daemons, storing output in files
>> --gprreplica <arg0>      Registry contact information.
>> -h|--help                This help message
>> --mpi-call-yield <arg0>  Have MPI (or similar) applications call yield when idle
>> --name <arg0>            Set the orte process name
>> --no-daemonize           Don't daemonize into the background
>> --nodename <arg0>        Node name as specified by host/resource description.
>> --ns-nds <arg0>          set sds/nds component to use for daemon (normally not needed)
>> --nsreplica <arg0>       Name service contact information.
>> --num_procs <arg0>       Set the number of process in this job
>> --persistent             Remain alive after the application process completes
>> --report-uri <arg0>      Report this process' uri on indicated pipe
>> --scope <arg0>           Set restrictions on who can connect to this universe
>> --seed                   Host replicas for the core universe services
>> --set-sid                Direct the orted to separate from the current session
>> --tmpdir <arg0>          Set the root for the session directory tree
>> --universe <arg0>        Set the universe name as username_at_hostname:universe_name for this application
>> --vpid_start <arg0>      Set the starting vpid for this job
>> --------------------------------------------------------------------------
>> A daemon (pid 4151) died unexpectedly with status 251 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>> the location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the
>> process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>> below. Additional manual cleanup may be required - please refer to
>> the "orte-clean" tool for assistance.
>> --------------------------------------------------------------------------
>> compute-5-15.local - daemon did not report back when launched
>> compute-5-35.local - daemon did not report back when launched
>> compute-4-8.local - daemon did not report back when launched
>> compute-7-2.local - daemon did not report back when launched
>> compute-2-6.local - daemon did not report back when launched
>> compute-6-28.local - daemon did not report back when launched
>> compute-6-35.local - daemon did not report back when launched
>> compute-6-25.local
>> compute-6-26.local
>> compute-2-19.local - daemon did not report back when launched
>> compute-6-37.local - daemon did not report back when launched
>> compute-6-12.local - daemon did not report back when launched
>> compute-2-36.local - daemon did not report back when launched
>> compute-7-5.local - daemon did not report back when launched
>> compute-7-23.local - daemon did not report back when launched
>>
>> ================================================
>>
>> --
>>
>> Ray Muno
>> University of Minnesota
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321