Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
From: Richard Bardwell (richard_at_[hidden])
Date: 2012-02-13 11:09:46


My mistake, Ralph. I should have done a make uninstall instead!

Thanks

Richard
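
For readers who hit the same "unable to open" errors after installing a newer Open MPI over an older one, here is a minimal sketch of how leftover component files might be spotted. It is a heuristic, not an Open MPI tool. It assumes that everything written by the latest make install has roughly the same modification time, so component files much older than the newest one are probably left over from the earlier install. The function name find_stale_components and the max_age_gap parameter are hypothetical; the mca_ file prefix and the /usr/local/lib/openmpi directory come from the error message quoted below.

```python
from pathlib import Path


def find_stale_components(plugin_dir, max_age_gap=3600.0):
    """Return names of mca_* files much older than the newest one.

    Heuristic only (an assumption, not Open MPI behaviour): files older
    than the newest component by more than max_age_gap seconds are
    treated as leftovers from a previous installation.
    """
    files = [p for p in Path(plugin_dir).glob("mca_*") if p.is_file()]
    if not files:
        return []
    # The newest modification time stands in for "time of the latest install".
    newest = max(p.stat().st_mtime for p in files)
    return sorted(
        p.name for p in files if newest - p.stat().st_mtime > max_age_gap
    )
```

Calling find_stale_components("/usr/local/lib/openmpi") would list the candidates. Running make uninstall from the old build tree before installing the new version, as described above, avoids the problem in the first place.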
  ----- Original Message -----
  From: Ralph Castain
  To: Open MPI Users
  Sent: Monday, February 13, 2012 3:41 PM
  Subject: Re: [OMPI users] MPI orte_init fails on remote nodes


  You need to clean out the old attempt - that is a stale file.

  Sent from my iPad




    OK, I installed 1.4.4 and rebuilt the executable, and guess what... I now get some weird errors like the one below:
    mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_ras_dash_host
    along with similar errors for a few other files, even though the .so / .la files are all there!
      ----- Original Message -----
      From: Ralph Castain
      To: Open MPI Users
      Sent: Monday, February 13, 2012 2:59 PM
      Subject: Re: [OMPI users] MPI orte_init fails on remote nodes


      Good heavens - where did you find something that old? Can you use a more recent version?

      Sent from my iPad


       
        Gentlemen

        I am struggling to get MPI working when the hostfile contains different nodes.
        I get the error below. Any ideas? I can ssh without a password between the two
        nodes. I am running Open MPI 1.2.8 on both machines.

        Any help most appreciated!



        MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst /home/sharc/MPITEST/v8_mpi_test/mpitest
        Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
        [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
        --------------------------------------------------------------------------
        It looks like orte_init failed for some reason; your parallel process is
        likely to abort. There are many reasons that a parallel process can
        fail during orte_init; some of which are due to configuration or
        environment problems. This failure appears to be an internal failure;
        here's some additional information (which may only be relevant to an
        Open MPI developer):

        orte_rml_base_select failed
        --> Returned value -13 instead of ORTE_SUCCESS
        --------------------------------------------------------------------------
        [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
        [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
        Open RTE was unable to initialize properly. The error occured while
        attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
        [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
        [linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
        [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
        [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
        [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
        [linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as expected.
        [linux-tmpw:10489] ERROR: There may be more information available from
        [linux-tmpw:10489] ERROR: the remote shell (see above).
        [linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
        [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
        [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
        [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
        [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
        --------------------------------------------------------------------------
        mpiexec was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
        --------------------------------------------------------------------------
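
        In output like the above, it can help to see which host reported which error first. The following is a rough sketch that tallies the ORTE_ERROR_LOG lines per host. The line layout is inferred only from the lines in this message, and the names ERROR_LINE and summarize_orte_errors are hypothetical, not part of Open MPI.

```python
import re
from collections import Counter

# Layout inferred from the log above (an assumption):
# [host:pid] [process-name] ORTE_ERROR_LOG: <error> in file <path> at line <n>
ERROR_LINE = re.compile(
    r"\[([^:\]]+):(\d+)\] \[([^\]]+)\] ORTE_ERROR_LOG: "
    r"(.+?) in file (\S+) at line (\d+)"
)


def summarize_orte_errors(log_text):
    """Count ORTE_ERROR_LOG entries per (host, error) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            host, error = match.group(1), match.group(4)
            counts[(host, error)] += 1
    return counts
```

        Applied to the log in this message, the tally would separate the "Not found" errors on linux-z0je, the node whose daemon failed during orte_init, from the "Timeout" errors on linux-tmpw, which follow from that failure.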

        _______________________________________________
        users mailing list
        users_at_[hidden]
        http://www.open-mpi.org/mailman/listinfo.cgi/users

