On 15-Aug-09, at 1:03 AM, Alan wrote:

Thanks Warner,

This is frustrating... I read the ticket: six months already and two releases postponed. Frankly, I am very skeptical that this will be fixed for 1.3.4. I really hope so, but when will 1.3.4 be released?

I have to decide between going with 1.2.x, with possible disruptions in my configuration (I use Fink), or waiting.

And I offer to test any nightly snapshot that claims this bug is fixed.

Hi Alan,

It's not too hard to get PBS/Torque up and running.

The OS X-specific (?) issues I had:

1) /etc/hosts had to have the server explicitly listed on each of the nodes.
2) $usecp had to be set in mom_config
3) $restricted had to be set on the nodes in mom_priv/config to accept calls from the server.  

I think that is it. Of course, if you are already using Xgrid on these machines for other purposes it won't play well with PBS, but otherwise all you are missing is the cute tachometer display.
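For reference, a minimal sketch of those three settings (hostnames, addresses, and paths here are placeholders, not from my actual setup; adjust for your site):

```
# /etc/hosts on each node: list the PBS server explicitly
# (hypothetical address and hostname)
192.168.1.10    pbs-server.local pbs-server

# mom_priv/config on each node:
$usecp      *:/home  /home         # use cp over a shared filesystem instead of scp
$restricted pbs-server.local       # accept connections from the server
```

The `$usecp` mapping assumes a shared /home; if your shared mount point differs, the mapping changes accordingly.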

Cheers,  Jody

Cheers,
Alan

On Fri, Aug 14, 2009 at 17:20, Warner Yuen <wyuen@apple.com> wrote:
Hi Alan,

Xgrid support is currently broken in the latest version of Open MPI; see the ticket below. However, I believe that Xgrid still works with one of the earlier 1.2 versions of Open MPI. I don't recall for sure, but I think it's Open MPI 1.2.3.

#1777: Xgrid support is broken in the v1.3 series
---------------------+------------------------------------------------------
Reporter:  jsquyres  |        Owner:  brbarret
  Type:  defect    |       Status:  accepted
Priority:  major     |    Milestone:  Open MPI 1.3.4
Version:  trunk     |   Resolution:
Keywords:            |
---------------------+------------------------------------------------------
Changes (by bbenton):

 * milestone:  Open MPI 1.3.3 => Open MPI 1.3.4


Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wyuen@apple.com
Tel: 408.718.2859




On Aug 14, 2009, at 6:21 AM, users-request@open-mpi.org wrote:


Message: 1
Date: Fri, 14 Aug 2009 14:21:30 +0100
From: Alan <alanwilter@gmail.com>
Subject: [OMPI users] openmpi with xgrid
To: users@open-mpi.org
Message-ID:
       <cf58c8d00908140621v18d384f2wef97ee80ca3ded0c@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"


Hi there,
I saw that http://www.open-mpi.org/community/lists/users/2007/08/3900.php.

I use Fink, and so I changed the openmpi.info file in order to build Open MPI
with Xgrid support.
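For anyone wanting to reproduce this, the change amounts to adding the Xgrid flag to the configure parameters in Fink's openmpi.info. The line below is a sketch (the field name is Fink's, but the surrounding flags in the actual file may differ):

```
ConfigureParams: --prefix=%p --with-xgrid <existing flags unchanged>
```

After rebuilding, `ompi_info` should list the xgrid plm component, as it does in the output below.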

As you can see:
amadeus[2081]:~/Downloads% /sw/bin/ompi_info
               Package: Open MPI root@amadeus.local Distribution
              Open MPI: 1.3.3
 Open MPI SVN revision: r21666
 Open MPI release date: Jul 14, 2009
              Open RTE: 1.3.3
 Open RTE SVN revision: r21666
 Open RTE release date: Jul 14, 2009
                  OPAL: 1.3.3
     OPAL SVN revision: r21666
     OPAL release date: Jul 14, 2009
          Ident string: 1.3.3
                Prefix: /sw
Configured architecture: x86_64-apple-darwin9
        Configure host: amadeus.local
         Configured by: root
         Configured on: Fri Aug 14 12:58:12 BST 2009
        Configure host: amadeus.local
              Built by:
              Built on: Fri Aug 14 13:07:46 BST 2009
            Built host: amadeus.local
            C bindings: yes
          C++ bindings: yes
    Fortran77 bindings: yes (single underscore)
    Fortran90 bindings: yes
Fortran90 bindings size: small
            C compiler: gcc
   C compiler absolute: /sw/var/lib/fink/path-prefix-10.6/gcc
          C++ compiler: g++
 C++ compiler absolute: /sw/var/lib/fink/path-prefix-10.6/g++
    Fortran77 compiler: gfortran
 Fortran77 compiler abs: /sw/bin/gfortran
    Fortran90 compiler: gfortran
 Fortran90 compiler abs: /sw/bin/gfortran
           C profiling: yes
         C++ profiling: yes
   Fortran77 profiling: yes
   Fortran90 profiling: yes
        C++ exceptions: no
        Thread support: posix (mpi: no, progress: no)
         Sparse Groups: no
 Internal debug support: no
   MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
       libltdl support: yes
 Heterogeneous support: no
mpirun default --prefix: no
       MPI I/O support: yes
     MPI_WTIME support: gettimeofday
Symbol visibility support: yes
 FT Checkpoint support: no  (checkpoint thread: no)
         MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.3)
         MCA paffinity: darwin (MCA v2.0, API v2.0, Component v1.3.3)
             MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.3)
             MCA carto: file (MCA v2.0, API v2.0, Component v1.3.3)
         MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.3)
             MCA timer: darwin (MCA v2.0, API v2.0, Component v1.3.3)
       MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.3)
       MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.3)
               MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.3)
            MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.3)
         MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.3)
         MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.3)
              MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.3)
              MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.3)
              MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.3)
              MCA coll: self (MCA v2.0, API v2.0, Component v1.3.3)
              MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.3)
              MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.3)
              MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.3)
                MCA io: romio (MCA v2.0, API v2.0, Component v1.3.3)
             MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.3)
             MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.3)
             MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.3)
               MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.3)
               MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.3)
               MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.3)
               MCA pml: v (MCA v2.0, API v2.0, Component v1.3.3)
               MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.3)
            MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.3)
               MCA btl: self (MCA v2.0, API v2.0, Component v1.3.3)
               MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.3)
               MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.3)
              MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.3)
               MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.3)
               MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.3)
               MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.3)
               MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.3)
               MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.3)
               MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.3)
              MCA odls: default (MCA v2.0, API v2.0, Component v1.3.3)
               MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.3)
             MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.3)
             MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.3)
             MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.3)
               MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.3)
            MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.3)
            MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.3)
            MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.3)
               MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.3)
               MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.3)
               MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.3.3)
             MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.3)
            MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.3)
               MCA ess: env (MCA v2.0, API v2.0, Component v1.3.3)
               MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.3)
               MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.3)
               MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.3)
               MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.3)
           MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.3)
           MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.3)

All seemed fine, and I also have the Xgrid controller and agent running on my
laptop, but then when I tried:

/sw/bin/om-mpirun -c 2 mpiapp  # hello world example for mpi
[amadeus.local:40293] [[804,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file
src/plm_xgrid_module.m at line 119
[amadeus.local:40293] [[804,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file
src/plm_xgrid_module.m at line 153
--------------------------------------------------------------------------
om-mpirun was unable to start the specified application as it encountered an
error.
More information may be available above.
--------------------------------------------------------------------------
2009-08-14 14:16:19.715 om-mpirun[40293:10b] *** Terminating app due to
uncaught exception 'NSInvalidArgumentException', reason: '***
-[NSKVONotifying_XGConnection<0x1001164b0> finalize]: called when collecting
not enabled'
2009-08-14 14:16:19.716 om-mpirun[40293:10b] Stack: (
  140735390096156,
  140735366109391,
  140735390122388,
  4295943988,
  4295939168,
  4295171139,
  4295883300,
  4295025321,
  4294973498,
  4295401605,
  4295345774,
  4295056598,
  4295116412,
  4295119970,
  4295401605,
  4294972881,
  4295401605,
  4295345774,
  4295056598,
  4295172615,
  4295938185,
  4294971936,
  4294969401,
  4294969340
)
terminate called after throwing an instance of 'NSException'
[amadeus:40293] *** Process received signal ***
[amadeus:40293] Signal: Abort trap (6)
[amadeus:40293] Signal code:  (0)
[amadeus:40293] [ 0] 2   libSystem.B.dylib
0x00000000831443fa _sigtramp + 26
[amadeus:40293] [ 1] 3   ???
0x000000005fbfb1e8 0x0 + 1606398440
[amadeus:40293] [ 2] 4   libstdc++.6.dylib
0x00000000827f2085 _ZN9__gnu_cxx27__verbose_terminate_handlerEv + 377
[amadeus:40293] [ 3] 5   libobjc.A.dylib
0x0000000081811adf objc_end_catch + 280
[amadeus:40293] [ 4] 6   libstdc++.6.dylib
0x00000000827f0425 __gxx_personality_v0 + 1259
[amadeus:40293] [ 5] 7   libstdc++.6.dylib
0x00000000827f045b _ZSt9terminatev + 19
[amadeus:40293] [ 6] 8   libstdc++.6.dylib
0x00000000827f054c __cxa_rethrow + 0
[amadeus:40293] [ 7] 9   libobjc.A.dylib
0x0000000081811966 objc_exception_rethrow + 0
[amadeus:40293] [ 8] 10  CoreFoundation
0x0000000082ef8194 _CF_forwarding_prep_0 + 5700
[amadeus:40293] [ 9] 11  mca_plm_xgrid.so
0x00000000000ee734 orte_plm_xgrid_finalize + 4884
[amadeus:40293] [10] 12  mca_plm_xgrid.so
0x00000000000ed460 orte_plm_xgrid_finalize + 64
[amadeus:40293] [11] 13  libopen-rte.0.dylib
0x0000000000031c43 orte_plm_base_close + 195
[amadeus:40293] [12] 14  mca_ess_hnp.so
0x00000000000dfa24 0x0 + 916004
[amadeus:40293] [13] 15  libopen-rte.0.dylib
0x000000000000e2a9 orte_finalize + 89
[amadeus:40293] [14] 16  om-mpirun
0x000000000000183a start + 4210
[amadeus:40293] [15] 17  libopen-pal.0.dylib
0x000000000006a085 opal_event_add_i + 1781
[amadeus:40293] [16] 18  libopen-pal.0.dylib
0x000000000005c66e opal_progress + 142
[amadeus:40293] [17] 19  libopen-rte.0.dylib
0x0000000000015cd6 orte_trigger_event + 70
[amadeus:40293] [18] 20  libopen-rte.0.dylib
0x000000000002467c orte_daemon_recv + 4332
[amadeus:40293] [19] 21  libopen-rte.0.dylib
0x0000000000025462 orte_daemon_cmd_processor + 722
[amadeus:40293] [20] 22  libopen-pal.0.dylib
0x000000000006a085 opal_event_add_i + 1781
[amadeus:40293] [21] 23  om-mpirun
0x00000000000015d1 start + 3593
[amadeus:40293] [22] 24  libopen-pal.0.dylib
0x000000000006a085 opal_event_add_i + 1781
[amadeus:40293] [23] 25  libopen-pal.0.dylib
0x000000000005c66e opal_progress + 142
[amadeus:40293] [24] 26  libopen-rte.0.dylib
0x0000000000015cd6 orte_trigger_event + 70
[amadeus:40293] [25] 27  libopen-rte.0.dylib
0x0000000000032207 orte_plm_base_launch_failed + 135
[amadeus:40293] [26] 28  mca_plm_xgrid.so
0x00000000000ed089 orte_plm_xgrid_spawn + 89
[amadeus:40293] [27] 29  om-mpirun
0x0000000000001220 start + 2648
[amadeus:40293] [28] 30  om-mpirun
0x0000000000000839 start + 113
[amadeus:40293] [29] 31  om-mpirun
0x00000000000007fc start + 52
[amadeus:40293] *** End of error message ***
[1]    40293 abort      /sw/bin/om-mpirun -c 2 mpiapp


Is there anyone using Open MPI with Xgrid successfully who is keen to share his/her
experience? I am not new to Xgrid or MPI, but with the two integrated I must say
that I am in uncharted waters.

Any help would be much appreciated.

Many thanks in advance,
Alan
--
Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
Department of Biochemistry, University of Cambridge.
80 Tennis Court Road, Cambridge CB2 1GA, UK.
http://www.bio.cam.ac.uk/~awd28<<

------------------------------

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

End of users Digest, Vol 1318, Issue 2
**************************************




--
Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
Department of Biochemistry, University of Cambridge.
80 Tennis Court Road, Cambridge CB2 1GA, UK.
>>http://www.bio.cam.ac.uk/~awd28<<