Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi with xgrid
From: Warner Yuen (wyuen_at_[hidden])
Date: 2009-08-14 12:20:05


Hi Alan,

Xgrid support is currently broken in the latest (1.3.x) releases of
Open MPI; see the ticket below. However, I believe that Xgrid still
works with one of the earlier 1.2 releases. I don't recall for sure,
but I think it's Open MPI 1.2.3.
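
In the meantime, if you need to run with the 1.3.3 build you already
have, one possible workaround (just a sketch; I haven't verified it
against the fink build) is to keep the broken xgrid launcher from
being selected, so that mpirun falls back to the rsh launcher:

    # List the installed process-launch (plm) components
    /sw/bin/ompi_info | grep "MCA plm"

    # Exclude the xgrid component for a single run ("^" negates)
    /sw/bin/om-mpirun --mca plm ^xgrid -c 2 mpiapp

    # Or force the rsh launcher for every run via the environment
    export OMPI_MCA_plm=rsh
    /sw/bin/om-mpirun -c 2 mpiapp

Note that jobs launched this way bypass Xgrid entirely; this only
avoids the crash until the ticket below is resolved.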

#1777: Xgrid support is broken in the v1.3 series
---------------------+------------------------------------------------------
Reporter: jsquyres   |      Owner: brbarret
    Type: defect     |     Status: accepted
Priority: major      |  Milestone: Open MPI 1.3.4
 Version: trunk      | Resolution:
Keywords:            |
---------------------+------------------------------------------------------
Changes (by bbenton):

  * milestone: Open MPI 1.3.3 => Open MPI 1.3.4

Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wyuen_at_[hidden]
Tel: 408.718.2859

On Aug 14, 2009, at 6:21 AM, users-request_at_[hidden] wrote:

>
> Message: 1
> Date: Fri, 14 Aug 2009 14:21:30 +0100
> From: Alan <alanwilter_at_[hidden]>
> Subject: [OMPI users] openmpi with xgrid
> To: users_at_[hidden]
> Message-ID:
> <cf58c8d00908140621v18d384f2wef97ee80ca3ded0c_at_[hidden]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi there,
> I saw http://www.open-mpi.org/community/lists/users/2007/08/3900.php.
>
> I use fink, and so I changed the openmpi.info file in order to get
> openmpi with xgrid support.
>
> As you can see:
> amadeus[2081]:~/Downloads% /sw/bin/ompi_info
> Package: Open MPI root_at_amadeus.local Distribution
> Open MPI: 1.3.3
> Open MPI SVN revision: r21666
> Open MPI release date: Jul 14, 2009
> Open RTE: 1.3.3
> Open RTE SVN revision: r21666
> Open RTE release date: Jul 14, 2009
> OPAL: 1.3.3
> OPAL SVN revision: r21666
> OPAL release date: Jul 14, 2009
> Ident string: 1.3.3
> Prefix: /sw
> Configured architecture: x86_64-apple-darwin9
> Configure host: amadeus.local
> Configured by: root
> Configured on: Fri Aug 14 12:58:12 BST 2009
> Configure host: amadeus.local
> Built by:
> Built on: Fri Aug 14 13:07:46 BST 2009
> Built host: amadeus.local
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (single underscore)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /sw/var/lib/fink/path-prefix-10.6/gcc
> C++ compiler: g++
> C++ compiler absolute: /sw/var/lib/fink/path-prefix-10.6/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /sw/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /sw/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Sparse Groups: no
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: no
> mpirun default --prefix: no
> MPI I/O support: yes
> MPI_WTIME support: gettimeofday
> Symbol visibility support: yes
> FT Checkpoint support: no (checkpoint thread: no)
> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.3)
> MCA paffinity: darwin (MCA v2.0, API v2.0, Component v1.3.3)
> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.3)
> MCA carto: file (MCA v2.0, API v2.0, Component v1.3.3)
> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.3)
> MCA timer: darwin (MCA v2.0, API v2.0, Component v1.3.3)
> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.3)
> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.3)
> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.3)
> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.3)
> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.3)
> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: self (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.3)
> MCA io: romio (MCA v2.0, API v2.0, Component v1.3.3)
> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.3)
> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.3)
> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.3)
> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.3)
> MCA pml: v (MCA v2.0, API v2.0, Component v1.3.3)
> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.3)
> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.3)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.3.3)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.3)
> MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.3)
> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.3)
> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.3)
> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.3)
> MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.3)
> MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.3)
> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.3)
> MCA odls: default (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.3)
> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.3)
> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.3)
> MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.3)
> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.3)
> MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.3)
> MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.3)
> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.3)
> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.3.3)
> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.3)
> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ess: env (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.3)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.3)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.3)
>
> All seemed fine, and I also have the xgrid controller and agent
> running on my laptop, and then when I tried:
>
> /sw/bin/om-mpirun -c 2 mpiapp # hello world example for mpi
> [amadeus.local:40293] [[804,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 119
> [amadeus.local:40293] [[804,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 153
> --------------------------------------------------------------------------
> om-mpirun was unable to start the specified application as it encountered an error.
> More information may be available above.
> --------------------------------------------------------------------------
> 2009-08-14 14:16:19.715 om-mpirun[40293:10b] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[NSKVONotifying_XGConnection<0x1001164b0> finalize]: called when collecting not enabled'
> 2009-08-14 14:16:19.716 om-mpirun[40293:10b] Stack: (
> 140735390096156,
> 140735366109391,
> 140735390122388,
> 4295943988,
> 4295939168,
> 4295171139,
> 4295883300,
> 4295025321,
> 4294973498,
> 4295401605,
> 4295345774,
> 4295056598,
> 4295116412,
> 4295119970,
> 4295401605,
> 4294972881,
> 4295401605,
> 4295345774,
> 4295056598,
> 4295172615,
> 4295938185,
> 4294971936,
> 4294969401,
> 4294969340
> )
> terminate called after throwing an instance of 'NSException'
> [amadeus:40293] *** Process received signal ***
> [amadeus:40293] Signal: Abort trap (6)
> [amadeus:40293] Signal code: (0)
> [amadeus:40293] [ 0] 2 libSystem.B.dylib 0x00000000831443fa _sigtramp + 26
> [amadeus:40293] [ 1] 3 ??? 0x000000005fbfb1e8 0x0 + 1606398440
> [amadeus:40293] [ 2] 4 libstdc++.6.dylib 0x00000000827f2085 _ZN9__gnu_cxx27__verbose_terminate_handlerEv + 377
> [amadeus:40293] [ 3] 5 libobjc.A.dylib 0x0000000081811adf objc_end_catch + 280
> [amadeus:40293] [ 4] 6 libstdc++.6.dylib 0x00000000827f0425 __gxx_personality_v0 + 1259
> [amadeus:40293] [ 5] 7 libstdc++.6.dylib 0x00000000827f045b _ZSt9terminatev + 19
> [amadeus:40293] [ 6] 8 libstdc++.6.dylib 0x00000000827f054c __cxa_rethrow + 0
> [amadeus:40293] [ 7] 9 libobjc.A.dylib 0x0000000081811966 objc_exception_rethrow + 0
> [amadeus:40293] [ 8] 10 CoreFoundation 0x0000000082ef8194 _CF_forwarding_prep_0 + 5700
> [amadeus:40293] [ 9] 11 mca_plm_xgrid.so 0x00000000000ee734 orte_plm_xgrid_finalize + 4884
> [amadeus:40293] [10] 12 mca_plm_xgrid.so 0x00000000000ed460 orte_plm_xgrid_finalize + 64
> [amadeus:40293] [11] 13 libopen-rte.0.dylib 0x0000000000031c43 orte_plm_base_close + 195
> [amadeus:40293] [12] 14 mca_ess_hnp.so 0x00000000000dfa24 0x0 + 916004
> [amadeus:40293] [13] 15 libopen-rte.0.dylib 0x000000000000e2a9 orte_finalize + 89
> [amadeus:40293] [14] 16 om-mpirun 0x000000000000183a start + 4210
> [amadeus:40293] [15] 17 libopen-pal.0.dylib 0x000000000006a085 opal_event_add_i + 1781
> [amadeus:40293] [16] 18 libopen-pal.0.dylib 0x000000000005c66e opal_progress + 142
> [amadeus:40293] [17] 19 libopen-rte.0.dylib 0x0000000000015cd6 orte_trigger_event + 70
> [amadeus:40293] [18] 20 libopen-rte.0.dylib 0x000000000002467c orte_daemon_recv + 4332
> [amadeus:40293] [19] 21 libopen-rte.0.dylib 0x0000000000025462 orte_daemon_cmd_processor + 722
> [amadeus:40293] [20] 22 libopen-pal.0.dylib 0x000000000006a085 opal_event_add_i + 1781
> [amadeus:40293] [21] 23 om-mpirun 0x00000000000015d1 start + 3593
> [amadeus:40293] [22] 24 libopen-pal.0.dylib 0x000000000006a085 opal_event_add_i + 1781
> [amadeus:40293] [23] 25 libopen-pal.0.dylib 0x000000000005c66e opal_progress + 142
> [amadeus:40293] [24] 26 libopen-rte.0.dylib 0x0000000000015cd6 orte_trigger_event + 70
> [amadeus:40293] [25] 27 libopen-rte.0.dylib 0x0000000000032207 orte_plm_base_launch_failed + 135
> [amadeus:40293] [26] 28 mca_plm_xgrid.so 0x00000000000ed089 orte_plm_xgrid_spawn + 89
> [amadeus:40293] [27] 29 om-mpirun 0x0000000000001220 start + 2648
> [amadeus:40293] [28] 30 om-mpirun 0x0000000000000839 start + 113
> [amadeus:40293] [29] 31 om-mpirun 0x00000000000007fc start + 52
> [amadeus:40293] *** End of error message ***
> [1] 40293 abort /sw/bin/om-mpirun -c 2 mpiapp
>
>
> Is there anyone successfully using openmpi with xgrid who would be
> keen to share his/her experience? I am not new to xgrid or mpi, but
> with the two integrated I must say that I am in uncharted waters.
>
> Any help would be much appreciated.
>
> Many thanks in advance,
> Alan
> --
> Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
> Department of Biochemistry, University of Cambridge.
> 80 Tennis Court Road, Cambridge CB2 1GA, UK.
> http://www.bio.cam.ac.uk/~awd28
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 1318, Issue 2
> **************************************