Subject: Re: [OMPI users] openmpi with xgrid
From: Warner Yuen (wyuen_at_[hidden])
Date: 2009-08-14 12:20:05


Hi Alan,

Xgrid support is currently broken in the latest version of Open MPI;
see the ticket below. However, I believe that Xgrid still works with
one of the earlier 1.2 releases. I don't recall for sure, but I think
that it's Open MPI 1.2.3.
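
If you want to go that route, ompi_info is a quick way to confirm which
version a given installation is and whether it was built with the xgrid
launcher at all. A minimal check (just a sketch; it assumes the build
you want to test is the one on your PATH, e.g. Alan's /sw/bin):

    # Report the Open MPI version of the build being picked up
    ompi_info | grep "Open MPI:"

    # List the process launch (plm) components that were built;
    # an Xgrid-capable build should show an "xgrid" entry here
    ompi_info | grep " plm:"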

#1777: Xgrid support is broken in the v1.3 series
---------------------+------------------------------------------------------
Reporter: jsquyres | Owner: brbarret
    Type: defect | Status: accepted
Priority: major | Milestone: Open MPI 1.3.4
Version: trunk | Resolution:
Keywords: |
---------------------+------------------------------------------------------
Changes (by bbenton):

  * milestone: Open MPI 1.3.3 => Open MPI 1.3.4
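
In the meantime, a possible workaround (just a sketch on my part; I have
not verified it against 1.3.3) is to tell mpirun not to use the xgrid
launcher at all and fall back to the rsh launcher that also appears in
Alan's ompi_info output below. That loses the Xgrid integration, of
course, but it should at least confirm the rest of the build runs:

    # Select the rsh process launcher instead of xgrid for a single run
    /sw/bin/om-mpirun --mca plm rsh -np 2 mpiapp

    # Or set it for the whole shell session via the MCA environment variable
    export OMPI_MCA_plm=rsh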

Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wyuen_at_[hidden]
Tel: 408.718.2859

On Aug 14, 2009, at 6:21 AM, users-request_at_[hidden] wrote:

>
> Message: 1
> Date: Fri, 14 Aug 2009 14:21:30 +0100
> From: Alan <alanwilter_at_[hidden]>
> Subject: [OMPI users] openmpi with xgrid
> To: users_at_[hidden]
> Message-ID:
> <cf58c8d00908140621v18d384f2wef97ee80ca3ded0c_at_[hidden]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi there,
>
> I saw http://www.open-mpi.org/community/lists/users/2007/08/3900.php.
>
> I use fink, so I changed the openmpi.info file in order to build
> openmpi with xgrid support.
>
> As you can see:
> amadeus[2081]:~/Downloads% /sw/bin/ompi_info
> Package: Open MPI root_at_amadeus.local Distribution
> Open MPI: 1.3.3
> Open MPI SVN revision: r21666
> Open MPI release date: Jul 14, 2009
> Open RTE: 1.3.3
> Open RTE SVN revision: r21666
> Open RTE release date: Jul 14, 2009
> OPAL: 1.3.3
> OPAL SVN revision: r21666
> OPAL release date: Jul 14, 2009
> Ident string: 1.3.3
> Prefix: /sw
> Configured architecture: x86_64-apple-darwin9
> Configure host: amadeus.local
> Configured by: root
> Configured on: Fri Aug 14 12:58:12 BST 2009
> Configure host: amadeus.local
> Built by:
> Built on: Fri Aug 14 13:07:46 BST 2009
> Built host: amadeus.local
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (single underscore)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /sw/var/lib/fink/path-prefix-10.6/gcc
> C++ compiler: g++
> C++ compiler absolute: /sw/var/lib/fink/path-prefix-10.6/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /sw/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /sw/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Sparse Groups: no
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: no
> mpirun default --prefix: no
> MPI I/O support: yes
> MPI_WTIME support: gettimeofday
> Symbol visibility support: yes
> FT Checkpoint support: no (checkpoint thread: no)
> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.3)
> MCA paffinity: darwin (MCA v2.0, API v2.0, Component v1.3.3)
> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.3)
> MCA carto: file (MCA v2.0, API v2.0, Component v1.3.3)
> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.3)
> MCA timer: darwin (MCA v2.0, API v2.0, Component v1.3.3)
> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.3)
> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.3)
> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.3)
> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.3)
> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.3)
> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: self (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.3)
> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.3)
> MCA io: romio (MCA v2.0, API v2.0, Component v1.3.3)
> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.3)
> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.3)
> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.3)
> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.3)
> MCA pml: v (MCA v2.0, API v2.0, Component v1.3.3)
> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.3)
> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.3)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.3.3)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.3)
> MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.3)
> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.3)
> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.3)
> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.3)
> MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.3)
> MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.3)
> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.3)
> MCA odls: default (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.3)
> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.3)
> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.3)
> MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.3)
> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.3)
> MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.3)
> MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.3)
> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.3)
> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.3.3)
> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.3)
> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ess: env (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.3)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.3)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.3)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.3)
>
> All seemed fine, and I also have the xgrid controller and agent
> running on my laptop. Then, when I tried:
>
> /sw/bin/om-mpirun -c 2 mpiapp # hello world example for mpi
> [amadeus.local:40293] [[804,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 119
> [amadeus.local:40293] [[804,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 153
> --------------------------------------------------------------------------
> om-mpirun was unable to start the specified application as it
> encountered an error.
> More information may be available above.
> --------------------------------------------------------------------------
> 2009-08-14 14:16:19.715 om-mpirun[40293:10b] *** Terminating app due to
> uncaught exception 'NSInvalidArgumentException', reason: '***
> -[NSKVONotifying_XGConnection<0x1001164b0> finalize]: called when
> collecting not enabled'
> 2009-08-14 14:16:19.716 om-mpirun[40293:10b] Stack: (
> 140735390096156,
> 140735366109391,
> 140735390122388,
> 4295943988,
> 4295939168,
> 4295171139,
> 4295883300,
> 4295025321,
> 4294973498,
> 4295401605,
> 4295345774,
> 4295056598,
> 4295116412,
> 4295119970,
> 4295401605,
> 4294972881,
> 4295401605,
> 4295345774,
> 4295056598,
> 4295172615,
> 4295938185,
> 4294971936,
> 4294969401,
> 4294969340
> )
> terminate called after throwing an instance of 'NSException'
> [amadeus:40293] *** Process received signal ***
> [amadeus:40293] Signal: Abort trap (6)
> [amadeus:40293] Signal code: (0)
> [amadeus:40293] [ 0] 2 libSystem.B.dylib 0x00000000831443fa _sigtramp + 26
> [amadeus:40293] [ 1] 3 ??? 0x000000005fbfb1e8 0x0 + 1606398440
> [amadeus:40293] [ 2] 4 libstdc++.6.dylib 0x00000000827f2085 _ZN9__gnu_cxx27__verbose_terminate_handlerEv + 377
> [amadeus:40293] [ 3] 5 libobjc.A.dylib 0x0000000081811adf objc_end_catch + 280
> [amadeus:40293] [ 4] 6 libstdc++.6.dylib 0x00000000827f0425 __gxx_personality_v0 + 1259
> [amadeus:40293] [ 5] 7 libstdc++.6.dylib 0x00000000827f045b _ZSt9terminatev + 19
> [amadeus:40293] [ 6] 8 libstdc++.6.dylib 0x00000000827f054c __cxa_rethrow + 0
> [amadeus:40293] [ 7] 9 libobjc.A.dylib 0x0000000081811966 objc_exception_rethrow + 0
> [amadeus:40293] [ 8] 10 CoreFoundation 0x0000000082ef8194 _CF_forwarding_prep_0 + 5700
> [amadeus:40293] [ 9] 11 mca_plm_xgrid.so 0x00000000000ee734 orte_plm_xgrid_finalize + 4884
> [amadeus:40293] [10] 12 mca_plm_xgrid.so 0x00000000000ed460 orte_plm_xgrid_finalize + 64
> [amadeus:40293] [11] 13 libopen-rte.0.dylib 0x0000000000031c43 orte_plm_base_close + 195
> [amadeus:40293] [12] 14 mca_ess_hnp.so 0x00000000000dfa24 0x0 + 916004
> [amadeus:40293] [13] 15 libopen-rte.0.dylib 0x000000000000e2a9 orte_finalize + 89
> [amadeus:40293] [14] 16 om-mpirun 0x000000000000183a start + 4210
> [amadeus:40293] [15] 17 libopen-pal.0.dylib 0x000000000006a085 opal_event_add_i + 1781
> [amadeus:40293] [16] 18 libopen-pal.0.dylib 0x000000000005c66e opal_progress + 142
> [amadeus:40293] [17] 19 libopen-rte.0.dylib 0x0000000000015cd6 orte_trigger_event + 70
> [amadeus:40293] [18] 20 libopen-rte.0.dylib 0x000000000002467c orte_daemon_recv + 4332
> [amadeus:40293] [19] 21 libopen-rte.0.dylib 0x0000000000025462 orte_daemon_cmd_processor + 722
> [amadeus:40293] [20] 22 libopen-pal.0.dylib 0x000000000006a085 opal_event_add_i + 1781
> [amadeus:40293] [21] 23 om-mpirun 0x00000000000015d1 start + 3593
> [amadeus:40293] [22] 24 libopen-pal.0.dylib 0x000000000006a085 opal_event_add_i + 1781
> [amadeus:40293] [23] 25 libopen-pal.0.dylib 0x000000000005c66e opal_progress + 142
> [amadeus:40293] [24] 26 libopen-rte.0.dylib 0x0000000000015cd6 orte_trigger_event + 70
> [amadeus:40293] [25] 27 libopen-rte.0.dylib 0x0000000000032207 orte_plm_base_launch_failed + 135
> [amadeus:40293] [26] 28 mca_plm_xgrid.so 0x00000000000ed089 orte_plm_xgrid_spawn + 89
> [amadeus:40293] [27] 29 om-mpirun 0x0000000000001220 start + 2648
> [amadeus:40293] [28] 30 om-mpirun 0x0000000000000839 start + 113
> [amadeus:40293] [29] 31 om-mpirun 0x00000000000007fc start + 52
> [amadeus:40293] *** End of error message ***
> [1] 40293 abort /sw/bin/om-mpirun -c 2 mpiapp
>
>
> Is there anyone using openmpi with xgrid successfully who would be
> willing to share their experience? I am not new to xgrid or mpi, but
> with the two combined I must say that I am in uncharted waters.
>
> Any help would be much appreciated.
>
> Many thanks in advance,
> Alan
> --
> Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
> Department of Biochemistry, University of Cambridge.
> 80 Tennis Court Road, Cambridge CB2 1GA, UK.
> >>http://www.bio.cam.ac.uk/~awd28<<