Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi 1.2.8 on Xgrid noob issue
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-08-04 20:48:30


I'm afraid our Xgrid support has lagged, and Apple hasn't shown much interest in MPI + Xgrid support -- much less HPC. :-\

Have you seen the FAQ items about Xgrid?

    http://www.open-mpi.org/faq/?category=osx#xgrid-howto
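For the ssh fallback raised in the quoted message below, a minimal sketch of what that setup might look like with the 1.2 series follows. The hostnames and slot counts are placeholders (assumptions, not taken from the original report), and it assumes password-less ssh already works between the two machines. In Open MPI 1.2.x the launcher framework is named "pls" (as in the mca_pls_xgrid.so frames of the trace), so selecting the rsh/ssh component bypasses Xgrid. The script only writes the hostfile and prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: hostnames and slot counts below are hypothetical placeholders.
# Replace them with the real names (or IPs) and core counts of the two machines.
cat > hosts.txt <<'EOF'
macpro-a.local slots=4
macpro-b.local slots=4
EOF

# Select the rsh/ssh launcher instead of Xgrid (Open MPI 1.2.x "pls" framework).
# ./test_hello is the hello world binary from the quoted report.
LAUNCH_CMD="mpirun --mca pls rsh --hostfile hosts.txt -np 4 ./test_hello"

# Print the command so it can be inspected before being run by hand.
echo "$LAUNCH_CMD"
```

The test binary would need to exist at the same path on both machines for the printed command to work.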

On Aug 4, 2011, at 4:16 AM, Christopher Jones wrote:

> Hi there,
>
> I'm currently trying to set up a small Xgrid between two Mac Pros (a single quad-core and a 2 duo core), which are directly connected via an Ethernet cable. I've set up Xgrid using password authentication (rather than Kerberos), and from what I can tell in the Xgrid Admin tool it seems to be working. However, when I try a simple hello world program, I get this error:
>
> chris-joness-mac-pro:~ chrisjones$ mpirun -np 4 ./test_hello
> mpirun noticed that job rank 0 with PID 381 on node xgrid-node-0 exited on signal 15 (Terminated).
> 1 additional process aborted (not shown)
> 2011-08-04 10:02:16.329 mpirun[350:903] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[NSKVONotifying_XGConnection<0x1001325a0> finalize]: called when collecting not enabled'
> *** Call stack at first throw:
> (
> 0 CoreFoundation 0x00007fff814237b4 __exceptionPreprocess + 180
> 1 libobjc.A.dylib 0x00007fff84fe8f03 objc_exception_throw + 45
> 2 CoreFoundation 0x00007fff8143e631 -[NSObject(NSObject) finalize] + 129
> 3 mca_pls_xgrid.so 0x00000001002a9ce3 -[PlsXGridClient dealloc] + 419
> 4 mca_pls_xgrid.so 0x00000001002a9837 orte_pls_xgrid_finalize + 40
> 5 libopen-rte.0.dylib 0x000000010002d0f9 orte_pls_base_close + 249
> 6 libopen-rte.0.dylib 0x0000000100012027 orte_system_finalize + 119
> 7 libopen-rte.0.dylib 0x000000010000e968 orte_finalize + 40
> 8 mpirun 0x00000001000011ff orterun + 2042
> 9 mpirun 0x0000000100000a03 main + 27
> 10 mpirun 0x00000001000009e0 start + 52
> 11 ??? 0x0000000000000004 0x0 + 4
> )
> terminate called after throwing an instance of 'NSException'
> [chris-joness-mac-pro:00350] *** Process received signal ***
> [chris-joness-mac-pro:00350] Signal: Abort trap (6)
> [chris-joness-mac-pro:00350] Signal code: (0)
> [chris-joness-mac-pro:00350] [ 0] 2 libSystem.B.dylib 0x00007fff81ca51ba _sigtramp + 26
> [chris-joness-mac-pro:00350] [ 1] 3 ??? 0x00000001000cd400 0x0 + 4295808000
> [chris-joness-mac-pro:00350] [ 2] 4 libstdc++.6.dylib 0x00007fff830965d2 __tcf_0 + 0
> [chris-joness-mac-pro:00350] [ 3] 5 libobjc.A.dylib 0x00007fff84fecb39 _objc_terminate + 100
> [chris-joness-mac-pro:00350] [ 4] 6 libstdc++.6.dylib 0x00007fff83094ae1 _ZN10__cxxabiv111__terminateEPFvvE + 11
> [chris-joness-mac-pro:00350] [ 5] 7 libstdc++.6.dylib 0x00007fff83094b16 _ZN10__cxxabiv112__unexpectedEPFvvE + 0
> [chris-joness-mac-pro:00350] [ 6] 8 libstdc++.6.dylib 0x00007fff83094bfc _ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception + 0
> [chris-joness-mac-pro:00350] [ 7] 9 libobjc.A.dylib 0x00007fff84fe8fa2 object_getIvar + 0
> [chris-joness-mac-pro:00350] [ 8] 10 CoreFoundation 0x00007fff8143e631 -[NSObject(NSObject) finalize] + 129
> [chris-joness-mac-pro:00350] [ 9] 11 mca_pls_xgrid.so 0x00000001002a9ce3 -[PlsXGridClient dealloc] + 419
> [chris-joness-mac-pro:00350] [10] 12 mca_pls_xgrid.so 0x00000001002a9837 orte_pls_xgrid_finalize + 40
> [chris-joness-mac-pro:00350] [11] 13 libopen-rte.0.dylib 0x000000010002d0f9 orte_pls_base_close + 249
> [chris-joness-mac-pro:00350] [12] 14 libopen-rte.0.dylib 0x0000000100012027 orte_system_finalize + 119
> [chris-joness-mac-pro:00350] [13] 15 libopen-rte.0.dylib 0x000000010000e968 orte_finalize + 40
> [chris-joness-mac-pro:00350] [14] 16 mpirun 0x00000001000011ff orterun + 2042
> [chris-joness-mac-pro:00350] [15] 17 mpirun 0x0000000100000a03 main + 27
> [chris-joness-mac-pro:00350] [16] 18 mpirun 0x00000001000009e0 start + 52
> [chris-joness-mac-pro:00350] [17] 19 ??? 0x0000000000000004 0x0 + 4
> [chris-joness-mac-pro:00350] *** End of error message ***
> Abort trap
>
>
> I've seen this error in a previous mailing, and it seems that the issue has something to do with forcing everything to use Kerberos (SSO). However, I noticed that on the computer being used as an agent, this option is grayed out in the Xgrid sharing configuration (I have no idea why). I would therefore like to ask: is it absolutely necessary to use SSO to get Open MPI to run with Xgrid, or am I missing something with the password setup? The Kerberos option seems much more complicated, and I may even want to switch to just using Open MPI with ssh.
>
> Many thanks,
> Chris
>
>
> Chris Jones
> Post-doctoral Research Assistant,
>
> Department of Microbiology
> Swedish University of Agricultural Sciences
> Uppsala, Sweden
> phone: +46 (0)18 67 3222
> email: chris.jones_at_[hidden]
>
> Department of Soil and Environmental Microbiology
> National Institute for Agronomic Research
> Dijon, France
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/