
Open MPI User's Mailing List Archives


Subject: [OMPI users] Want to find LogGP parameters. Please help
From: Mudassar Majeed (mudassarm30_at_[hidden])
Date: 2011-10-26 12:48:14


Dear MPI people,

I want to use the LogGP model with MPI to predict how much time a message of K bytes will take. For this I need to find the latency L, the overhead o, and the gap G. Can somebody tell me how I can measure these three parameters of the underlying network, and how often I should measure them so that the prediction of the time to send a message of K bytes remains accurate?

regards,
Mudassar

________________________________
From: "users-request_at_[hidden]" <users-request_at_[hidden]>
To: users_at_[hidden]
Sent: Wednesday, October 26, 2011 6:00 PM
Subject: users Digest, Vol 2052, Issue 1

Send users mailing list submissions to
    users_at_[hidden]

To subscribe or unsubscribe via the World Wide Web, visit
    http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
    users-request_at_[hidden]

You can reach the person managing the list at
    users-owner_at_[hidden]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."

Today's Topics:

   1. Re: Problem-Bug with MPI_Intercomm_create() (Ralph Castain)
   2. Re: Checkpoint from inside MPI program with OpenMPI 1.4.2 ? (Josh Hursey)
   3. Subnet routing (1.2.x) not working in 1.4.3 anymore (Mirco Wahab)
   4. Re: mpirun should run with just the localhost interface on win? (MM)
   5. Re: Checkpoint from inside MPI program with OpenMPI 1.4.2 ? (Nguyen Toan)
   6. Re: exited on signal 11 (Segmentation fault). (Mouhamad Al-Sayed-Ali)
   7. Changing plm_rsh_agent system wide (Patrick Begou)
   8. Re: Checkpoint from inside MPI program with OpenMPI 1.4.2 ? (Josh Hursey)
   9. Re: Changing plm_rsh_agent system wide (Ralph Castain)
  10. Re: Changing plm_rsh_agent system wide (TERRY DONTJE)
  11. Re: Changing plm_rsh_agent system wide (TERRY DONTJE)
  12. Re: Changing plm_rsh_agent system wide (Patrick Begou)
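For the question above: one common formulation of the LogGP one-way time for a single k-byte message is T(k) = o + (k-1)G + L + o, and L, o, G are usually estimated with a ping-pong benchmark (half of the round-trip time at two or more message sizes: the slope gives G, the intercept gives L + 2o). A minimal sketch of the prediction and fitting steps in Python — the function names and parameter values are illustrative, not from any MPI library:

```python
import math

def loggp_send_time(k, L, o, G):
    """One-way time for a k-byte message under LogGP:
    sender overhead o + (k-1) per-byte gaps G + latency L + receiver overhead o."""
    return o + (k - 1) * G + L + o

def fit_loggp(k1, t1, k2, t2):
    """Recover the per-byte gap G (slope) and the constant term L + 2o
    (intercept) from one-way times measured at two message sizes."""
    G = (t2 - t1) / (k2 - k1)
    intercept = t1 - (k1 - 1) * G  # equals L + 2o
    return G, intercept

# Illustrative parameters: L = 2 us, o = 1 us, G = 5 ns/byte. A ~100 KB
# message then costs 4 us of fixed cost plus ~0.5 ms of per-byte gaps.
L, o, G = 2e-6, 1e-6, 5e-9
t_small = loggp_send_time(1, L, o, G)
t_large = loggp_send_time(100_001, L, o, G)
G_fit, intercept_fit = fit_loggp(1, t_small, 100_001, t_large)
assert math.isclose(G_fit, G) and math.isclose(intercept_fit, L + 2 * o)
```

As for how often to re-measure: the fitted slope and intercept drift with network load, so the prediction stays accurate only as long as the ping-pong measurements are recent enough to reflect current conditions.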
----------------------------------------------------------------------

Message: 1
Date: Tue, 25 Oct 2011 10:08:00 -0600
From: Ralph Castain <rhc_at_[hidden]>
Subject: Re: [OMPI users] Problem-Bug with MPI_Intercomm_create()
To: Open MPI Users <users_at_[hidden]>
Message-ID: <30D41149-6683-41C2-ACE0-776C64E5C83C_at_[hidden]>
Content-Type: text/plain; charset=iso-8859-1

FWIW: I have tracked this problem down. The fix is a little more complicated than I'd like, so I'm going to have to ping some other folks to ensure we concur on the approach before doing something.

On Oct 25, 2011, at 8:20 AM, Ralph Castain wrote:

> I still see it failing the test George provided on the trunk. I'm unaware of anyone looking further into it, though, as the prior discussion seemed to just end.
>
> On Oct 25, 2011, at 7:01 AM, orel wrote:
>
>> Dears,
>>
>> I have been trying for several days to use advanced MPI-2 features in the following scenario:
>>
>> 1) a master code A (of size NPA) spawns (MPI_Comm_spawn()) two slave
>>    codes B (of size NPB) and C (of size NPC), providing intercomms A-B and A-C;
>> 2) I create intracomms AB and AC by merging the intercomms;
>> 3) then I create intercomm AB-C by calling MPI_Intercomm_create(), using AC as the bridge...
>>
>>    MPI_Comm intercommABC;
>> A: MPI_Intercomm_create(intracommAB, 0, intracommAC, NPA, TAG, &intercommABC);
>> B: MPI_Intercomm_create(intracommAB, 0, MPI_COMM_NULL, 0, TAG, &intercommABC);
>> C: MPI_Intercomm_create(intracommC, 0, intracommAC, 0, TAG, &intercommABC);
>>
>>    In these calls, A0 and C0 play the role of local leader for AB and C respectively.
>>    C0 and A0 play the roles of remote leader in bridge intracomm AC.
>>
>> 3) MPI_Barrier(intercommABC);
>> 4) I merge intercomm AB-C into intracomm ABC;
>> 5) MPI_Barrier(intracommABC);
>>
>> My BUG: these calls succeed, but when I try to use intracommABC for a collective communication like MPI_Barrier(),
>> I get the following error:
>>
>> *** An error occurred in MPI_Barrier
>> *** on communicator
>> *** MPI_ERR_INTERN: internal error
>> *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>
>> I tried with the OpenMPI trunk, 1.5.3, 1.5.4 and MPICH2-1.4.1p1.
>>
>> My code works perfectly if intracomms A, B and C are obtained by MPI_Comm_split() instead of MPI_Comm_spawn()!
>>
>> I found the same problem in a previous thread of the OMPI users mailing list:
>>
>> => http://www.open-mpi.org/community/lists/users/2011/06/16711.php
>>
>> Is that bug/problem currently under investigation? :-)
>>
>> I can give detailed code, but the one provided by George Bosilca in that previous thread produces the same error...
>>
>> Thank you for helping me...
>>
>> --
>> Aurélien Esnard
>> University Bordeaux 1 / LaBRI / INRIA (France)
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

------------------------------

Message: 2
Date: Tue, 25 Oct 2011 13:25:27 -0500
From: Josh Hursey <jjhursey_at_[hidden]>
Subject: Re: [OMPI users] Checkpoint from inside MPI program with OpenMPI 1.4.2 ?
To: Open MPI Users <users_at_[hidden]>
Message-ID: <CAANzjEnOdwva5J4fFBmXtsK6Kj3yGE9j=dKdtaWuZs=wHzGbQg_at_[hidden]>
Content-Type: text/plain; charset=ISO-8859-1

Open MPI (trunk/1.7 - not 1.4 or 1.5) provides an application-level interface to request a checkpoint of an application. This API is defined on the following website:

  http://osl.iu.edu/research/ft/ompi-cr/api.php#api-cr_checkpoint

This will behave the same as if you requested the checkpoint of the job from the command line.
-- Josh

On Mon, Oct 24, 2011 at 12:37 PM, Nguyen Toan <nguyentoan1508_at_[hidden]> wrote:
> Dear all,
> I want to automatically checkpoint an MPI program with OpenMPI (I'm
> currently using the 1.4.2 version with BLCR 0.8.2),
> not by manually typing the ompi-checkpoint command line from another terminal.
> So I would like to know if there is a way to call a checkpoint function from
> inside an MPI program with OpenMPI, or how to do that.
> Any ideas are very appreciated.
> Regards,
> Nguyen Toan
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey

------------------------------

Message: 3
Date: Tue, 25 Oct 2011 22:15:12 +0200
From: Mirco Wahab <mirco.wahab_at_[hidden]>
Subject: [OMPI users] Subnet routing (1.2.x) not working in 1.4.3 anymore
To: users_at_[hidden]
Message-ID: <4EA718D0.5060005_at_[hidden]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

In the last few years, it has been very simple to set up high-performance (GbE) multiple back-to-back connections between three nodes (triangular topology) or four nodes (tetrahedral topology). The only things you had to do were:

- use 3 (or 4) cheap compute nodes w/Linux and connect
  each of them via a standard GbE router (onboard GbE NIC)
  to a file server,
- put 2 (trigonal topol.) or 3 (tetrahedral topol.)
  $25 PCIe GbE NICs into *each* node,
- connect the nodes with 3 (trigonal) or 4 (tetrahedral)
  short crossover Cat5e cables,
- configure the extra NICs into different subnets
  according to their "edge index", e.g.
  for 3 nodes (node10, node11, node12):

    node10
      onboard NIC: 192.168.0.10 on eth0 (to router/server)
      extra NIC:   10.0.1.10 on eth1 (edge 1 to 10.0.1.11)
      extra NIC:   10.0.2.10 on eth2 (edge 2 to 10.0.2.12)
    node11
      onboard NIC: 192.168.0.11 on eth0 (to router/server)
      extra NIC:   10.0.1.11 on eth1 (edge 1 to 10.0.1.10)
      extra NIC:   10.0.3.11 on eth3 (edge 3 to 10.0.3.12)
    node12
      onboard NIC: 192.168.0.12 on eth0 (to router/server)
      extra NIC:   10.0.2.12 on eth2 (edge 2 to 10.0.2.10)
      extra NIC:   10.0.3.12 on eth3 (edge 3 to 10.0.3.11)

- that's it.

I mean, that *was* it, with 1.2.x. OMPI 1.2.x would then ingeniously discover the routable edges and open communication ports accordingly, without any additional explicit host routing, e.g. invoked by

  $> mpirun -np 12 --host c10,c11,c12 --mca btl_tcp_if_exclude lo,eth0 my_mpi_app

and (measured by iftop) saturate the available edges with about 100 MB/sec duplex on each of them. It would not stumble on the fact that some interfaces are not reachable by every NIC directly. And this was very convenient over the years.

With 1.4.3 (which comes out of the box with current Linux distributions), this won't work. It hangs and complains after a timeout about failed endpoint connects, e.g.:

  [node12][[52378,1],2][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
  connect() to 10.0.1.11 failed: Connection timed out (110)

* Can the intelligent behaviour of 1.2.x be "configured back"?
* What should the topology look like to work with 1.4.x painlessly?

Thanks & regards

M.

------------------------------

Message: 4
Date: Tue, 25 Oct 2011 21:33:54 +0100
From: "MM" <finjulhich_at_[hidden]>
Subject: Re: [OMPI users] mpirun should run with just the localhost interface on win?
To: "'openmpi mailing list'" <users_at_[hidden]>
Message-ID: <00d601cc9355$6af47290$40dd57b0$@com>
Content-Type: text/plain; charset="us-ascii"

-----Original Message-----

If the interface is down, should localhost still allow mpirun to run mpi processes?

------------------------------

Message: 5
Date: Wed, 26 Oct 2011 13:52:17 +0900
From: Nguyen Toan <nguyentoan1508_at_[hidden]>
Subject: Re: [OMPI users] Checkpoint from inside MPI program with OpenMPI 1.4.2 ?
To: Open MPI Users <users_at_[hidden]>
Message-ID: <CAFiEserJ0U9m9euy1-CA8m=_KihMM5s73qaJiii_N=p7f3Kdug_at_[hidden]>
Content-Type: text/plain; charset="iso-8859-1"

Dear Josh,

Thank you. I will test the 1.7 trunk as you suggested.
Also I want to ask if we can add this interface to OpenMPI 1.4.2, because my applications mainly involve this version.

Regards,
Nguyen Toan

On Wed, Oct 26, 2011 at 3:25 AM, Josh Hursey <jjhursey_at_[hidden]> wrote:
> [Josh's reply (Message 2 above) quoted in full; trimmed]

-------------- next part --------------
HTML attachment scrubbed and removed

------------------------------

Message: 6
Date: Wed, 26 Oct 2011 09:57:38 +0200
From: Mouhamad Al-Sayed-Ali <Mouhamad.Al-Sayed-Ali_at_[hidden]>
Subject: Re: [OMPI users] exited on signal 11 (Segmentation fault).
To: Gus Correa <gus_at_[hidden]>
Cc: Open MPI Users <users_at_[hidden]>
Message-ID: <20111026095738.119675e8nwvpxhss_at_[hidden]>
Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed"

Hi Gus Correa,

The output of 'ulimit -a' is:

  file(blocks)         unlimited
  coredump(blocks)     2048
  data(kbytes)         unlimited
  stack(kbytes)        10240
  lockedmem(kbytes)    unlimited
  memory(kbytes)       unlimited
  nofiles(descriptors) 1024
  processes            256

Thanks
Mouhamad

Gus Correa <gus_at_[hidden]> a écrit :

> Hi Mouhamad
>
> The locked memory is set to unlimited, but the lines
> about the stack are commented out.
> Have you tried to add this line:
>
> *  -  stack      -1
>
> then run wrf again? [Note no "#" hash character]
>
> Also, if you log in to the compute nodes,
> what is the output of 'limit' [csh,tcsh] or 'ulimit -a' [sh,bash]?
> This should tell you what limits are actually set.
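Gus's checklist above can be applied directly; a minimal sketch, assuming bash on the compute nodes and that the limits.conf edit is done by an administrator (the limit names are the ones quoted in the thread):

```shell
# The ulimit output above shows stack(kbytes) = 10240, i.e. only 10 MB,
# which is far too small for wrf. Inspect the current soft stack limit:
ulimit -S -s

# Suggested /etc/security/limits.conf lines (uncommented, on every
# compute node; users must log in again for them to take effect):
#   *  -  memlock    -1
#   *  -  stack      -1
#   *  -  nofile      4096

# Quick per-shell workaround before launching mpirun: raise the soft
# limit as far as the configured hard limit allows.
ulimit -S -s unlimited 2>/dev/null || true
ulimit -S -s
```

Note that limits set this way apply per login session, so the raised limit must be in effect in the shell (or batch script) that actually launches mpirun on each node.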
>
> I hope this helps,
> Gus Correa
>
> Mouhamad Al-Sayed-Ali wrote:
>> Hi all,
>>
>> I've checked the "limits.conf", and it contains these lines:
>>
>> # Jcb 29.06.2007 : pbs wrf (Siji)
>> #*      hard    stack  1000000
>> #*      soft    stack  1000000
>>
>> # Dr 14.02.2008 : pour voltaire mpi
>> *      hard    memlock unlimited
>> *      soft    memlock unlimited
>>
>> Many thanks for your help
>> Mouhamad
>>
>> Gus Correa <gus_at_[hidden]> a écrit :
>>
>>> Hi Mouhamad, Ralph, Terry
>>>
>>> Very often big programs like wrf crash with a segfault because they
>>> can't allocate memory on the stack, and assume the system doesn't
>>> impose any limits on it. This has nothing to do with MPI.
>>>
>>> Mouhamad: check if your stack size is set to unlimited on all compute
>>> nodes. The easy way to get it done is to change /etc/security/limits.conf,
>>> where you or your system administrator could add these lines:
>>>
>>> *  -  memlock    -1
>>> *  -  stack      -1
>>> *  -  nofile      4096
>>>
>>> My two cents,
>>> Gus Correa
>>>
>>> Ralph Castain wrote:
>>>> Looks like you are crashing in wrf - have you asked them for help?
>>>>
>>>> On Oct 25, 2011, at 7:53 AM, Mouhamad Al-Sayed-Ali wrote:
>>>>
>>>>> Hi again,
>>>>>
>>>>> This is exactly the error I have:
>>>>>
>>>>> ----
>>>>> taskid: 0 hostname: part034.u-bourgogne.fr
>>>>> [part034:21443] *** Process received signal ***
>>>>> [part034:21443] Signal: Segmentation fault (11)
>>>>> [part034:21443] Signal code: Address not mapped (1)
>>>>> [part034:21443] Failing at address: 0xfffffffe01eeb340
>>>>> [part034:21443] [ 0] /lib64/libpthread.so.0 [0x3612c0de70]
>>>>> [part034:21443] [ 1] wrf.exe(__module_ra_rrtm_MOD_taugb3+0x418) [0x11cc9d8]
>>>>> [part034:21443] [ 2] wrf.exe(__module_ra_rrtm_MOD_gasabs+0x260) [0x11cfca0]
>>>>> [part034:21443] [ 3] wrf.exe(__module_ra_rrtm_MOD_rrtm+0xb31) [0x11e6e41]
>>>>> [part034:21443] [ 4] wrf.exe(__module_ra_rrtm_MOD_rrtmlwrad+0x25ec) [0x11e9bcc]
>>>>> [part034:21443] [ 5] wrf.exe(__module_radiation_driver_MOD_radiation_driver+0xe573) [0xcc4ed3]
>>>>> [part034:21443] [ 6] wrf.exe(__module_first_rk_step_part1_MOD_first_rk_step_part1+0x40c5) [0xe0e4f5]
>>>>> [part034:21443] [ 7] wrf.exe(solve_em_+0x22e58) [0x9b45c8]
>>>>> [part034:21443] [ 8] wrf.exe(solve_interface_+0x80a) [0x902dda]
>>>>> [part034:21443] [ 9] wrf.exe(__module_integrate_MOD_integrate+0x236) [0x4b2c4a]
>>>>> [part034:21443] [10] wrf.exe(__module_wrf_top_MOD_wrf_run+0x24) [0x47a924]
>>>>> [part034:21443] [11] wrf.exe(main+0x41) [0x4794d1]
>>>>> [part034:21443] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x361201d8b4]
>>>>> [part034:21443] [13] wrf.exe [0x4793c9]
>>>>> [part034:21443] *** End of error message ***
>>>>> -------
>>>>>
>>>>> Mouhamad
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>>
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

------------------------------

Message: 7
Date: Wed, 26 Oct 2011 11:11:08 +0200
From: Patrick Begou <Patrick.Begou_at_[hidden]>
Subject: [OMPI users] Changing plm_rsh_agent system wide
To: Open MPI Users <users_at_[hidden]>
Message-ID: <4EA7CEAC.3080800_at_[hidden]>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed

I need to change, system wide, how OpenMPI launches the jobs on the nodes of my cluster.

Setting:

  export OMPI_MCA_plm_rsh_agent=oarsh

works fine, but I would like this config to be the default for OpenMPI. I've read several threads (discussions, FAQ) about this, but none of the provided solutions seems to work.

I have two files:

  /usr/lib/openmpi/1.4-gcc/etc/openmpi-mca-params.conf
  /usr/lib64/openmpi/1.4-gcc/etc/openmpi-mca-params.conf

In these files I've set various flavors of the syntax (only one at a time, and the same in each file of course!):

  test 1) plm_rsh_agent = oarsh
  test 2) pls_rsh_agent = oarsh
  test 3) orte_rsh_agent = oarsh

But each time, when I run "ompi_info --param plm rsh", I get:

  MCA plm: parameter "plm_rsh_agent" (current value: "ssh : rsh", data source: default value, synonyms: pls_rsh_agent)
           The command used to launch executables on remote nodes (typically either "ssh" or "rsh")

With the exported variable it works fine. Any suggestion?

The rpm package of my Linux Rocks Cluster provides:

  Package: Open MPI root_at_build-x86-64 Distribution
  Open MPI: 1.4.3
  Open MPI SVN revision: r23834
  Open MPI release date: Oct 05, 2010

Thanks

Patrick

--
===============================================================
| Equipe M.O.S.T.         | http://most.hmg.inpg.fr           |
| Patrick BEGOU           |        ------------               |
| LEGI                    | mailto:Patrick.Begou_at_[hidden]  |
| BP 53 X                 | Tel 04 76 82 51 35                |
| 38041 GRENOBLE CEDEX    | Fax 04 76 82 52 71                |
===============================================================

------------------------------

Message: 8
Date: Wed, 26 Oct 2011 07:20:38 -0500
From: Josh Hursey <jjhursey_at_[hidden]>
Subject: Re: [OMPI users] Checkpoint from inside MPI program with OpenMPI 1.4.2 ?
To: Open MPI Users <users_at_[hidden]>
Message-ID: <CAANzjEmx=sO_9mtzVM+WiPLWFhPSiM6UxeosxNPgdd8QUZObCw_at_[hidden]>
Content-Type: text/plain; charset=ISO-8859-1

Since this would be a new feature for 1.4, we cannot move it there, as the 1.4 branch is for bug fixes only. However, we may be able to add it to 1.5. I filed a ticket if you want to track that progress:

  https://svn.open-mpi.org/trac/ompi/ticket/2895

-- Josh

On Tue, Oct 25, 2011 at 11:52 PM, Nguyen Toan <nguyentoan1508_at_[hidden]> wrote:
> [Nguyen Toan's Message 5 above quoted in full; trimmed]

--
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey

------------------------------

Message: 9
Date: Wed, 26 Oct 2011 08:44:45 -0600
From: Ralph Castain <rhc_at_[hidden]>
Subject: Re: [OMPI users] Changing plm_rsh_agent system wide
To: Open MPI Users <users_at_[hidden]>
Message-ID: <F188CF99-9A7A-4327-AF9C-51D578CD54C4_at_[hidden]>
Content-Type: text/plain; charset=us-ascii

Did the version you are running get installed in /usr? Sounds like you are picking up a different version when running a command - i.e., that your PATH is finding a different installation than the one in /usr.

On Oct 26, 2011, at 3:11 AM, Patrick Begou wrote:
> [Patrick's Message 7 above quoted in full; trimmed]

------------------------------

Message: 10
Date: Wed, 26 Oct 2011 10:49:38 -0400
From: TERRY DONTJE <terry.dontje_at_[hidden]>
Subject: Re: [OMPI users] Changing plm_rsh_agent system wide
To: Open MPI Users <users_at_[hidden]>
Message-ID: <4EA81E02.6080609_at_[hidden]>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

I am using a prefix configuration, so no, it does not exist in /usr.

--td

On 10/26/2011 10:44 AM, Ralph Castain wrote:
> [Ralph's Message 9 above quoted in full; trimmed]

--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]

------------------------------

Message: 11
Date: Wed, 26 Oct 2011 10:51:06 -0400
From: TERRY DONTJE <terry.dontje_at_[hidden]>
Subject: Re: [OMPI users] Changing plm_rsh_agent system wide
To: Open MPI Users <users_at_[hidden]>
Message-ID: <4EA81E5A.3030606_at_[hidden]>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Sorry, please disregard my reply to this email. :-)

--td

On 10/26/2011 10:44 AM, Ralph Castain wrote:
> [Ralph's Message 9 above quoted in full; trimmed]

------------------------------

Message: 12
Date: Wed, 26 Oct 2011 17:57:54 +0200
From: Patrick Begou <Patrick.Begou_at_[hidden]>
Subject: Re: [OMPI users] Changing plm_rsh_agent system wide
To: Open MPI Users <users_at_[hidden]>
Message-ID: <4EA82E02.9020107_at_[hidden]>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed

Ralph Castain a écrit :
> Did the version you are running get installed in /usr? Sounds like you are picking up a different version when running a command - i.e., that your PATH is finding a different installation than the one in /usr.

Right! I'm using OpenMPI with the Rocks Cluster distribution. There is:

  openmpi-1.4-4.el5 rpm installed, with /usr/lib*/openmpi/1.4-gcc/etc/openmpi-mca-params.conf

but there is also

  rocks-openmpi-1.4.3-1, with /opt/openmpi/etc/openmpi-mca-params.conf

I never noticed this double default install of OpenMPI in this Linux distribution. Thanks a lot for the suggestion; I was fixated on a syntax error in my config...

Patrick

--
===============================================================
| Equipe M.O.S.T.         | http://most.hmg.inpg.fr           |
| Patrick BEGOU           |        ------------               |
| LEGI                    | mailto:Patrick.Begou_at_[hidden]  |
| BP 53 X                 | Tel 04 76 82 51 35                |
| 38041 GRENOBLE CEDEX    | Fax 04 76 82 52 71                |
===============================================================

------------------------------

_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users

End of users Digest, Vol 2052, Issue 1
**************************************
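The plm_rsh_agent thread above resolved to a duplicate Open MPI install: the environment variable worked everywhere, while the mca-params files under /usr were ignored because PATH selected the /opt/openmpi copy. A sketch of the resulting check and fix, using the paths named in the thread (verify which install your own PATH selects before editing anything):

```shell
# Which Open MPI install does PATH actually select? (the thread had two:
# /usr/lib*/openmpi/1.4-gcc from openmpi-1.4-4.el5, and /opt/openmpi
# from rocks-openmpi-1.4.3-1)
command -v mpirun ompi_info || echo "Open MPI not on PATH"

# Put the default launch agent in the mca-params file of *that* install:
#   /opt/openmpi/etc/openmpi-mca-params.conf
#     plm_rsh_agent = oarsh

# Verify: the "data source" shown for plm_rsh_agent should now read
# "file", not "default value".
command -v ompi_info >/dev/null && ompi_info --param plm rsh | grep plm_rsh_agent || true
```

The same precedence applies in general: an exported OMPI_MCA_* variable overrides the file, which is why the export kept working while the file edits appeared to do nothing.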