Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-05-04 11:42:48


FWIW, I don't use Xcode, but I use the precompiled gcc/gfortran from
here with good success:

     http://hpc.sourceforge.net/
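
If you go that route, something along these lines should be enough to point
Open MPI's build at it, assuming the package installs gfortran under
/usr/local/bin (adjust the paths and install prefix to match your setup):

     ./configure FC=/usr/local/bin/gfortran F77=/usr/local/bin/gfortran \
         --prefix=/opt/openmpi
     make all install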

On May 4, 2009, at 11:38 AM, Warner Yuen wrote:

> Have you installed a Fortran compiler? Mac OS X's developer tools do
> not come with a Fortran compiler, so you'll need to install one if you
> haven't already done so. I routinely use the Intel IFORT compilers
> with success. However, I hear many good things about the gfortran
> compilers on Mac OS X, and you can't beat the price of gfortran!
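>
> Once a Fortran compiler is installed and Open MPI has been rebuilt against
> it, a quick sanity check is to ask the wrapper what it will actually
> invoke, e.g.:
>
>      mpif90 --showme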
>
>
> Warner Yuen
> Scientific Computing
> Consulting Engineer
> Apple, Inc.
> email: wyuen_at_[hidden]
> Tel: 408.718.2859
>
>
>
>
> On May 4, 2009, at 7:28 AM, users-request_at_[hidden] wrote:
>
> > Send users mailing list submissions to
> > users_at_[hidden]
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > or, via email, send a message with subject or body 'help' to
> > users-request_at_[hidden]
> >
> > You can reach the person managing the list at
> > users-owner_at_[hidden]
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of users digest..."
> >
> >
> > Today's Topics:
> >
> > 1. How do I compile OpenMPI in Xcode 3.1 (Vicente)
> > 2. Re: 1.3.1 -rf rankfile behaviour ?? (Ralph Castain)
> >
> >
> >
> ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Mon, 4 May 2009 16:12:44 +0200
> > From: Vicente <vpuibor_at_[hidden]>
> > Subject: [OMPI users] How do I compile OpenMPI in Xcode 3.1
> > To: users_at_[hidden]
> > Message-ID: <1C2C0085-940F-43BB-910F-975871AE2F09_at_[hidden]>
> > Content-Type: text/plain; charset="windows-1252"; Format="flowed";
> > DelSp="yes"
> >
> > Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in
> > Xcode", but it is only for MPICC. I am using MPIF90, so I did the same
> > thing, changing MPICC to MPIF90 (and the path accordingly), but it did
> > not work.
> >
> > Building target "fortran" of project "fortran" with configuration
> > "Debug"
> >
> >
> > Checking Dependencies
> > Invalid value 'MPIF90' for GCC_VERSION
> >
> >
> > The file "MPIF90.cpcompspec" looks like this:
> >
> > /**
> >   Xcode Compiler Specification for MPIF90
> > */
> >
> > { Type = Compiler;
> >   Identifier = com.apple.compilers.mpif90;
> >   BasedOn = com.apple.compilers.gcc.4_0;
> >   Name = "MPIF90";
> >   Version = "Default";
> >   Description = "MPI GNU C/C++ Compiler 4.0";
> >   ExecPath = "/usr/local/bin/mpif90";  // This gets converted to the
> >                                        // g++ variant automatically
> >   PrecompStyle = pch;
> > }
> >
> > and is located in "/Developer/Library/Xcode/Plug-ins"
> >
> > and when I run mpif90 -v in the terminal, it works fine:
> >
> > Using built-in specs.
> > Target: i386-apple-darwin8.10.1
> > Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure --
> > prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/
> > tmp/gfortran-20090321/gfortran_libs --enable-bootstrap
> > Thread model: posix
> > gcc version 4.4.0 20090321 (experimental) [trunk revision 144983]
> > (GCC)
> >
> >
> > Any ideas?
> >
> > Thanks.
> >
> > Vincent
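> >
> > If the compiler-spec plug-in keeps fighting you, one possible workaround
> > (only a sketch; the file names and paths here are just examples) is to
> > drive the build from a Makefile - for instance via an Xcode external
> > build target - and call the wrapper directly:
> >
> >      # Makefile for a single-source MPI Fortran program
> >      # (recipe lines must start with a tab)
> >      FC      = /usr/local/bin/mpif90
> >      FCFLAGS = -g -O2
> >
> >      hello: hello.f90
> >      	$(FC) $(FCFLAGS) -o hello hello.f90
> >
> >      clean:
> >      	rm -f hello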
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Mon, 4 May 2009 08:28:26 -0600
> > From: Ralph Castain <rhc_at_[hidden]>
> > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
> > To: Open MPI Users <users_at_[hidden]>
> > Message-ID:
> > <71d2d8cc0905040728h2002f4d7s4c49219eee29e86f_at_[hidden]>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Unfortunately, I didn't write any of that code - I was just fixing the
> > mapper so that it would properly map the procs. From what I can tell,
> > the proper things are happening there.
> >
> > I'll have to dig into the code that specifically deals with parsing the
> > results to bind the processes. I'm afraid that will take a while longer -
> > it's pretty dark in that hole.
> >
> >
> > On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot
> > <geopignot_at_[hidden]> wrote:
> >
> >> Hi,
> >>
> >> So, there are no more crashes with my "crazy" mpirun command. But the
> >> paffinity feature seems to be broken: I am not able to pin my
> >> processes.
> >>
> >> Simple test with a program using your PLPA library:
> >>
> >> r011n006% cat hostf
> >> r011n006 slots=4
> >>
> >> r011n006% cat rankf
> >> rank 0=r011n006 slot=0 ----> bind to CPU 0 , exact ?
> >>
> >> r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf --rankfile rankf --wdir /tmp -n 1 a.out
> >>>>> PLPA Number of processors online: 4
> >>>>> PLPA Number of processor sockets: 2
> >>>>> PLPA Socket 0 (ID 0): 2 cores
> >>>>> PLPA Socket 1 (ID 3): 2 cores
> >>
> >> Ctrl+Z
> >> r011n006% bg
> >>
> >> r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
> >> R+ gpignot 3 9271 97.8 a.out
> >>
> >> In fact, whatever slot number I put in my rankfile, a.out always runs
> >> on CPU 3. I was expecting it on CPU 0, according to my cpuinfo file
> >> (see below).
> >> The result is the same if I try the other syntax (rank 0=r011n006
> >> slot=0:0, i.e. bind to socket 0 - core 0, exact?).
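> >>
> >> Note that the "psr" column from ps only shows which CPU the process
> >> last ran on, not the set of CPUs it is allowed to run on. To see the
> >> actual affinity mask of the running a.out (assuming util-linux's
> >> taskset is available, as it normally is on RHEL5), something like this
> >> can be used:
> >>
> >> r011n006% taskset -cp 9271
> >>
> >> If the bind to slot 0 had taken effect, the reported CPU list would be
> >> just 0; if it shows 0-3, no binding was applied at all.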
> >>
> >> Thanks in advance
> >>
> >> Geoffroy
> >>
> >> PS: I run on rhel5
> >>
> >> r011n006% uname -a
> >> Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39
> >> CDT 2008
> >> x86_64 x86_64 x86_64 GNU/Linux
> >>
> >> My configure is:
> >> ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64'
> >> --disable-dlopen --disable-mpi-cxx --enable-heterogeneous
> >>
> >>
> >> r011n006% cat /proc/cpuinfo
> >> processor : 0
> >> vendor_id : GenuineIntel
> >> cpu family : 6
> >> model : 15
> >> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
> >> stepping : 6
> >> cpu MHz : 2660.007
> >> cache size : 4096 KB
> >> physical id : 0
> >> siblings : 2
> >> core id : 0
> >> cpu cores : 2
> >> fpu : yes
> >> fpu_exception : yes
> >> cpuid level : 10
> >> wp : yes
> >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> >> pge mca
> >> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
> >> nx lm
> >> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
> >> bogomips : 5323.68
> >> clflush size : 64
> >> cache_alignment : 64
> >> address sizes : 36 bits physical, 48 bits virtual
> >> power management:
> >>
> >> processor : 1
> >> vendor_id : GenuineIntel
> >> cpu family : 6
> >> model : 15
> >> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
> >> stepping : 6
> >> cpu MHz : 2660.007
> >> cache size : 4096 KB
> >> physical id : 3
> >> siblings : 2
> >> core id : 0
> >> cpu cores : 2
> >> fpu : yes
> >> fpu_exception : yes
> >> cpuid level : 10
> >> wp : yes
> >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> >> pge mca
> >> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
> >> nx lm
> >> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
> >> bogomips : 5320.03
> >> clflush size : 64
> >> cache_alignment : 64
> >> address sizes : 36 bits physical, 48 bits virtual
> >> power management:
> >>
> >> processor : 2
> >> vendor_id : GenuineIntel
> >> cpu family : 6
> >> model : 15
> >> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
> >> stepping : 6
> >> cpu MHz : 2660.007
> >> cache size : 4096 KB
> >> physical id : 0
> >> siblings : 2
> >> core id : 1
> >> cpu cores : 2
> >> fpu : yes
> >> fpu_exception : yes
> >> cpuid level : 10
> >> wp : yes
> >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> >> pge mca
> >> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
> >> nx lm
> >> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
> >> bogomips : 5319.39
> >> clflush size : 64
> >> cache_alignment : 64
> >> address sizes : 36 bits physical, 48 bits virtual
> >> power management:
> >>
> >> processor : 3
> >> vendor_id : GenuineIntel
> >> cpu family : 6
> >> model : 15
> >> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
> >> stepping : 6
> >> cpu MHz : 2660.007
> >> cache size : 4096 KB
> >> physical id : 3
> >> siblings : 2
> >> core id : 1
> >> cpu cores : 2
> >> fpu : yes
> >> fpu_exception : yes
> >> cpuid level : 10
> >> wp : yes
> >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> >> pge mca
> >> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
> >> nx lm
> >> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
> >> bogomips : 5320.03
> >> clflush size : 64
> >> cache_alignment : 64
> >> address sizes : 36 bits physical, 48 bits virtual
> >> power management:
> >>
> >>
> >>> ------------------------------
> >>>
> >>> Message: 2
> >>> Date: Mon, 4 May 2009 04:45:57 -0600
> >>> From: Ralph Castain <rhc_at_[hidden]>
> >>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
> >>> To: Open MPI Users <users_at_[hidden]>
> >>> Message-ID: <D01D7B16-4B47-46F3-AD41-D1A90B2E4927_at_[hidden]>
> >>>
> >>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
> >>> DelSp="yes"
> >>>
> >>> My apologies - I wasn't clear enough. You need a tarball from r21111
> >>> or greater, such as:
> >>>
> >>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz
> >>>
> >>> HTH
> >>> Ralph
> >>>
> >>>
> >>> On May 4, 2009, at 2:14 AM, Geoffroy Pignot wrote:
> >>>
> >>>> Hi ,
> >>>>
> >>>> I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my
> >>>> command doesn't work
> >>>>
> >>>> cat rankf:
> >>>> rank 0=node1 slot=*
> >>>> rank 1=node2 slot=*
> >>>>
> >>>> cat hostf:
> >>>> node1 slots=2
> >>>> node2 slots=2
> >>>>
> >>>> mpirun --rankfile rankf --hostfile hostf --host node1 -n 1
> >>>> hostname : --host node2 -n 1 hostname
> >>>>
> >>>> Error, invalid rank (1) in the rankfile (rankf)
> >>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>> rmaps_rank_file.c at line 403
> >>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>> base/rmaps_base_map_job.c at line 86
> >>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>> base/plm_base_launch_support.c at line 86
> >>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>> plm_rsh_module.c at line 1016
> >>>>
> >>>>
> >>>> Ralph, could you tell me whether my command syntax is correct or not?
> >>>> If not, could you give me the expected one?
> >>>>
> >>>> Regards
> >>>>
> >>>> Geoffroy
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> 2009/4/30 Geoffroy Pignot <geopignot_at_[hidden]>
> >>>> Immediately Sir !!! :)
> >>>>
> >>>> Thanks again Ralph
> >>>>
> >>>> Geoffroy
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> Message: 2
> >>>> Date: Thu, 30 Apr 2009 06:45:39 -0600
> >>>> From: Ralph Castain <rhc_at_[hidden]>
> >>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
> >>>> To: Open MPI Users <users_at_[hidden]>
> >>>> Message-ID:
> >>>> <71d2d8cc0904300545v61a42fe1k50086d2704d0f7e6_at_[hidden]
> >
> >>>> Content-Type: text/plain; charset="iso-8859-1"
> >>>>
> >>>> I believe this is fixed now in our development trunk - you can
> >>>> download any
> >>>> tarball starting from last night and give it a try, if you like.
> >>>> Any
> >>>> feedback would be appreciated.
> >>>>
> >>>> Ralph
> >>>>
> >>>>
> >>>> On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:
> >>>>
> >>>> Ah now, I didn't say it -worked-, did I? :-)
> >>>>
> >>>> Clearly a bug exists in the program. I'll try to take a look at it
> >>>> (if Lenny doesn't get to it first), but it won't be until later in
> >>>> the week.
> >>>>
> >>>> On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:
> >>>>
> >>>> I agree with you, Ralph, and that's what I expect from Open MPI, but
> >>>> my second example shows that it's not working:
> >>>>
> >>>> cat hostfile.0
> >>>> r011n002 slots=4
> >>>> r011n003 slots=4
> >>>>
> >>>> cat rankfile.0
> >>>> rank 0=r011n002 slot=0
> >>>> rank 1=r011n003 slot=1
> >>>>
> >>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1
> >>>> hostname
> >>>> ### CRASHED
> >>>>
> >>>>>> Error, invalid rank (1) in the rankfile (rankfile.0)
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>>>> rmaps_rank_file.c at line 404
> >>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>>>> base/rmaps_base_map_job.c at line 87
> >>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>>>> base/plm_base_launch_support.c at line 77
> >>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>>>> plm_rsh_module.c at line 985
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>> A daemon (pid unknown) died unexpectedly on signal 1 while
> >>>>> attempting to
> >>>>>> launch so we are aborting.
> >>>>>>
> >>>>>> There may be more information reported by the environment (see
> >>>>> above).
> >>>>>>
> >>>>>> This may be because the daemon was unable to find all the
> needed
> >>>>> shared
> >>>>>> libraries on the remote node. You may set your
> LD_LIBRARY_PATH to
> >>>>> have the
> >>>>>> location of the shared libraries on the remote nodes and this
> >>>>>> will
> >>>>>> automatically be forwarded to the remote nodes.
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>> orterun noticed that the job aborted, but has no info as to the
> >>>>> process
> >>>>>> that caused that situation.
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>> orterun: clean termination accomplished
> >>>>
> >>>>
> >>>>
> >>>> Message: 4
> >>>> Date: Tue, 14 Apr 2009 06:55:58 -0600
> >>>> From: Ralph Castain <rhc_at_[hidden]>
> >>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
> >>>> To: Open MPI Users <users_at_[hidden]>
> >>>> Message-ID: <F6290ADA-A196-43F0-A853-CBCB802D8D9C_at_[hidden]>
> >>>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
> >>>> DelSp="yes"
> >>>>
> >>>> The rankfile cuts across the entire job - it isn't applied on an
> >>>> app_context basis. So the ranks in your rankfile must correspond to
> >>>> the eventual rank of each process in the cmd line.
> >>>>
> >>>> Unfortunately, that means you have to count ranks. In your case, you
> >>>> only have four, so that makes life easier. Your rankfile would look
> >>>> something like this:
> >>>>
> >>>> rank 0=r001n001 slot=0
> >>>> rank 1=r001n002 slot=1
> >>>> rank 2=r001n001 slot=1
> >>>> rank 3=r001n002 slot=2
> >>>>
> >>>> HTH
> >>>> Ralph
> >>>>
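> >>>> For example, lined up against the command line you quote below (with
> >>>> the rankfile above saved as, say, myrankfile - the name is just an
> >>>> example), the ranks are numbered left to right across the app
> >>>> contexts:
> >>>>
> >>>> mpirun -rf myrankfile \
> >>>>        -n 1 -host r001n001 master.x options1 : \
> >>>>        -n 1 -host r001n002 master.x options2 : \
> >>>>        -n 1 -host r001n001 slave.x options3 : \
> >>>>        -n 1 -host r001n002 slave.x options4
> >>>>
> >>>> i.e. master.x on r001n001 is rank 0, master.x on r001n002 is rank 1,
> >>>> slave.x on r001n001 is rank 2, and slave.x on r001n002 is rank 3,
> >>>> which is exactly what the rankfile above maps.
> >>>>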
> >>>> On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I agree that my examples are not very clear. What I want to do is to
> >>>>> launch a multi-executable application (masters/slaves) and benefit
> >>>>> from processor affinity.
> >>>>> Could you show me how to convert this command to use the -rf option
> >>>>> (whatever the affinity is)?
> >>>>>
> >>>>> mpirun -n 1 -host r001n001 master.x options1 : \
> >>>>>        -n 1 -host r001n002 master.x options2 : \
> >>>>>        -n 1 -host r001n001 slave.x options3 : \
> >>>>>        -n 1 -host r001n002 slave.x options4
> >>>>>
> >>>>> Thanks for your help
> >>>>>
> >>>>> Geoffroy
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Message: 2
> >>>>> Date: Sun, 12 Apr 2009 18:26:35 +0300
> >>>>> From: Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
> >>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
> >>>>> To: Open MPI Users <users_at_[hidden]>
> >>>>> Message-ID:
> >>>>>
> >>>>> <453d39990904120826t2e1d1d33l7bb1fe3de65b5361_at_[hidden]>
> >>>>> Content-Type: text/plain; charset="iso-8859-1"
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> The first "crash" is OK, since your rankfile has ranks 0 and 1
> >>>>> defined,
> >>>>> while n=1, which means only rank 0 is present and can be
> >>>>> allocated.
> >>>>>
> >>>>> NP must be greater than the largest rank in the rankfile.
> >>>>>
> >>>>> What exactly are you trying to do ?
> >>>>>
> >>>>> I tried to recreate your segv, but all I got was:
> >>>>>
> >>>>> ~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun --hostfile
> >>>>> hostfile.0
> >>>>> -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
> >>>>> [witch19:30798] mca: base: component_find: paffinity
> >>>>> "mca_paffinity_linux"
> >>>>> uses an MCA interface that is not recognized (component MCA
> >>>> v1.0.0 !=
> >>>>> supported MCA v2.0.0) -- ignored
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>> It looks like opal_init failed for some reason; your parallel
> >>>>> process is
> >>>>> likely to abort. There are many reasons that a parallel process
> >>>>> can
> >>>>> fail during opal_init; some of which are due to configuration or
> >>>>> environment problems. This failure appears to be an internal
> >>>> failure;
> >>>>> here's some additional information (which may only be relevant
> >>>>> to an
> >>>>> Open MPI developer):
> >>>>>
> >>>>> opal_carto_base_select failed
> >>>>> --> Returned value -13 instead of OPAL_SUCCESS
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>> [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
> >>>> file
> >>>>> ../../orte/runtime/orte_init.c at line 78
> >>>>> [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
> >>>> file
> >>>>> ../../orte/orted/orted_main.c at line 344
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>> A daemon (pid 11629) died unexpectedly with status 243 while
> >>>>> attempting
> >>>>> to launch so we are aborting.
> >>>>>
> >>>>> There may be more information reported by the environment (see
> >>>> above).
> >>>>>
> >>>>> This may be because the daemon was unable to find all the needed
> >>>>> shared
> >>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH
> to
> >>>>> have the
> >>>>> location of the shared libraries on the remote nodes and this
> will
> >>>>> automatically be forwarded to the remote nodes.
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>> mpirun noticed that the job aborted, but has no info as to the
> >>>> process
> >>>>> that caused that situation.
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>> mpirun: clean termination accomplished
> >>>>>
> >>>>>
> >>>>> Lenny.
> >>>>>
> >>>>>
> >>>>> On 4/10/09, Geoffroy Pignot <geopignot_at_[hidden]> wrote:
> >>>>>>
> >>>>>> Hi ,
> >>>>>>
> >>>>>> I am currently testing the process affinity capabilities of Open
> >>>>>> MPI, and I would like to know whether the rankfile behaviour I
> >>>>>> describe below is normal or not.
> >>>>>>
> >>>>>> cat hostfile.0
> >>>>>> r011n002 slots=4
> >>>>>> r011n003 slots=4
> >>>>>>
> >>>>>> cat rankfile.0
> >>>>>> rank 0=r011n002 slot=0
> >>>>>> rank 1=r011n003 slot=1
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> ##################################################################################
> >>>>>>
> >>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname    ### OK
> >>>>>> r011n002
> >>>>>> r011n003
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> ##################################################################################
> >>>>>> but
> >>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname
> >>>>>> ### CRASHED
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>> Error, invalid rank (1) in the rankfile (rankfile.0)
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>>>> rmaps_rank_file.c at line 404
> >>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>>>> base/rmaps_base_map_job.c at line 87
> >>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>>>> base/plm_base_launch_support.c at line 77
> >>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
> >>>> file
> >>>>>> plm_rsh_module.c at line 985
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>> A daemon (pid unknown) died unexpectedly on signal 1 while
> >>>>> attempting to
> >>>>>> launch so we are aborting.
> >>>>>>
> >>>>>> There may be more information reported by the environment (see
> >>>>> above).
> >>>>>>
> >>>>>> This may be because the daemon was unable to find all the
> needed
> >>>>> shared
> >>>>>> libraries on the remote node. You may set your
> LD_LIBRARY_PATH to
> >>>>> have the
> >>>>>> location of the shared libraries on the remote nodes and this
> >>>>>> will
> >>>>>> automatically be forwarded to the remote nodes.
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>> orterun noticed that the job aborted, but has no info as to the
> >>>>> process
> >>>>>> that caused that situation.
> >>>>>>
> >>>>>
> >>>>
> >>>
> --------------------------------------------------------------------------
> >>>>>> orterun: clean termination accomplished
> >>>>>> It seems that the rankfile option is not propagated to the second
> >>>>>> command line; there is no global understanding of the ranking
> >>>>>> inside an mpirun command.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> ##################################################################################
> >>>>>>
> >>>>>> Assuming that, I tried to provide a rankfile to each command line:
> >>>>>>
> >>>>>> cat rankfile.0
> >>>>>> rank 0=r011n002 slot=0
> >>>>>>
> >>>>>> cat rankfile.1
> >>>>>> rank 0=r011n003 slot=1
> >>>>>>
> >>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : \
> >>>>>>        -rf rankfile.1 -n 1 hostname    ### CRASHED
> >>>>>> [r011n002:28778] *** Process received signal ***
> >>>>>> [r011n002:28778] Signal: Segmentation fault (11)
> >>>>>> [r011n002:28778] Signal code: Address not mapped (1)
> >>>>>> [r011n002:28778] Failing at address: 0x34
> >>>>>> [r011n002:28778] [ 0] [0xffffe600]
> >>>>>> [r011n002:28778] [ 1]
> >>>>>> /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.
> >>>>> 0(orte_odls_base_default_get_add_procs_data+0x55d)
> >>>>>> [0x5557decd]
> >>>>>> [r011n002:28778] [ 2]
> >>>>>> /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.
> >>>>> 0(orte_plm_base_launch_apps+0x117)
> >>>>>> [0x555842a7]
> >>>>>> [r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/
> >>>>> mca_plm_rsh.so
> >>>>>> [0x556098c0]
> >>>>>> [r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun
> >>>>> [0x804aa27]
> >>>>>> [r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun
> >>>>> [0x804a022]
> >>>>>> [r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc)
> >>>>> [0x9f1dec]
> >>>>>> [r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun
> >>>>> [0x8049f71]
> >>>>>> [r011n002:28778] *** End of error message ***
> >>>>>> Segmentation fault (core dumped)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> I hope that I've found a bug, because it would be very important
> >>>>>> for me to have this kind of capability: launching a multi-executable
> >>>>>> mpirun command line and being able to bind my executables and
> >>>>>> sockets together.
> >>>>>>
> >>>>>> Thanks in advance for your help
> >>>>>>
> >>>>>> Geoffroy

-- 
Jeff Squyres
Cisco Systems