Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
From: Vicente Puig (vpuibor_at_[hidden])
Date: 2009-05-04 12:13:45


If I cannot make it work with Xcode, which tool could I use instead? Which one do
you use to compile and debug Open MPI programs?
Thanks

Vincent

2009/5/4 Jeff Squyres <jsquyres_at_[hidden]>

> Open MPI comes pre-installed in Leopard; as Warner noted, since Leopard
> doesn't ship with a Fortran compiler, the Open MPI that Apple ships has
> non-functional mpif77 and mpif90 wrapper compilers.
>
> So the Open MPI that you installed manually will use your Fortran
> compilers, and therefore will have functional mpif77 and mpif90 wrapper
> compilers. Hence, you probably need to be sure to use the "right" wrapper
> compilers. It looks like you specified the full path in ExecPath,
> so I'm not sure why Xcode wouldn't work with that (like I mentioned, I
> unfortunately don't use Xcode myself, so I don't know why that wouldn't
> work).
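A minimal shell sketch of how one might check which wrapper compiler gets picked up first, assuming a POSIX shell. The helper name `list_on_path` is made up for illustration and is not part of Open MPI or Xcode.

```shell
# Print every executable called "$1" found on PATH, in lookup order.
# The first line printed is the one the shell would actually run.
# The body runs in a subshell so the IFS and globbing changes stay local.
list_on_path() (
    IFS=:
    set -f                      # do not glob-expand PATH entries
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.  # an empty PATH entry means the current directory
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
        fi
    done
)

list_on_path mpif90
```

If the first hit is the Apple-shipped `/usr/bin/mpif90` rather than `/usr/local/bin/mpif90`, either call the wrapper by its full path or put `/usr/local/bin` earlier in `PATH`. The Open MPI wrapper compilers also accept `--showme`, which prints the underlying compiler command line instead of running it.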
>
>
>
>
> On May 4, 2009, at 11:53 AM, Vicente wrote:
>
>> Yes, I already have the gfortran compiler in /usr/local/bin, the same path
>> as my mpif90 compiler. But I've seen that when I use the mpif90 in /usr/bin
>> or in /Developer/usr/bin, it says:
>>
>> "Unfortunately, this installation of Open MPI was not compiled with
>> Fortran 90 support. As such, the mpif90 compiler is non-functional."
>>
>>
>> That should be the problem; I will have to change the path to use the
>> gfortran I have installed.
>> How could I do it? (Sorry, I am a beginner.)
>>
>> Thanks.
>>
>>
>> On 4 May 2009, at 17:38, Warner Yuen wrote:
>>
>> > Have you installed a Fortran compiler? Mac OS X's developer tools do
>> > not come with a Fortran compiler, so you'll need to install one if
>> > you haven't already done so. I routinely use the Intel IFORT
>> > compilers with success. However, I hear many good things about the
>> > gfortran compilers on Mac OS X, you can't beat the price of gfortran!
>> >
>> >
>> > Warner Yuen
>> > Scientific Computing
>> > Consulting Engineer
>> > Apple, Inc.
>> > email: wyuen_at_[hidden]
>> > Tel: 408.718.2859
>> >
>> >
>> >
>> >
>> > On May 4, 2009, at 7:28 AM, users-request_at_[hidden] wrote:
>> >
>> >>
>> >> Today's Topics:
>> >>
>> >> 1. How do I compile OpenMPI in Xcode 3.1 (Vicente)
>> >> 2. Re: 1.3.1 -rf rankfile behaviour ?? (Ralph Castain)
>> >>
>> >>
>> >> ----------------------------------------------------------------------
>> >>
>> >> Message: 1
>> >> Date: Mon, 4 May 2009 16:12:44 +0200
>> >> From: Vicente <vpuibor_at_[hidden]>
>> >> Subject: [OMPI users] How do I compile OpenMPI in Xcode 3.1
>> >> To: users_at_[hidden]
>> >> Message-ID: <1C2C0085-940F-43BB-910F-975871AE2F09_at_[hidden]>
>> >> Content-Type: text/plain; charset="windows-1252"; Format="flowed";
>> >> DelSp="yes"
>> >>
>> >> Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in
>> >> Xcode", but it only covers MPICC. I am using MPIF90, so I did the same,
>> >> replacing MPICC with MPIF90 and changing the path, but it did not work.
>> >>
>> >> Building target "fortran" of project "fortran" with configuration "Debug"
>> >>
>> >>
>> >> Checking Dependencies
>> >> Invalid value 'MPIF90' for GCC_VERSION
>> >>
>> >>
>> >> The file "MPIF90.cpcompspec" looks like this:
>> >>
>> >> /**
>> >>  Xcode Compiler Specification for MPIF90
>> >>
>> >> */
>> >>
>> >> { Type = Compiler;
>> >>   Identifier = com.apple.compilers.mpif90;
>> >>   BasedOn = com.apple.compilers.gcc.4_0;
>> >>   Name = "MPIF90";
>> >>   Version = "Default";
>> >>   Description = "MPI GNU C/C++ Compiler 4.0";
>> >>   ExecPath = "/usr/local/bin/mpif90"; // This gets converted to the g++ variant automatically
>> >>   PrecompStyle = pch;
>> >> }
>> >>
>> >> and is located in "/Developer/Library/Xcode/Plug-ins"
>> >>
>> >> and when I do mpif90 -v on terminal it works well:
>> >>
>> >> Using built-in specs.
>> >> Target: i386-apple-darwin8.10.1
>> >> Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/tmp/gfortran-20090321/gfortran_libs --enable-bootstrap
>> >> Thread model: posix
>> >> gcc version 4.4.0 20090321 (experimental) [trunk revision 144983] (GCC)
>> >>
>> >>
>> >> Any idea??
>> >>
>> >> Thanks.
>> >>
>> >> Vincent
>> >> -------------- next part --------------
>> >> HTML attachment scrubbed and removed
>> >>
>> >> ------------------------------
>> >>
>> >> Message: 2
>> >> Date: Mon, 4 May 2009 08:28:26 -0600
>> >> From: Ralph Castain <rhc_at_[hidden]>
>> >> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>> >> To: Open MPI Users <users_at_[hidden]>
>> >> Message-ID:
>> >> <71d2d8cc0905040728h2002f4d7s4c49219eee29e86f_at_[hidden]>
>> >> Content-Type: text/plain; charset="iso-8859-1"
>> >>
>> >> Unfortunately, I didn't write any of that code - I was just fixing the
>> >> mapper so it would properly map the procs. From what I can tell, the proper
>> >> things are happening there.
>> >>
>> >> I'll have to dig into the code that specifically deals with parsing the
>> >> results to bind the processes. Afraid that will take awhile longer - pretty
>> >> dark in that hole.
>> >>
>> >>
>> >> On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot
>> >> <geopignot_at_[hidden]> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> So, there are no more crashes with my "crazy" mpirun command. But the
>> >>> paffinity feature seems to be broken. Indeed I am not able to pin my
>> >>> processes.
>> >>>
>> >>> Simple test with a program using your plpa library :
>> >>>
>> >>> r011n006% cat hostf
>> >>> r011n006 slots=4
>> >>>
>> >>> r011n006% cat rankf
>> >>> rank 0=r011n006 slot=0 ----> bind to CPU 0, correct?
>> >>>
>> >>> r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf --rankfile rankf --wdir /tmp -n 1 a.out
>> >>>>>> PLPA Number of processors online: 4
>> >>>>>> PLPA Number of processor sockets: 2
>> >>>>>> PLPA Socket 0 (ID 0): 2 cores
>> >>>>>> PLPA Socket 1 (ID 3): 2 cores
>> >>>
>> >>> Ctrl+Z
>> >>> r011n006%bg
>> >>>
>> >>> r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
>> >>> R+ gpignot 3 9271 97.8 a.out
>> >>>
>> >>> In fact, whatever slot number I put in my rankfile, a.out always runs
>> >>> on CPU 3. I was expecting it on CPU 0 according to my cpuinfo file
>> >>> (see below).
>> >>> The result is the same if I try another syntax (rank 0=r011n006 slot=0:0,
>> >>> i.e. bind to socket 0, core 0, correct?).
>> >>>
>> >>> Thanks in advance
>> >>>
>> >>> Geoffroy
>> >>>
>> >>> PS: I run on rhel5
>> >>>
>> >>> r011n006% uname -a
>> >>> Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39 CDT 2008 x86_64 x86_64 x86_64 GNU/Linux
>> >>>
>> >>> My configure is :
>> >>> ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64' --disable-dlopen --disable-mpi-cxx --enable-heterogeneous
>> >>>
>> >>>
>> >>> r011n006% cat /proc/cpuinfo
>> >>> processor : 0
>> >>> vendor_id : GenuineIntel
>> >>> cpu family : 6
>> >>> model : 15
>> >>> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
>> >>> stepping : 6
>> >>> cpu MHz : 2660.007
>> >>> cache size : 4096 KB
>> >>> physical id : 0
>> >>> siblings : 2
>> >>> core id : 0
>> >>> cpu cores : 2
>> >>> fpu : yes
>> >>> fpu_exception : yes
>> >>> cpuid level : 10
>> >>> wp : yes
>> >>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>> >>> bogomips : 5323.68
>> >>> clflush size : 64
>> >>> cache_alignment : 64
>> >>> address sizes : 36 bits physical, 48 bits virtual
>> >>> power management:
>> >>>
>> >>> processor : 1
>> >>> vendor_id : GenuineIntel
>> >>> cpu family : 6
>> >>> model : 15
>> >>> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
>> >>> stepping : 6
>> >>> cpu MHz : 2660.007
>> >>> cache size : 4096 KB
>> >>> physical id : 3
>> >>> siblings : 2
>> >>> core id : 0
>> >>> cpu cores : 2
>> >>> fpu : yes
>> >>> fpu_exception : yes
>> >>> cpuid level : 10
>> >>> wp : yes
>> >>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>> >>> bogomips : 5320.03
>> >>> clflush size : 64
>> >>> cache_alignment : 64
>> >>> address sizes : 36 bits physical, 48 bits virtual
>> >>> power management:
>> >>>
>> >>> processor : 2
>> >>> vendor_id : GenuineIntel
>> >>> cpu family : 6
>> >>> model : 15
>> >>> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
>> >>> stepping : 6
>> >>> cpu MHz : 2660.007
>> >>> cache size : 4096 KB
>> >>> physical id : 0
>> >>> siblings : 2
>> >>> core id : 1
>> >>> cpu cores : 2
>> >>> fpu : yes
>> >>> fpu_exception : yes
>> >>> cpuid level : 10
>> >>> wp : yes
>> >>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>> >>> bogomips : 5319.39
>> >>> clflush size : 64
>> >>> cache_alignment : 64
>> >>> address sizes : 36 bits physical, 48 bits virtual
>> >>> power management:
>> >>>
>> >>> processor : 3
>> >>> vendor_id : GenuineIntel
>> >>> cpu family : 6
>> >>> model : 15
>> >>> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
>> >>> stepping : 6
>> >>> cpu MHz : 2660.007
>> >>> cache size : 4096 KB
>> >>> physical id : 3
>> >>> siblings : 2
>> >>> core id : 1
>> >>> cpu cores : 2
>> >>> fpu : yes
>> >>> fpu_exception : yes
>> >>> cpuid level : 10
>> >>> wp : yes
>> >>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>> >>> bogomips : 5320.03
>> >>> clflush size : 64
>> >>> cache_alignment : 64
>> >>> address sizes : 36 bits physical, 48 bits virtual
>> >>> power management:
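The cpuinfo dump above is easier to compare against the `psr` column of `ps` once it is reduced to one line per logical CPU. A rough sketch, assuming a Linux-style `/proc/cpuinfo` with `processor`, `physical id` and `core id` fields; the function name `cpu_topology` is hypothetical.

```shell
# Summarise /proc/cpuinfo as "cpu N: socket S core C", one line per logical CPU.
# An alternative file can be passed as the first argument.
cpu_topology() {
    awk -F'[ \t]*:[ \t]*' '
        $1 == "processor"   { cpu = $2 }      # logical CPU number (what ps shows as psr)
        $1 == "physical id" { socket = $2 }   # socket the CPU belongs to
        $1 == "core id"     { printf "cpu %s: socket %s core %s\n", cpu, socket, $2 }
    ' "${1:-/proc/cpuinfo}"
}

# Usage on a Linux node:
#   cpu_topology
```

For the four processors listed above this gives cpu 0 on socket 0 core 0, cpu 1 on socket 3 core 0, cpu 2 on socket 0 core 1, and cpu 3 on socket 3 core 1. Note that `psr` only reports the processor a task last ran on; `taskset -p <pid>` from util-linux shows the affinity mask that was actually set.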
>> >>>
>> >>>
>> >>>> ------------------------------
>> >>>>
>> >>>> Message: 2
>> >>>> Date: Mon, 4 May 2009 04:45:57 -0600
>> >>>> From: Ralph Castain <rhc_at_[hidden]>
>> >>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>> >>>> To: Open MPI Users <users_at_[hidden]>
>> >>>> Message-ID: <D01D7B16-4B47-46F3-AD41-D1A90B2E4927_at_[hidden]>
>> >>>>
>> >>>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>> >>>> DelSp="yes"
>> >>>>
>> >>>> My apologies - I wasn't clear enough. You need a tarball from r21111
>> >>>> or greater...such as:
>> >>>>
>> >>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz
>> >>>>
>> >>>> HTH
>> >>>> Ralph
>> >>>>
>> >>>>
>> >>>> On May 4, 2009, at 2:14 AM, Geoffroy Pignot wrote:
>> >>>>
>> >>>>> Hi ,
>> >>>>>
>> >>>>> I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my
>> >>>>> command doesn't work
>> >>>>>
>> >>>>> cat rankf:
>> >>>>> rank 0=node1 slot=*
>> >>>>> rank 1=node2 slot=*
>> >>>>>
>> >>>>> cat hostf:
>> >>>>> node1 slots=2
>> >>>>> node2 slots=2
>> >>>>>
>> >>>>> mpirun --rankfile rankf --hostfile hostf --host node1 -n 1 hostname : --host node2 -n 1 hostname
>> >>>>>
>> >>>>> Error, invalid rank (1) in the rankfile (rankf)
>> >>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 403
>> >>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 86
>> >>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 86
>> >>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 1016
>> >>>>>
>> >>>>>
>> >>>>> Ralph, could you tell me whether my command syntax is correct? If
>> >>>>> not, could you give me the expected one?
>> >>>>>
>> >>>>> Regards
>> >>>>>
>> >>>>> Geoffroy
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> 2009/4/30 Geoffroy Pignot <geopignot_at_[hidden]>
>> >>>>> Immediately Sir !!! :)
>> >>>>>
>> >>>>> Thanks again Ralph
>> >>>>>
>> >>>>> Geoffroy
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> ------------------------------
>> >>>>>
>> >>>>> Message: 2
>> >>>>> Date: Thu, 30 Apr 2009 06:45:39 -0600
>> >>>>> From: Ralph Castain <rhc_at_[hidden]>
>> >>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>> >>>>> To: Open MPI Users <users_at_[hidden]>
>> >>>>> Message-ID:
>> >>>>> <71d2d8cc0904300545v61a42fe1k50086d2704d0f7e6_at_[hidden]>
>> >>>>> Content-Type: text/plain; charset="iso-8859-1"
>> >>>>>
>> >>>>> I believe this is fixed now in our development trunk - you can download any
>> >>>>> tarball starting from last night and give it a try, if you like. Any
>> >>>>> feedback would be appreciated.
>> >>>>>
>> >>>>> Ralph
>> >>>>>
>> >>>>>
>> >>>>> On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:
>> >>>>>
>> >>>>> Ah now, I didn't say it -worked-, did I? :-)
>> >>>>>
>> >>>>> Clearly a bug exists in the program. I'll try to take a look at it (if Lenny
>> >>>>> doesn't get to it first), but it won't be until later in the week.
>> >>>>>
>> >>>>> On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:
>> >>>>>
>> >>>>> I agree with you, Ralph, and that's what I expect from Open MPI, but my
>> >>>>> second example shows that it's not working.
>> >>>>>
>> >>>>> cat hostfile.0
>> >>>>> r011n002 slots=4
>> >>>>> r011n003 slots=4
>> >>>>>
>> >>>>> cat rankfile.0
>> >>>>> rank 0=r011n002 slot=0
>> >>>>> rank 1=r011n003 slot=1
>> >>>>>
>> >>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname ### CRASHED
>> >>>>>
>> >>>>>>> Error, invalid rank (1) in the rankfile (rankfile.0)
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>> >>>>> file
>> >>>>>>> rmaps_rank_file.c at line 404
>> >>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>> >>>>> file
>> >>>>>>> base/rmaps_base_map_job.c at line 87
>> >>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>> >>>>> file
>> >>>>>>> base/plm_base_launch_support.c at line 77
>> >>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>> >>>>> file
>> >>>>>>> plm_rsh_module.c at line 985
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>> A daemon (pid unknown) died unexpectedly on signal 1 while
>> >>>>>> attempting to
>> >>>>>>> launch so we are aborting.
>> >>>>>>>
>> >>>>>>> There may be more information reported by the environment (see
>> >>>>>> above).
>> >>>>>>>
>> >>>>>>> This may be because the daemon was unable to find all the needed
>> >>>>>> shared
>> >>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH
>> >>>>>>> to
>> >>>>>> have the
>> >>>>>>> location of the shared libraries on the remote nodes and this
>> >>>>>>> will
>> >>>>>>> automatically be forwarded to the remote nodes.
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>> orterun noticed that the job aborted, but has no info as to the
>> >>>>>> process
>> >>>>>>> that caused that situation.
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>> orterun: clean termination accomplished
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Message: 4
>> >>>>> Date: Tue, 14 Apr 2009 06:55:58 -0600
>> >>>>> From: Ralph Castain <rhc_at_[hidden]>
>> >>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>> >>>>> To: Open MPI Users <users_at_[hidden]>
>> >>>>> Message-ID: <F6290ADA-A196-43F0-A853-CBCB802D8D9C_at_[hidden]>
>> >>>>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>> >>>>> DelSp="yes"
>> >>>>>
>> >>>>> The rankfile cuts across the entire job - it isn't applied on an
>> >>>>> app_context basis. So the ranks in your rankfile must correspond to
>> >>>>> the eventual rank of each process in the cmd line.
>> >>>>>
>> >>>>> Unfortunately, that means you have to count ranks. In your case, you
>> >>>>> only have four, so that makes life easier. Your rankfile would look
>> >>>>> something like this:
>> >>>>>
>> >>>>> rank 0=r001n001 slot=0
>> >>>>> rank 1=r001n002 slot=1
>> >>>>> rank 2=r001n001 slot=1
>> >>>>> rank 3=r001n002 slot=2
>> >>>>>
>> >>>>> HTH
>> >>>>> Ralph
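The rank counting described above can be sketched in shell. This is only an illustration, assuming that every app context states its process count with `-n` or `-np`; the helper name `count_procs` is made up and is not an Open MPI tool.

```shell
# Sum the -n / -np values over all app contexts of an mpirun command line.
# Pass the arguments that would follow "mpirun"; the total is the job size,
# so valid ranks run from 0 to total-1.
count_procs() {
    total=0
    while [ $# -gt 0 ]; do
        case $1 in
            -n|-np)
                [ $# -ge 2 ] || break     # option given without a value
                total=$((total + $2))
                shift                     # skip the value as well
                ;;
        esac
        shift
    done
    printf '%s\n' "$total"
}

count_procs -n 1 -host r001n001 master.x options1 : \
            -n 1 -host r001n002 master.x options2 : \
            -n 1 -host r001n001 slave.x options3 : \
            -n 1 -host r001n002 slave.x options4
```

For the four app contexts quoted in this thread the total is 4, which matches the four ranks (0 to 3) in the rankfile shown above.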
>> >>>>>
>> >>>>> On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:
>> >>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I agree that my examples are not very clear. What I want to do is to
>> >>>>>> launch a multi-executable application (masters-slaves) and benefit from
>> >>>>>> processor affinity.
>> >>>>>> Could you show me how to convert this command to use the -rf option
>> >>>>>> (whatever the affinity is)?
>> >>>>>>
>> >>>>>> mpirun -n 1 -host r001n001 master.x options1 : -n 1 -host r001n002 master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1 -host r001n002 slave.x options4
>> >>>>>>
>> >>>>>> Thanks for your help
>> >>>>>>
>> >>>>>> Geoffroy
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Message: 2
>> >>>>>> Date: Sun, 12 Apr 2009 18:26:35 +0300
>> >>>>>> From: Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
>> >>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>> >>>>>> To: Open MPI Users <users_at_[hidden]>
>> >>>>>> Message-ID:
>> >>>>>>
>> >>>>>> <453d39990904120826t2e1d1d33l7bb1fe3de65b5361_at_[hidden]>
>> >>>>>> Content-Type: text/plain; charset="iso-8859-1"
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> The first "crash" is OK, since your rankfile has ranks 0 and 1 defined,
>> >>>>>> while n=1, which means only rank 0 is present and can be allocated.
>> >>>>>>
>> >>>>>> NP must be greater than the largest rank in the rankfile (ranks start at 0).
>> >>>>>>
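As a sanity check before launching, one could compare the ranks in a rankfile against the total process count. The sketch below assumes rankfile lines of the form `rank N=host slot=...` as quoted in this thread; `check_rankfile` is a hypothetical helper, and it only flags ranks that are too large (it does not check whether every rank is listed).

```shell
# check_rankfile FILE NP
# Report every rank in FILE that cannot exist in a job of NP processes.
# Exits non-zero if at least one such rank is found.
check_rankfile() {
    awk -v np="$2" '
        BEGIN { bad = 0 }
        $1 == "rank" {
            r = $2
            sub(/=.*/, "", r)             # "1=r011n003" -> "1"
            if (r + 0 >= np + 0) {
                printf "rank %d is not valid with -n %d\n", r, np
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}

# Usage: check_rankfile rankfile.0 2
```

With the two-rank rankfile.0 from this thread, a total of 2 passes and a total of 1 reports rank 1, which is the situation described for the first "crash".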
>> >>>>>> What exactly are you trying to do ?
>> >>>>>>
>> >>>>>> I tried to recreate your segv but all I got was
>> >>>>>>
>> >>>>>> ~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
>> >>>>>> [witch19:30798] mca: base: component_find: paffinity "mca_paffinity_linux" uses an MCA interface that is not recognized (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>> It looks like opal_init failed for some reason; your parallel
>> >>>>>> process is
>> >>>>>> likely to abort. There are many reasons that a parallel process
>> >>>>>> can
>> >>>>>> fail during opal_init; some of which are due to configuration or
>> >>>>>> environment problems. This failure appears to be an internal
>> >>>>> failure;
>> >>>>>> here's some additional information (which may only be relevant
>> >>>>>> to an
>> >>>>>> Open MPI developer):
>> >>>>>>
>> >>>>>> opal_carto_base_select failed
>> >>>>>> --> Returned value -13 instead of OPAL_SUCCESS
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>> [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
>> >>>>> file
>> >>>>>> ../../orte/runtime/orte_init.c at line 78
>> >>>>>> [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
>> >>>>> file
>> >>>>>> ../../orte/orted/orted_main.c at line 344
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>> A daemon (pid 11629) died unexpectedly with status 243 while
>> >>>>>> attempting
>> >>>>>> to launch so we are aborting.
>> >>>>>>
>> >>>>>> There may be more information reported by the environment (see
>> >>>>> above).
>> >>>>>>
>> >>>>>> This may be because the daemon was unable to find all the needed
>> >>>>>> shared
>> >>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to
>> >>>>>> have the
>> >>>>>> location of the shared libraries on the remote nodes and this
>> >>>>>> will
>> >>>>>> automatically be forwarded to the remote nodes.
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>> mpirun noticed that the job aborted, but has no info as to the
>> >>>>> process
>> >>>>>> that caused that situation.
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>> mpirun: clean termination accomplished
>> >>>>>>
>> >>>>>>
>> >>>>>> Lenny.
>> >>>>>>
>> >>>>>>
>> >>>>>> On 4/10/09, Geoffroy Pignot <geopignot_at_[hidden]> wrote:
>> >>>>>>>
>> >>>>>>> Hi ,
>> >>>>>>>
>> >>>>>>> I am currently testing the process affinity capabilities of Open MPI and I
>> >>>>>>> would like to know whether the rankfile behaviour I describe below is
>> >>>>>>> normal or not.
>> >>>>>>>
>> >>>>>>> cat hostfile.0
>> >>>>>>> r011n002 slots=4
>> >>>>>>> r011n003 slots=4
>> >>>>>>>
>> >>>>>>> cat rankfile.0
>> >>>>>>> rank 0=r011n002 slot=0
>> >>>>>>> rank 1=r011n003 slot=1
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> ##################################################################################
>> >>>>>>>
>> >>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname ### OK
>> >>>>>>> r011n002
>> >>>>>>> r011n003
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> ##################################################################################
>> >>>>>>> but
>> >>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname ### CRASHED
>> >>>>>>> *
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>> Error, invalid rank (1) in the rankfile (rankfile.0)
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>> >>>>> file
>> >>>>>>> rmaps_rank_file.c at line 404
>> >>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>> >>>>> file
>> >>>>>>> base/rmaps_base_map_job.c at line 87
>> >>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>> >>>>> file
>> >>>>>>> base/plm_base_launch_support.c at line 77
>> >>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>> >>>>> file
>> >>>>>>> plm_rsh_module.c at line 985
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>> A daemon (pid unknown) died unexpectedly on signal 1 while
>> >>>>>> attempting to
>> >>>>>>> launch so we are aborting.
>> >>>>>>>
>> >>>>>>> There may be more information reported by the environment (see
>> >>>>>> above).
>> >>>>>>>
>> >>>>>>> This may be because the daemon was unable to find all the needed
>> >>>>>> shared
>> >>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH
>> >>>>>>> to
>> >>>>>> have the
>> >>>>>>> location of the shared libraries on the remote nodes and this
>> >>>>>>> will
>> >>>>>>> automatically be forwarded to the remote nodes.
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>> orterun noticed that the job aborted, but has no info as to the
>> >>>>>> process
>> >>>>>>> that caused that situation.
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> --------------------------------------------------------------------------
>> >>>>>>> orterun: clean termination accomplished
>> >>>>>>> *
>> >>>>>>> It seems that the rankfile option is not propagated to the second command
>> >>>>>>> line; there is no global understanding of the ranking inside an mpirun
>> >>>>>>> command.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> ##################################################################################
>> >>>>>>>
>> >>>>>>> Assuming that, I tried to provide a rankfile to each command line:
>> >>>>>>>
>> >>>>>>> cat rankfile.0
>> >>>>>>> rank 0=r011n002 slot=0
>> >>>>>>>
>> >>>>>>> cat rankfile.1
>> >>>>>>> rank 0=r011n003 slot=1
>> >>>>>>>
>> >>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname ### CRASHED
>> >>>>>>> [r011n002:28778] *** Process received signal ***
>> >>>>>>> [r011n002:28778] Signal: Segmentation fault (11)
>> >>>>>>> [r011n002:28778] Signal code: Address not mapped (1)
>> >>>>>>> [r011n002:28778] Failing at address: 0x34
>> >>>>>>> [r011n002:28778] [ 0] [0xffffe600]
>> >>>>>>> [r011n002:28778] [ 1] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x55d) [0x5557decd]
>> >>>>>>> [r011n002:28778] [ 2] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x117) [0x555842a7]
>> >>>>>>> [r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/mca_plm_rsh.so [0x556098c0]
>> >>>>>>> [r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804aa27]
>> >>>>>>> [r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804a022]
>> >>>>>>> [r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc) [0x9f1dec]
>> >>>>>>> [r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x8049f71]
>> >>>>>>> [r011n002:28778] *** End of error message ***
>> >>>>>>> Segmentation fault (core dumped)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> I hope that I've found a bug, because it would be very important for me to
>> >>>>>>> have this kind of capability:
>> >>>>>>> launching a multi-executable mpirun command line and being able to bind my
>> >>>>>>> executables and sockets together.
>> >>>>>>>
>> >>>>>>> Thanks in advance for your help
>> >>>>>>>
>> >>>>>>> Geoffroy
>> >>>>>> _______________________________________________
>> >>>>>> users mailing list
>> >>>>>> users_at_[hidden]
>> >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >>>>>
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>