Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
From: Vicente (vpuibor_at_[hidden])
Date: 2009-05-04 11:53:02


Yes, I already have the gfortran compiler in /usr/local/bin, the same
directory as my mpif90 compiler. But I have noticed that the mpif90 in
/usr/bin and the one in /Developer/usr/bin both say:

"Unfortunately, this installation of Open MPI was not compiled with
Fortran 90 support. As such, the mpif90 compiler is non-functional."

That must be the problem. I will have to change the path so that the
gfortran I have installed is used.
How can I do that? (Sorry, I am a beginner.)

Thanks.
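
A minimal shell sketch of one way to "change the path", assuming a bash-style shell and assuming the Fortran-enabled mpif90 really is the one in /usr/local/bin, as described above. `--showme` is the Open MPI wrapper option that prints the underlying compiler command; everything else is a standard shell builtin.

```shell
# Put /usr/local/bin ahead of /usr/bin and /Developer/usr/bin for this session.
# (Assumption: the Fortran-enabled mpif90 is the one installed in /usr/local/bin.)
PATH="/usr/local/bin:$PATH"
export PATH

# List every mpif90 the shell can find, in search order; the first one is used.
type -a mpif90 || echo "no mpif90 found on PATH"

# If a wrapper is found, ask it which underlying compiler command it would run.
if command -v mpif90 >/dev/null 2>&1; then
    mpif90 --showme || true
fi
```

To keep the change across sessions, the same PATH lines could go in a shell startup file such as ~/.profile. The Xcode compiler specification quoted below already names /usr/local/bin/mpif90 explicitly in ExecPath, so the PATH change matters mainly for builds run from the terminal.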

On 4 May 2009, at 17:38, Warner Yuen wrote:

> Have you installed a Fortran compiler? Mac OS X's developer tools do
> not come with a Fortran compiler, so you'll need to install one if
> you haven't already done so. I routinely use the Intel IFORT
> compilers with success. However, I hear many good things about the
> gfortran compilers on Mac OS X, and you can't beat the price of gfortran!
>
>
> Warner Yuen
> Scientific Computing
> Consulting Engineer
> Apple, Inc.
> email: wyuen_at_[hidden]
> Tel: 408.718.2859
>
>
>
>
> On May 4, 2009, at 7:28 AM, users-request_at_[hidden] wrote:
>
>> Today's Topics:
>>
>> 1. How do I compile OpenMPI in Xcode 3.1 (Vicente)
>> 2. Re: 1.3.1 -rf rankfile behaviour ?? (Ralph Castain)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 4 May 2009 16:12:44 +0200
>> From: Vicente <vpuibor_at_[hidden]>
>> Subject: [OMPI users] How do I compile OpenMPI in Xcode 3.1
>> To: users_at_[hidden]
>> Message-ID: <1C2C0085-940F-43BB-910F-975871AE2F09_at_[hidden]>
>> Content-Type: text/plain; charset="windows-1252"; Format="flowed";
>> DelSp="yes"
>>
>> Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in
>> Xcode", but it only covers MPICC. I am using MPIF90, so I did the
>> same, replacing MPICC with MPIF90 and changing the path, but it did
>> not work.
>>
>> Building target "fortran" of project "fortran" with configuration
>> "Debug"
>>
>>
>> Checking Dependencies
>> Invalid value 'MPIF90' for GCC_VERSION
>>
>>
>> The file "MPIF90.cpcompspec" looks like this:
>>
>> /**
>>     Xcode Compiler Specification for MPIF90
>>
>> */
>>
>> {   Type = Compiler;
>>     Identifier = com.apple.compilers.mpif90;
>>     BasedOn = com.apple.compilers.gcc.4_0;
>>     Name = "MPIF90";
>>     Version = "Default";
>>     Description = "MPI GNU C/C++ Compiler 4.0";
>>     ExecPath = "/usr/local/bin/mpif90"; // This gets converted to the g++ variant automatically
>>     PrecompStyle = pch;
>> }
>>
>> and is located in "/Developer/Library/Xcode/Plug-ins"
>>
>> and when I do mpif90 -v on terminal it works well:
>>
>> Using built-in specs.
>> Target: i386-apple-darwin8.10.1
>> Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/tmp/gfortran-20090321/gfortran_libs --enable-bootstrap
>> Thread model: posix
>> gcc version 4.4.0 20090321 (experimental) [trunk revision 144983] (GCC)
>>
>>
>> Any idea??
>>
>> Thanks.
>>
>> Vincent
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 4 May 2009 08:28:26 -0600
>> From: Ralph Castain <rhc_at_[hidden]>
>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>> To: Open MPI Users <users_at_[hidden]>
>> Message-ID:
>> <71d2d8cc0905040728h2002f4d7s4c49219eee29e86f_at_[hidden]>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Unfortunately, I didn't write any of that code - I was just fixing the
>> mapper so it would properly map the procs. From what I can tell, the
>> proper things are happening there.
>>
>> I'll have to dig into the code that specifically deals with parsing the
>> results to bind the processes. Afraid that will take a while longer -
>> pretty dark in that hole.
>>
>>
>> On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot
>> <geopignot_at_[hidden]> wrote:
>>
>>> Hi,
>>>
>>> So, there are no more crashes with my "crazy" mpirun command. But the
>>> paffinity feature seems to be broken: I am not able to pin my
>>> processes.
>>>
>>> Simple test with a program using your plpa library :
>>>
>>> r011n006% cat hostf
>>> r011n006 slots=4
>>>
>>> r011n006% cat rankf
>>> rank 0=r011n006 slot=0 ----> bind to CPU 0 , exact ?
>>>
>>> r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf --rankfile rankf --wdir /tmp -n 1 a.out
>>>>>> PLPA Number of processors online: 4
>>>>>> PLPA Number of processor sockets: 2
>>>>>> PLPA Socket 0 (ID 0): 2 cores
>>>>>> PLPA Socket 1 (ID 3): 2 cores
>>>
>>> Ctrl+Z
>>> r011n006%bg
>>>
>>> r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
>>> R+ gpignot 3 9271 97.8 a.out
>>>
>>> In fact, whatever slot number I put in my rankfile, a.out always runs
>>> on CPU 3. I expected it on CPU 0 according to my cpuinfo file
>>> (see below).
>>> The result is the same if I try another syntax (rank 0=r011n006
>>> slot=0:0, bind to socket 0 - core 0, exact?).
>>>
>>> Thanks in advance
>>>
>>> Geoffroy
>>>
>>> PS: I run on rhel5
>>>
>>> r011n006% uname -a
>>> Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39 CDT 2008 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> My configure is :
>>> ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64' --disable-dlopen --disable-mpi-cxx --enable-heterogeneous
>>>
>>>
>>> r011n006% cat /proc/cpuinfo
>>> processor : 0
>>> vendor_id : GenuineIntel
>>> cpu family : 6
>>> model : 15
>>> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
>>> stepping : 6
>>> cpu MHz : 2660.007
>>> cache size : 4096 KB
>>> physical id : 0
>>> siblings : 2
>>> core id : 0
>>> cpu cores : 2
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 10
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>>> pge mca
>>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
>>> nx lm
>>> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>>> bogomips : 5323.68
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 36 bits physical, 48 bits virtual
>>> power management:
>>>
>>> processor : 1
>>> vendor_id : GenuineIntel
>>> cpu family : 6
>>> model : 15
>>> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
>>> stepping : 6
>>> cpu MHz : 2660.007
>>> cache size : 4096 KB
>>> physical id : 3
>>> siblings : 2
>>> core id : 0
>>> cpu cores : 2
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 10
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>>> pge mca
>>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
>>> nx lm
>>> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>>> bogomips : 5320.03
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 36 bits physical, 48 bits virtual
>>> power management:
>>>
>>> processor : 2
>>> vendor_id : GenuineIntel
>>> cpu family : 6
>>> model : 15
>>> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
>>> stepping : 6
>>> cpu MHz : 2660.007
>>> cache size : 4096 KB
>>> physical id : 0
>>> siblings : 2
>>> core id : 1
>>> cpu cores : 2
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 10
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>>> pge mca
>>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
>>> nx lm
>>> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>>> bogomips : 5319.39
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 36 bits physical, 48 bits virtual
>>> power management:
>>>
>>> processor : 3
>>> vendor_id : GenuineIntel
>>> cpu family : 6
>>> model : 15
>>> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
>>> stepping : 6
>>> cpu MHz : 2660.007
>>> cache size : 4096 KB
>>> physical id : 3
>>> siblings : 2
>>> core id : 1
>>> cpu cores : 2
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 10
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>>> pge mca
>>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
>>> nx lm
>>> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>>> bogomips : 5320.03
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 36 bits physical, 48 bits virtual
>>> power management:
>>>
>>>
>>>> ------------------------------
>>>>
>>>> Message: 2
>>>> Date: Mon, 4 May 2009 04:45:57 -0600
>>>> From: Ralph Castain <rhc_at_[hidden]>
>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>> To: Open MPI Users <users_at_[hidden]>
>>>> Message-ID: <D01D7B16-4B47-46F3-AD41-D1A90B2E4927_at_[hidden]>
>>>>
>>>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>>>> DelSp="yes"
>>>>
>>>> My apologies - I wasn't clear enough. You need a tarball from r21111
>>>> or greater, such as:
>>>>
>>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz
>>>>
>>>> HTH
>>>> Ralph
>>>>
>>>>
>>>> On May 4, 2009, at 2:14 AM, Geoffroy Pignot wrote:
>>>>
>>>>> Hi ,
>>>>>
>>>>> I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my
>>>>> command doesn't work
>>>>>
>>>>> cat rankf:
>>>>> rank 0=node1 slot=*
>>>>> rank 1=node2 slot=*
>>>>>
>>>>> cat hostf:
>>>>> node1 slots=2
>>>>> node2 slots=2
>>>>>
>>>>> mpirun --rankfile rankf --hostfile hostf --host node1 -n 1
>>>>> hostname : --host node2 -n 1 hostname
>>>>>
>>>>> Error, invalid rank (1) in the rankfile (rankf)
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 403
>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 86
>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 86
>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 1016
>>>>>
>>>>>
>>>>> Ralph, could you tell me whether my command syntax is correct? If
>>>>> not, could you give me the expected one?
>>>>>
>>>>> Regards
>>>>>
>>>>> Geoffroy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2009/4/30 Geoffroy Pignot <geopignot_at_[hidden]>
>>>>> Immediately Sir !!! :)
>>>>>
>>>>> Thanks again Ralph
>>>>>
>>>>> Geoffroy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Message: 2
>>>>> Date: Thu, 30 Apr 2009 06:45:39 -0600
>>>>> From: Ralph Castain <rhc_at_[hidden]>
>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>> To: Open MPI Users <users_at_[hidden]>
>>>>> Message-ID:
>>>>> <71d2d8cc0904300545v61a42fe1k50086d2704d0f7e6_at_[hidden]>
>>>>> Content-Type: text/plain; charset="iso-8859-1"
>>>>>
>>>>> I believe this is fixed now in our development trunk - you can
>>>>> download any
>>>>> tarball starting from last night and give it a try, if you like.
>>>>> Any
>>>>> feedback would be appreciated.
>>>>>
>>>>> Ralph
>>>>>
>>>>>
>>>>> On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:
>>>>>
>>>>> Ah now, I didn't say it -worked-, did I? :-)
>>>>>
>>>>> Clearly a bug exists in the program. I'll try to take a look at it
>>>>> (if Lenny
>>>>> doesn't get to it first), but it won't be until later in the week.
>>>>>
>>>>> On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:
>>>>>
>>>>> I agree with you, Ralph, and that's what I expect from openmpi, but my
>>>>> second example shows that it's not working:
>>>>>
>>>>> cat hostfile.0
>>>>> r011n002 slots=4
>>>>> r011n003 slots=4
>>>>>
>>>>> cat rankfile.0
>>>>> rank 0=r011n002 slot=0
>>>>> rank 1=r011n003 slot=1
>>>>>
>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname ### CRASHED
>>>>>
>>>>>>> Error, invalid rank (1) in the rankfile (rankfile.0)
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> rmaps_rank_file.c at line 404
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> base/rmaps_base_map_job.c at line 87
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> base/plm_base_launch_support.c at line 77
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> plm_rsh_module.c at line 985
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> A daemon (pid unknown) died unexpectedly on signal 1 while
>>>>>> attempting to
>>>>>>> launch so we are aborting.
>>>>>>>
>>>>>>> There may be more information reported by the environment (see
>>>>>> above).
>>>>>>>
>>>>>>> This may be because the daemon was unable to find all the needed
>>>>>> shared
>>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH
>>>>>>> to
>>>>>> have the
>>>>>>> location of the shared libraries on the remote nodes and this
>>>>>>> will
>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> orterun noticed that the job aborted, but has no info as to the
>>>>>> process
>>>>>>> that caused that situation.
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> orterun: clean termination accomplished
>>>>>
>>>>>
>>>>>
>>>>> Message: 4
>>>>> Date: Tue, 14 Apr 2009 06:55:58 -0600
>>>>> From: Ralph Castain <rhc_at_[hidden]>
>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>> To: Open MPI Users <users_at_[hidden]>
>>>>> Message-ID: <F6290ADA-A196-43F0-A853-CBCB802D8D9C_at_[hidden]>
>>>>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>>>>> DelSp="yes"
>>>>>
>>>>> The rankfile cuts across the entire job - it isn't applied on an
>>>>> app_context basis. So the ranks in your rankfile must correspond to
>>>>> the eventual rank of each process in the cmd line.
>>>>>
>>>>> Unfortunately, that means you have to count ranks. In your case, you
>>>>> only have four, so that makes life easier. Your rankfile would look
>>>>> something like this:
>>>>>
>>>>> rank 0=r001n001 slot=0
>>>>> rank 1=r001n002 slot=1
>>>>> rank 2=r001n001 slot=1
>>>>> rank 3=r001n002 slot=2
>>>>>
>>>>> HTH
>>>>> Ralph
>>>>>
>>>>> On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I agree that my examples are not very clear. What I want to do is
>>>>>> launch a multi-executable application (masters-slaves) and benefit
>>>>>> from processor affinity.
>>>>>> Could you show me how to convert this command to use the -rf option
>>>>>> (whatever the affinity is)?
>>>>>>
>>>>>> mpirun -n 1 -host r001n001 master.x options1 : -n 1 -host r001n002 master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1 -host r001n002 slave.x options4
>>>>>>
>>>>>> Thanks for your help
>>>>>>
>>>>>> Geoffroy
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Message: 2
>>>>>> Date: Sun, 12 Apr 2009 18:26:35 +0300
>>>>>> From: Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
>>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>> To: Open MPI Users <users_at_[hidden]>
>>>>>> Message-ID:
>>>>>>
>>>>>> <453d39990904120826t2e1d1d33l7bb1fe3de65b5361_at_[hidden]>
>>>>>> Content-Type: text/plain; charset="iso-8859-1"
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The first "crash" is OK, since your rankfile has ranks 0 and 1
>>>>>> defined, while n=1, which means only rank 0 is present and can be
>>>>>> allocated.
>>>>>>
>>>>>> NP must be greater than the largest rank in the rankfile (ranks start at 0).
>>>>>>
>>>>>> What exactly are you trying to do ?
>>>>>>
>>>>>> I tried to recreate your sequence but all I got was
>>>>>>
>>>>>> ~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
>>>>>> [witch19:30798] mca: base: component_find: paffinity
>>>>>> "mca_paffinity_linux"
>>>>>> uses an MCA interface that is not recognized (component MCA
>>>>> v1.0.0 !=
>>>>>> supported MCA v2.0.0) -- ignored
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>> It looks like opal_init failed for some reason; your parallel
>>>>>> process is
>>>>>> likely to abort. There are many reasons that a parallel process
>>>>>> can
>>>>>> fail during opal_init; some of which are due to configuration or
>>>>>> environment problems. This failure appears to be an internal
>>>>> failure;
>>>>>> here's some additional information (which may only be relevant
>>>>>> to an
>>>>>> Open MPI developer):
>>>>>>
>>>>>> opal_carto_base_select failed
>>>>>> --> Returned value -13 instead of OPAL_SUCCESS
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>> [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
>>>>> file
>>>>>> ../../orte/runtime/orte_init.c at line 78
>>>>>> [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
>>>>> file
>>>>>> ../../orte/orted/orted_main.c at line 344
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>> A daemon (pid 11629) died unexpectedly with status 243 while
>>>>>> attempting
>>>>>> to launch so we are aborting.
>>>>>>
>>>>>> There may be more information reported by the environment (see
>>>>> above).
>>>>>>
>>>>>> This may be because the daemon was unable to find all the needed
>>>>>> shared
>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to
>>>>>> have the
>>>>>> location of the shared libraries on the remote nodes and this
>>>>>> will
>>>>>> automatically be forwarded to the remote nodes.
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>> process
>>>>>> that caused that situation.
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>> mpirun: clean termination accomplished
>>>>>>
>>>>>>
>>>>>> Lenny.
>>>>>>
>>>>>>
>>>>>> On 4/10/09, Geoffroy Pignot <geopignot_at_[hidden]> wrote:
>>>>>>>
>>>>>>> Hi ,
>>>>>>>
>>>>>>> I am currently testing the process affinity capabilities of openmpi
>>>>>>> and I would like to know whether the rankfile behaviour described
>>>>>>> below is normal or not.
>>>>>>>
>>>>>>> cat hostfile.0
>>>>>>> r011n002 slots=4
>>>>>>> r011n003 slots=4
>>>>>>>
>>>>>>> cat rankfile.0
>>>>>>> rank 0=r011n002 slot=0
>>>>>>> rank 1=r011n003 slot=1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> ##################################################################################
>>>>>>>
>>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname ### OK
>>>>>>> r011n002
>>>>>>> r011n003
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> ##################################################################################
>>>>>>> but
>>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname ### CRASHED
>>>>>>> *
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> Error, invalid rank (1) in the rankfile (rankfile.0)
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> rmaps_rank_file.c at line 404
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> base/rmaps_base_map_job.c at line 87
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> base/plm_base_launch_support.c at line 77
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> plm_rsh_module.c at line 985
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> A daemon (pid unknown) died unexpectedly on signal 1 while
>>>>>> attempting to
>>>>>>> launch so we are aborting.
>>>>>>>
>>>>>>> There may be more information reported by the environment (see
>>>>>> above).
>>>>>>>
>>>>>>> This may be because the daemon was unable to find all the needed
>>>>>> shared
>>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH
>>>>>>> to
>>>>>> have the
>>>>>>> location of the shared libraries on the remote nodes and this
>>>>>>> will
>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> orterun noticed that the job aborted, but has no info as to the
>>>>>> process
>>>>>>> that caused that situation.
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> orterun: clean termination accomplished
>>>>>>> *
>>>>>>> It seems that the rankfile option is not propagated to the second
>>>>>>> command line; there is no global understanding of the ranking inside
>>>>>>> an mpirun command.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> ##################################################################################
>>>>>>>
>>>>>>> Assuming that, I tried to provide a rankfile to each command line:
>>>>>>>
>>>>>>> cat rankfile.0
>>>>>>> rank 0=r011n002 slot=0
>>>>>>>
>>>>>>> cat rankfile.1
>>>>>>> rank 0=r011n003 slot=1
>>>>>>>
>>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname ### CRASHED
>>>>>>> *[r011n002:28778] *** Process received signal ***
>>>>>>> [r011n002:28778] Signal: Segmentation fault (11)
>>>>>>> [r011n002:28778] Signal code: Address not mapped (1)
>>>>>>> [r011n002:28778] Failing at address: 0x34
>>>>>>> [r011n002:28778] [ 0] [0xffffe600]
>>>>>>> [r011n002:28778] [ 1] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x55d) [0x5557decd]
>>>>>>> [r011n002:28778] [ 2] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x117) [0x555842a7]
>>>>>>> [r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/mca_plm_rsh.so [0x556098c0]
>>>>>>> [r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804aa27]
>>>>>>> [r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804a022]
>>>>>>> [r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc) [0x9f1dec]
>>>>>>> [r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x8049f71]
>>>>>>> [r011n002:28778] *** End of error message ***
>>>>>>> Segmentation fault (core dumped)*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I hope that I've found a bug, because it would be very important for
>>>>>>> me to have this kind of capability: launching a multi-exe mpirun
>>>>>>> command line and being able to bind my exes and sockets together.
>>>>>>>
>>>>>>> Thanks in advance for your help
>>>>>>>
>>>>>>> Geoffroy
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users