Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-07-15 09:38:53


Okay, I'll dig into it - must be a bug in my code.

Sorry for the problem! Thanks for your patience in tracking it down...
Ralph

On Wed, Jul 15, 2009 at 7:28 AM, Lenny Verkhovsky <
lenny.verkhovsky_at_[hidden]> wrote:

> Thanks, Ralph,
> I guess your guess was correct, here is the display map.
>
>
> $cat rankfile
> rank 0=+n1 slot=0
> rank 1=+n0 slot=0
> $cat appfile
> -np 1 -host witch1 ./hello_world
> -np 1 -host witch2 ./hello_world
> $mpirun -np 2 -rf rankfile --display-allocation -app appfile
>
> ====================== ALLOCATED NODES ======================
>
> Data for node: Name: dellix7 Num slots: 0 Max slots: 0
> Data for node: Name: witch1 Num slots: 1 Max slots: 0
> Data for node: Name: witch2 Num slots: 1 Max slots: 0
>
> =================================================================
>
> --------------------------------------------------------------------------
> Rankfile claimed host +n1 by index that is bigger than number of allocated
> hosts.
>
>
> On Wed, Jul 15, 2009 at 4:10 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> What is supposed to happen is this:
>>
>> 1. each line of the appfile causes us to create a new app_context. We
>> store the provided -host info in that object.
>>
>> 2. when we create the "allocation", we cycle through -all- the
>> app_contexts and add -all- of their -host info into the list of allocated
>> nodes
>>
>> 3. when we get_target_nodes, we start with the entire list of allocated
>> nodes, and then use -host for that app_context to filter down to the hosts
>> allowed for that specific app_context
>>
>> So you should have to only provide -np 1 and 1 host on each line. My guess
>> is that the rankfile mapper isn't correctly behaving for multiple
>> app_contexts.
>>
>> Add --display-allocation to your mpirun cmd line for the "not working" case
>> and let's see what mpirun thinks the total allocation is - I'll bet that
>> both nodes show up, which would tell us that my "guess" is correct. Then
>> I'll know what needs to be fixed.
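The three steps described above can be sketched as a small model. This is an illustrative Python sketch only, not Open MPI code: `build_allocation`, `get_target_nodes`, and `resolve_relative` are hypothetical names, and treating `+n<index>` as a plain index into a node list is an assumption based on the error message quoted in this thread. It shows why resolving `+n1` against the per-app filtered list fails, while resolving it against the full allocation would succeed.

```python
def build_allocation(app_contexts):
    """Step 2: merge the -host entries of every app_context, keeping first-seen order."""
    allocation = []
    for app in app_contexts:
        for host in app["hosts"]:
            if host not in allocation:
                allocation.append(host)
    return allocation


def get_target_nodes(allocation, app):
    """Step 3: filter the full allocation down to the hosts allowed for one app_context."""
    return [node for node in allocation if node in app["hosts"]]


def resolve_relative(node_list, index):
    """Resolve a rankfile '+n<index>' entry against a node list (assumed semantics)."""
    if index >= len(node_list):
        raise IndexError(
            f"+n{index} is bigger than the number of hosts in the list ({len(node_list)})"
        )
    return node_list[index]


# The "not working" appfile from this thread: one host per line.
app_contexts = [
    {"np": 1, "hosts": ["witch1"]},
    {"np": 1, "hosts": ["witch2"]},
]

allocation = build_allocation(app_contexts)
filtered = get_target_nodes(allocation, app_contexts[0])

print("full allocation:", allocation)
print("filtered for first app_context:", filtered)
print("+n1 against full allocation:", resolve_relative(allocation, 1))
try:
    resolve_relative(filtered, 1)
except IndexError as err:
    print("+n1 against filtered list fails:", err)
```

If the rankfile mapper behaves like the last call, looking up `+n1` in a one-host list, that would be consistent with the "bigger than number of allocated hosts" error reported here.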
>>
>> Thanks
>> Ralph
>>
>>
>>
>> On Wed, Jul 15, 2009 at 6:08 AM, Lenny Verkhovsky <
>> lenny.verkhovsky_at_[hidden]> wrote:
>>
>>> Same result.
>>> I still suspect that the rankfile mapper looks up each node in the small hostlist provided
>>> by the line in the app file, and not in the hostlist provided by mpirun on the HNP
>>> node.
>>> According to my suspicions your proposal should not work (and it does
>>> not), since in the appfile line I provide np=1 and 1 host, while the rankfile tries
>>> to allocate all ranks (np=2).
>>>
>>> $orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 338
>>>
>>> if (ORTE_SUCCESS != (rc = orte_rmaps_base_get_target_nodes(&node_list,
>>>         &num_slots, app, map->policy))) {
>>>
>>> node_list will be partial, according to the app, and not the full list provided on
>>> the mpirun cmd line. If I don't provide a hostlist in the appfile line, mpirun uses
>>> the local host and not the hosts from the hostfile.
>>>
>>>
>>> Tell me if I am wrong in expecting the following behavior:
>>>
>>> I provide to mpirun NP, full_hostlist, full_rankfile, appfile
>>> I provide in appfile only partial NP and partial hostlist.
>>> and it works.
>>>
>>> Currently, in order to get it working I need to provide the full hostlist in
>>> the appfile, which is quite problematic.
>>>
>>>
>>> $mpirun -np 2 -rf rankfile -app appfile
>>>
>>> --------------------------------------------------------------------------
>>> Rankfile claimed host +n1 by index that is bigger than number of
>>> allocated hosts.
>>>
>>> --------------------------------------------------------------------------
>>> [dellix7:17277] [[23928,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>> ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 422
>>> [dellix7:17277] [[23928,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>> ../../../../orte/mca/rmaps/base/rmaps_base_map_job.c at line 85
>>> [dellix7:17277] [[23928,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>> ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 103
>>> [dellix7:17277] [[23928,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>> ../../../../../orte/mca/plm/rsh/plm_rsh_module.c at line 1001
>>>
>>>
>>> Thanks
>>> Lenny.
>>>
>>>
>>> On Wed, Jul 15, 2009 at 2:02 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> Try your "not working" example without the -H on the mpirun cmd line -
>>>> i.e., just use "mpirun -np 2 -rf rankfile -app appfile". Does that work?
>>>> Sorry to have to keep asking you to try things - I don't have a setup
>>>> here where I can test this as everything is RM managed.
>>>>
>>>>
>>>> On Jul 15, 2009, at 12:09 AM, Lenny Verkhovsky wrote:
>>>>
>>>>
>>>> Thanks Ralph, after playing with prefixes it worked,
>>>>
>>>> I still have a problem running an app file with a rankfile when providing the full
>>>> hostlist in the mpirun command and not in the app file.
>>>> Is this planned behaviour, or can it be fixed?
>>>>
>>>> See Working example:
>>>>
>>>> $cat rankfile
>>>> rank 0=+n1 slot=0
>>>> rank 1=+n0 slot=0
>>>> $cat appfile
>>>> -np 1 -H witch1,witch2 ./hello_world
>>>> -np 1 -H witch1,witch2 ./hello_world
>>>>
>>>> $mpirun -rf rankfile -app appfile
>>>> Hello world! I'm 1 of 2 on witch1
>>>> Hello world! I'm 0 of 2 on witch2
>>>>
>>>> See NOT working example:
>>>>
>>>> $cat appfile
>>>> -np 1 -H witch1 ./hello_world
>>>> -np 1 -H witch2 ./hello_world
>>>> $mpirun -np 2 -H witch1,witch2 -rf rankfile -app appfile
>>>>
>>>> --------------------------------------------------------------------------
>>>> Rankfile claimed host +n1 by index that is bigger than number of
>>>> allocated hosts.
>>>>
>>>> --------------------------------------------------------------------------
>>>> [dellix7:16405] [[24080,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 422
>>>> [dellix7:16405] [[24080,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> ../../../../orte/mca/rmaps/base/rmaps_base_map_job.c at line 85
>>>> [dellix7:16405] [[24080,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 103
>>>> [dellix7:16405] [[24080,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> ../../../../../orte/mca/plm/rsh/plm_rsh_module.c at line 1001
>>>>
>>>>
>>>>
>>>> On Wed, Jul 15, 2009 at 6:58 AM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>
>>>>> Took a deeper look into this, and I think that your first guess was
>>>>> correct.
>>>>> When we changed hostfile and -host to be per-app-context options, it
>>>>> became necessary for you to put that info in the appfile itself. So try
>>>>> adding it there. What you would need in your appfile is the following:
>>>>>
>>>>> -np 1 -H witch1 hostname
>>>>> -np 1 -H witch2 hostname
>>>>>
>>>>> That should get you what you want.
>>>>> Ralph
>>>>>
>>>>> On Jul 14, 2009, at 10:29 AM, Lenny Verkhovsky wrote:
>>>>>
>>>>> No, it's not working as I expect, unless I expect something wrong.
>>>>> ( sorry for the long PATH, I needed to provide it )
>>>>>
>>>>> $LD_LIBRARY_PATH=/hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/lib/
>>>>> /hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun
>>>>> -np 2 -H witch1,witch2 hostname
>>>>> witch1
>>>>> witch2
>>>>>
>>>>> $LD_LIBRARY_PATH=/hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/lib/
>>>>> /hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun
>>>>> -np 2 -H witch1,witch2 -app appfile
>>>>> dellix7
>>>>> dellix7
>>>>> $cat appfile
>>>>> -np 1 hostname
>>>>> -np 1 hostname
>>>>>
>>>>>
>>>>> On Tue, Jul 14, 2009 at 7:08 PM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>>
>>>>>> Run it without the appfile, just putting the apps on the cmd line -
>>>>>> does it work right then?
>>>>>>
>>>>>> On Jul 14, 2009, at 10:04 AM, Lenny Verkhovsky wrote:
>>>>>>
>>>>>> additional info
>>>>>> I am running mpirun on hostA, and providing hostlist with hostB and
>>>>>> hostC.
>>>>>> I expect that each application would run on hostB and hostC, but I get
>>>>>> all of them running on hostA.
>>>>>> dellix7$cat appfile
>>>>>> -np 1 hostname
>>>>>> -np 1 hostname
>>>>>> dellix7$mpirun -np 2 -H witch1,witch2 -app appfile
>>>>>> dellix7
>>>>>> dellix7
>>>>>> Thanks
>>>>>> Lenny.
>>>>>>
>>>>>> On Tue, Jul 14, 2009 at 4:59 PM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>>>
>>>>>>> Strange - let me have a look at it later today. Probably something
>>>>>>> simple that another pair of eyes might spot.
>>>>>>> On Jul 14, 2009, at 7:43 AM, Lenny Verkhovsky wrote:
>>>>>>>
>>>>>>> Seems like a related problem:
>>>>>>> I can't use rankfile with app, even after all those fixes ( working
>>>>>>> with trunk 1.4a1r21657).
>>>>>>> This is my case :
>>>>>>>
>>>>>>> $cat rankfile
>>>>>>> rank 0=+n1 slot=0
>>>>>>> rank 1=+n0 slot=0
>>>>>>> $cat appfile
>>>>>>> -np 1 hostname
>>>>>>> -np 1 hostname
>>>>>>> $mpirun -np 2 -H witch1,witch2 -rf rankfile -app appfile
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> Rankfile claimed host +n1 by index that is bigger than number of
>>>>>>> allocated hosts.
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 422
>>>>>>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> ../../../../orte/mca/rmaps/base/rmaps_base_map_job.c at line 85
>>>>>>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 103
>>>>>>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> ../../../../../orte/mca/plm/rsh/plm_rsh_module.c at line 1001
>>>>>>>
>>>>>>>
>>>>>>> The problem is, that rankfile mapper tries to find an appropriate
>>>>>>> host in the partial ( and not full ) hostlist.
>>>>>>>
>>>>>>> Any suggestions how to fix it?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Lenny.
>>>>>>>
>>>>>>> On Wed, May 13, 2009 at 1:55 AM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>>>>
>>>>>>>> Okay, I fixed this today too....r21219
>>>>>>>>
>>>>>>>>
>>>>>>>> On May 11, 2009, at 11:27 PM, Anton Starikov wrote:
>>>>>>>>
>>>>>>>> Now there is another problem :)
>>>>>>>>>
>>>>>>>>> You can try to oversubscribe a node, at least by 1 task.
>>>>>>>>> If your hostfile and rankfile limit you to N procs, you can ask
>>>>>>>>> mpirun for N+1 and it will not be rejected,
>>>>>>>>> although in reality there will be N tasks.
>>>>>>>>> So, if your hostfile limit is 4, then "mpirun -np 4" and "mpirun
>>>>>>>>> -np 5" both work, but in both cases there are only 4 tasks. It isn't
>>>>>>>>> crucial, because there is no real oversubscription, but there is still some
>>>>>>>>> bug which can affect something in the future.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Anton Starikov.
>>>>>>>>>
>>>>>>>>> On May 12, 2009, at 1:45 AM, Ralph Castain wrote:
>>>>>>>>>
>>>>>>>>> This is fixed as of r21208.
>>>>>>>>>>
>>>>>>>>>> Thanks for reporting it!
>>>>>>>>>> Ralph
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On May 11, 2009, at 12:51 PM, Anton Starikov wrote:
>>>>>>>>>>
>>>>>>>>>> Although removing this check solves problem of having more slots
>>>>>>>>>>> in rankfile than necessary, there is another problem.
>>>>>>>>>>>
>>>>>>>>>>> If I set rmaps_base_no_oversubscribe=1 then if, for example:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> hostfile:
>>>>>>>>>>>
>>>>>>>>>>> node01
>>>>>>>>>>> node01
>>>>>>>>>>> node02
>>>>>>>>>>> node02
>>>>>>>>>>>
>>>>>>>>>>> rankfile:
>>>>>>>>>>>
>>>>>>>>>>> rank 0=node01 slot=1
>>>>>>>>>>> rank 1=node01 slot=0
>>>>>>>>>>> rank 2=node02 slot=1
>>>>>>>>>>> rank 3=node02 slot=0
>>>>>>>>>>>
>>>>>>>>>>> mpirun -np 4 ./something
>>>>>>>>>>>
>>>>>>>>>>> complains with:
>>>>>>>>>>>
>>>>>>>>>>> "There are not enough slots available in the system to satisfy
>>>>>>>>>>> the 4 slots
>>>>>>>>>>> that were requested by the application"
>>>>>>>>>>>
>>>>>>>>>>> but "mpirun -np 3 ./something" will work though. It works, when
>>>>>>>>>>> you ask for 1 CPU less. And the same behavior in any case (shared nodes,
>>>>>>>>>>> non-shared nodes, multi-node)
>>>>>>>>>>>
>>>>>>>>>>> If you switch off rmaps_base_no_oversubscribe, then it works and
>>>>>>>>>>> all affinities set as it requested in rankfile, there is no
>>>>>>>>>>> oversubscription.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Anton.
>>>>>>>>>>>
>>>>>>>>>>> On May 5, 2009, at 3:08 PM, Ralph Castain wrote:
>>>>>>>>>>>
>>>>>>>>>>> Ah - thx for catching that, I'll remove that check. It no longer
>>>>>>>>>>>> is required.
>>>>>>>>>>>>
>>>>>>>>>>>> Thx!
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 5, 2009 at 7:04 AM, Lenny Verkhovsky <
>>>>>>>>>>>> lenny.verkhovsky_at_[hidden]> wrote:
>>>>>>>>>>>> According to the code it does care.
>>>>>>>>>>>>
>>>>>>>>>>>> $vi orte/mca/rmaps/rank_file/rmaps_rank_file.c +572
>>>>>>>>>>>>
>>>>>>>>>>>> ival = orte_rmaps_rank_file_value.ival;
>>>>>>>>>>>> if ( ival > (np-1) ) {
>>>>>>>>>>>>     orte_show_help("help-rmaps_rank_file.txt", "bad-rankfile", true,
>>>>>>>>>>>>                    ival, rankfile);
>>>>>>>>>>>>     rc = ORTE_ERR_BAD_PARAM;
>>>>>>>>>>>>     goto unlock;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> If I remember correctly, I used an array to map ranks, and since
>>>>>>>>>>>> the length of array is NP, maximum index must be less than np, so if you
>>>>>>>>>>>> have the number of rank > NP, you have no place to put it inside array.
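The reasoning above (a rank map stored in an array of length NP, so the largest usable index is NP-1) can be restated as a small sketch. This is an illustrative Python model of the check as quoted, not the Open MPI implementation; `place_ranks` is a hypothetical name. Note that later in this thread the check is described as no longer required.

```python
def place_ranks(rankfile_entries, np):
    """Store each (rank, node) pair in an array of length np.

    Mirrors the quoted test `ival > (np-1)`: a rank that would index
    past the end of the array is rejected.
    """
    rank_map = [None] * np
    for rank, node in rankfile_entries:
        if rank > np - 1:
            raise ValueError(f"invalid rank ({rank}) in the rankfile")
        rank_map[rank] = node
    return rank_map


entries = [(0, "node1"), (1, "node2")]

# With np=2 both ranks fit in the array.
print(place_ranks(entries, 2))

# With np=1 only index 0 exists, so rank 1 is rejected.
try:
    place_ranks(entries, 1)
except ValueError as err:
    print("error:", err)
```

Under this model a rankfile naming ranks 0 and 1 is rejected whenever it is checked against np=1, which matches the "invalid rank (1)" error reported for the two `-n 1` app contexts.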
>>>>>>>>>>>>
>>>>>>>>>>>> "Likewise, if you have more procs than the rankfile specifies,
>>>>>>>>>>>> we map the additional procs either byslot (default) or bynode (if you
>>>>>>>>>>>> specify that option). So the rankfile doesn't need to contain an entry for
>>>>>>>>>>>> every proc." - Correct point.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Lenny.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 5/5/09, Ralph Castain <rhc_at_[hidden]> wrote: Sorry Lenny,
>>>>>>>>>>>> but that isn't correct. The rankfile mapper doesn't care if the rankfile
>>>>>>>>>>>> contains additional info - it only maps up to the number of processes, and
>>>>>>>>>>>> ignores anything beyond that number. So there is no need to remove the
>>>>>>>>>>>> additional info.
>>>>>>>>>>>>
>>>>>>>>>>>> Likewise, if you have more procs than the rankfile specifies, we
>>>>>>>>>>>> map the additional procs either byslot (default) or bynode (if you specify
>>>>>>>>>>>> that option). So the rankfile doesn't need to contain an entry for every
>>>>>>>>>>>> proc.
>>>>>>>>>>>>
>>>>>>>>>>>> Just don't want to confuse folks.
>>>>>>>>>>>> Ralph
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 5, 2009 at 5:59 AM, Lenny Verkhovsky <
>>>>>>>>>>>> lenny.verkhovsky_at_[hidden]> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> maximum rank number must be less than np.
>>>>>>>>>>>> if np=1 then there is only rank 0 in the system, so rank 1 is
>>>>>>>>>>>> invalid.
>>>>>>>>>>>> please remove "rank 1=node2 slot=*" from the rankfile
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Lenny.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, May 4, 2009 at 11:14 AM, Geoffroy Pignot <
>>>>>>>>>>>> geopignot_at_[hidden]> wrote:
>>>>>>>>>>>> Hi ,
>>>>>>>>>>>>
>>>>>>>>>>>> I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately
>>>>>>>>>>>> my command doesn't work
>>>>>>>>>>>>
>>>>>>>>>>>> cat rankf:
>>>>>>>>>>>> rank 0=node1 slot=*
>>>>>>>>>>>> rank 1=node2 slot=*
>>>>>>>>>>>>
>>>>>>>>>>>> cat hostf:
>>>>>>>>>>>> node1 slots=2
>>>>>>>>>>>> node2 slots=2
>>>>>>>>>>>>
>>>>>>>>>>>> mpirun --rankfile rankf --hostfile hostf --host node1 -n 1
>>>>>>>>>>>> hostname : --host node2 -n 1 hostname
>>>>>>>>>>>>
>>>>>>>>>>>> Error, invalid rank (1) in the rankfile (rankf)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>>>>>>>>> file rmaps_rank_file.c at line 403
>>>>>>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>>>>>>>>> file base/rmaps_base_map_job.c at line 86
>>>>>>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>>>>>>>>> file base/plm_base_launch_support.c at line 86
>>>>>>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>>>>>>>>> file plm_rsh_module.c at line 1016
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Ralph, could you tell me if my command syntax is correct or not
>>>>>>>>>>>> ? if not, give me the expected one ?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>>
>>>>>>>>>>>> Geoffroy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2009/4/30 Geoffroy Pignot <geopignot_at_[hidden]>
>>>>>>>>>>>>
>>>>>>>>>>>> Immediately Sir !!! :)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks again Ralph
>>>>>>>>>>>>
>>>>>>>>>>>> Geoffroy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>
>>>>>>>>>>>> Message: 2
>>>>>>>>>>>> Date: Thu, 30 Apr 2009 06:45:39 -0600
>>>>>>>>>>>> From: Ralph Castain <rhc_at_[hidden]>
>>>>>>>>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>>>>>>>> To: Open MPI Users <users_at_[hidden]>
>>>>>>>>>>>> Message-ID:
>>>>>>>>>>>> <71d2d8cc0904300545v61a42fe1k50086d2704d0f7e6_at_[hidden]
>>>>>>>>>>>> >
>>>>>>>>>>>> Content-Type: text/plain; charset="iso-8859-1"
>>>>>>>>>>>>
>>>>>>>>>>>> I believe this is fixed now in our development trunk - you can
>>>>>>>>>>>> download any
>>>>>>>>>>>> tarball starting from last night and give it a try, if you like.
>>>>>>>>>>>> Any
>>>>>>>>>>>> feedback would be appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> Ralph
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Ah now, I didn't say it -worked-, did I? :-)
>>>>>>>>>>>>
>>>>>>>>>>>> Clearly a bug exists in the program. I'll try to take a look at
>>>>>>>>>>>> it (if Lenny
>>>>>>>>>>>> doesn't get to it first), but it won't be until later in the
>>>>>>>>>>>> week.
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I agree with you Ralph , and that 's what I expect from openmpi
>>>>>>>>>>>> but my
>>>>>>>>>>>> second example shows that it's not working
>>>>>>>>>>>>
>>>>>>>>>>>> cat hostfile.0
>>>>>>>>>>>> r011n002 slots=4
>>>>>>>>>>>> r011n003 slots=4
>>>>>>>>>>>>
>>>>>>>>>>>> cat rankfile.0
>>>>>>>>>>>> rank 0=r011n002 slot=0
>>>>>>>>>>>> rank 1=r011n003 slot=1
>>>>>>>>>>>>
>>>>>>>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1
>>>>>>>>>>>> hostname
>>>>>>>>>>>> ### CRASHED
>>>>>>>>>>>>
>>>>>>>>>>>> > > Error, invalid rank (1) in the rankfile (rankfile.0)
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter
>>>>>>>>>>>> in file
>>>>>>>>>>>> > > rmaps_rank_file.c at line 404
>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter
>>>>>>>>>>>> in file
>>>>>>>>>>>> > > base/rmaps_base_map_job.c at line 87
>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter
>>>>>>>>>>>> in file
>>>>>>>>>>>> > > base/plm_base_launch_support.c at line 77
>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter
>>>>>>>>>>>> in file
>>>>>>>>>>>> > > plm_rsh_module.c at line 985
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > > A daemon (pid unknown) died unexpectedly on signal 1 while
>>>>>>>>>>>> > attempting to
>>>>>>>>>>>> > > launch so we are aborting.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > There may be more information reported by the environment
>>>>>>>>>>>> (see
>>>>>>>>>>>> > above).
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > This may be because the daemon was unable to find all the
>>>>>>>>>>>> needed
>>>>>>>>>>>> > shared
>>>>>>>>>>>> > > libraries on the remote node. You may set your
>>>>>>>>>>>> LD_LIBRARY_PATH to
>>>>>>>>>>>> > have the
>>>>>>>>>>>> > > location of the shared libraries on the remote nodes and
>>>>>>>>>>>> this will
>>>>>>>>>>>> > > automatically be forwarded to the remote nodes.
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > > orterun noticed that the job aborted, but has no info as to
>>>>>>>>>>>> the
>>>>>>>>>>>> > process
>>>>>>>>>>>> > > that caused that situation.
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > > orterun: clean termination accomplished
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Message: 4
>>>>>>>>>>>> Date: Tue, 14 Apr 2009 06:55:58 -0600
>>>>>>>>>>>> From: Ralph Castain <rhc_at_[hidden]>
>>>>>>>>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>>>>>>>> To: Open MPI Users <users_at_[hidden]>
>>>>>>>>>>>> Message-ID: <F6290ADA-A196-43F0-A853-CBCB802D8D9C_at_[hidden]>
>>>>>>>>>>>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>>>>>>>>>>>> DelSp="yes"
>>>>>>>>>>>>
>>>>>>>>>>>> The rankfile cuts across the entire job - it isn't applied on an
>>>>>>>>>>>> app_context basis. So the ranks in your rankfile must correspond
>>>>>>>>>>>> to
>>>>>>>>>>>> the eventual rank of each process in the cmd line.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, that means you have to count ranks. In your case,
>>>>>>>>>>>> you
>>>>>>>>>>>> only have four, so that makes life easier. Your rankfile would
>>>>>>>>>>>> look
>>>>>>>>>>>> something like this:
>>>>>>>>>>>>
>>>>>>>>>>>> rank 0=r001n001 slot=0
>>>>>>>>>>>> rank 1=r001n002 slot=1
>>>>>>>>>>>> rank 2=r001n001 slot=1
>>>>>>>>>>>> rank 3=r001n002 slot=2
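Counting ranks across app contexts, as described above, can be sketched in a few lines. This is a hedged Python illustration, not an Open MPI tool: `global_ranks` and `rankfile_lines` are hypothetical helpers, and the slot numbering (next unused slot on each host, starting at 0) is an arbitrary assumption that differs from the slot values in the example rankfile above. The input mirrors the four-executable master/slave command quoted further down in this thread.

```python
def global_ranks(app_contexts):
    """Assign job-wide ranks to app contexts in command-line order."""
    entries = []
    next_rank = 0
    for app in app_contexts:
        for _ in range(app["np"]):
            entries.append((next_rank, app["host"], app["exe"]))
            next_rank += 1
    return entries


def rankfile_lines(entries):
    """Emit one rankfile line per rank, using the next unused slot on each host."""
    next_slot = {}
    lines = []
    for rank, host, _exe in entries:
        slot = next_slot.get(host, 0)
        next_slot[host] = slot + 1
        lines.append(f"rank {rank}={host} slot={slot}")
    return lines


app_contexts = [
    {"np": 1, "host": "r001n001", "exe": "master.x"},
    {"np": 1, "host": "r001n002", "exe": "master.x"},
    {"np": 1, "host": "r001n001", "exe": "slave.x"},
    {"np": 1, "host": "r001n002", "exe": "slave.x"},
]

lines = rankfile_lines(global_ranks(app_contexts))
print("\n".join(lines))
```

The host assigned to each rank matches the example rankfile above; only the slot values are a free choice and would be set to whatever affinity is wanted.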
>>>>>>>>>>>>
>>>>>>>>>>>> HTH
>>>>>>>>>>>> Ralph
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> > Hi,
>>>>>>>>>>>> >
>>>>>>>>>>>> > I agree that my examples are not very clear. What I want to do
>>>>>>>>>>>> is to
>>>>>>>>>>>> > launch a multiexes application (masters-slaves) and benefit
>>>>>>>>>>>> from the
>>>>>>>>>>>> > processor affinity.
>>>>>>>>>>>> > Could you show me how to convert this command , using -rf
>>>>>>>>>>>> option
>>>>>>>>>>>> > (whatever the affinity is)
>>>>>>>>>>>> >
>>>>>>>>>>>> > mpirun -n 1 -host r001n001 master.x options1 : -n 1 -host
>>>>>>>>>>>> r001n002
>>>>>>>>>>>> > master.x options2 : -n 1 -host r001n001 slave.x options3 : -n
>>>>>>>>>>>> 1 -
>>>>>>>>>>>> > host r001n002 slave.x options4
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks for your help
>>>>>>>>>>>> >
>>>>>>>>>>>> > Geoffroy
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > Message: 2
>>>>>>>>>>>> > Date: Sun, 12 Apr 2009 18:26:35 +0300
>>>>>>>>>>>> > From: Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
>>>>>>>>>>>> > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>>>>>>>> > To: Open MPI Users <users_at_[hidden]>
>>>>>>>>>>>> > Message-ID:
>>>>>>>>>>>> > <
>>>>>>>>>>>> 453d39990904120826t2e1d1d33l7bb1fe3de65b5361_at_[hidden]>
>>>>>>>>>>>> > Content-Type: text/plain; charset="iso-8859-1"
>>>>>>>>>>>> >
>>>>>>>>>>>> > Hi,
>>>>>>>>>>>> >
>>>>>>>>>>>> > The first "crash" is OK, since your rankfile has ranks 0 and 1
>>>>>>>>>>>> > defined,
>>>>>>>>>>>> > while n=1, which means only rank 0 is present and can be
>>>>>>>>>>>> allocated.
>>>>>>>>>>>> >
>>>>>>>>>>>> > NP must be > the largest rank in rankfile.
>>>>>>>>>>>> >
>>>>>>>>>>>> > What exactly are you trying to do ?
>>>>>>>>>>>> >
>>>>>>>>>>>> > I tried to recreate your segv but all I got was
>>>>>>>>>>>> >
>>>>>>>>>>>> > ~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun
>>>>>>>>>>>> --hostfile
>>>>>>>>>>>> > hostfile.0
>>>>>>>>>>>> > -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
>>>>>>>>>>>> > [witch19:30798] mca: base: component_find: paffinity
>>>>>>>>>>>> > "mca_paffinity_linux"
>>>>>>>>>>>> > uses an MCA interface that is not recognized (component MCA
>>>>>>>>>>>> v1.0.0 !=
>>>>>>>>>>>> > supported MCA v2.0.0) -- ignored
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > It looks like opal_init failed for some reason; your parallel
>>>>>>>>>>>> > process is
>>>>>>>>>>>> > likely to abort. There are many reasons that a parallel
>>>>>>>>>>>> process can
>>>>>>>>>>>> > fail during opal_init; some of which are due to configuration
>>>>>>>>>>>> or
>>>>>>>>>>>> > environment problems. This failure appears to be an internal
>>>>>>>>>>>> failure;
>>>>>>>>>>>> > here's some additional information (which may only be relevant
>>>>>>>>>>>> to an
>>>>>>>>>>>> > Open MPI developer):
>>>>>>>>>>>> >
>>>>>>>>>>>> > opal_carto_base_select failed
>>>>>>>>>>>> > --> Returned value -13 instead of OPAL_SUCCESS
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found
>>>>>>>>>>>> in file
>>>>>>>>>>>> > ../../orte/runtime/orte_init.c at line 78
>>>>>>>>>>>> > [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found
>>>>>>>>>>>> in file
>>>>>>>>>>>> > ../../orte/orted/orted_main.c at line 344
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > A daemon (pid 11629) died unexpectedly with status 243 while
>>>>>>>>>>>> > attempting
>>>>>>>>>>>> > to launch so we are aborting.
>>>>>>>>>>>> >
>>>>>>>>>>>> > There may be more information reported by the environment (see
>>>>>>>>>>>> above).
>>>>>>>>>>>> >
>>>>>>>>>>>> > This may be because the daemon was unable to find all the
>>>>>>>>>>>> needed
>>>>>>>>>>>> > shared
>>>>>>>>>>>> > libraries on the remote node. You may set your LD_LIBRARY_PATH
>>>>>>>>>>>> to
>>>>>>>>>>>> > have the
>>>>>>>>>>>> > location of the shared libraries on the remote nodes and this
>>>>>>>>>>>> will
>>>>>>>>>>>> > automatically be forwarded to the remote nodes.
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>>>>> process
>>>>>>>>>>>> > that caused that situation.
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > mpirun: clean termination accomplished
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > Lenny.
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > On 4/10/09, Geoffroy Pignot <geopignot_at_[hidden]> wrote:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Hi ,
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > I am currently testing the process affinity capabilities of
>>>>>>>>>>>> > openmpi and I
>>>>>>>>>>>> > > would like to know if the rankfile behaviour I will describe
>>>>>>>>>>>> below
>>>>>>>>>>>> > is normal
>>>>>>>>>>>> > > or not ?
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > cat hostfile.0
>>>>>>>>>>>> > > r011n002 slots=4
>>>>>>>>>>>> > > r011n003 slots=4
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > cat rankfile.0
>>>>>>>>>>>> > > rank 0=r011n002 slot=0
>>>>>>>>>>>> > > rank 1=r011n003 slot=1
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>> ##################################################################################
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname
>>>>>>>>>>>> ### OK
>>>>>>>>>>>> > > r011n002
>>>>>>>>>>>> > > r011n003
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>> ##################################################################################
>>>>>>>>>>>> > > but
>>>>>>>>>>>> > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname :
>>>>>>>>>>>> -n 1
>>>>>>>>>>>> > hostname
>>>>>>>>>>>> > > ### CRASHED
>>>>>>>>>>>> > > *
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > > Error, invalid rank (1) in the rankfile (rankfile.0)
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter
>>>>>>>>>>>> in file
>>>>>>>>>>>> > > rmaps_rank_file.c at line 404
>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter
>>>>>>>>>>>> in file
>>>>>>>>>>>> > > base/rmaps_base_map_job.c at line 87
>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter
>>>>>>>>>>>> in file
>>>>>>>>>>>> > > base/plm_base_launch_support.c at line 77
>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter
>>>>>>>>>>>> in file
>>>>>>>>>>>> > > plm_rsh_module.c at line 985
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > > A daemon (pid unknown) died unexpectedly on signal 1 while
>>>>>>>>>>>> > attempting to
>>>>>>>>>>>> > > launch so we are aborting.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > There may be more information reported by the environment
>>>>>>>>>>>> (see
>>>>>>>>>>>> > above).
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > This may be because the daemon was unable to find all the
>>>>>>>>>>>> needed
>>>>>>>>>>>> > shared
>>>>>>>>>>>> > > libraries on the remote node. You may set your
>>>>>>>>>>>> LD_LIBRARY_PATH to
>>>>>>>>>>>> > have the
>>>>>>>>>>>> > > location of the shared libraries on the remote nodes and
>>>>>>>>>>>> this will
>>>>>>>>>>>> > > automatically be forwarded to the remote nodes.
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > > orterun noticed that the job aborted, but has no info as to
>>>>>>>>>>>> the
>>>>>>>>>>>> > process
>>>>>>>>>>>> > > that caused that situation.
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> > > orterun: clean termination accomplished
>>>>>>>>>>>> > > *
>>>>>>>>>>>> > > It seems that the rankfile option is not propagated to the
>>>>>>>>>>>> second
>>>>>>>>>>>> > command
>>>>>>>>>>>> > > line ; there is no global understanding of the ranking
>>>>>>>>>>>> inside a
>>>>>>>>>>>> > mpirun
>>>>>>>>>>>> > > command.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>> ##################################################################################
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Assuming that , I tried to provide a rankfile to each
>>>>>>>>>>>> command line:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > cat rankfile.0
>>>>>>>>>>>> > > rank 0=r011n002 slot=0
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > cat rankfile.1
>>>>>>>>>>>> > > rank 0=r011n003 slot=1
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname :
>>>>>>>>>>>> -rf
>>>>>>>>>>>> > rankfile.1
>>>>>>>>>>>> > > -n 1 hostname ### CRASHED
>>>>>>>>>>>> > > *[r011n002:28778] *** Process received signal ***
>>>>>>>>>>>> > > [r011n002:28778] Signal: Segmentation fault (11)
>>>>>>>>>>>> > > [r011n002:28778] Signal code: Address not mapped (1)
>>>>>>>>>>>> > > [r011n002:28778] Failing at address: 0x34
>>>>>>>>>>>> > > [r011n002:28778] [ 0] [0xffffe600]
>>>>>>>>>>>> > > [r011n002:28778] [ 1] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x55d) [0x5557decd]
>>>>>>>>>>>> > > [r011n002:28778] [ 2] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x117) [0x555842a7]
>>>>>>>>>>>> > > [r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/mca_plm_rsh.so [0x556098c0]
>>>>>>>>>>>> > > [r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804aa27]
>>>>>>>>>>>> > > [r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804a022]
>>>>>>>>>>>> > > [r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc) [0x9f1dec]
>>>>>>>>>>>> > > [r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x8049f71]
>>>>>>>>>>>> > > [r011n002:28778] *** End of error message ***
>>>>>>>>>>>> > > Segmentation fault (core dumped)*
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > I hope that I've found a bug, because it would be very important
>>>>>>>>>>>> > > for me to have this kind of capability: launching a
>>>>>>>>>>>> > > multi-executable mpirun command line and being able to bind my
>>>>>>>>>>>> > > executables and sockets together.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Thanks in advance for your help
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Geoffroy
>>>>>>>>>>>> > _______________________________________________
>>>>>>>>>>>> > users mailing list
>>>>>>>>>>>> > users_at_[hidden]
>>>>>>>>>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users