Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
From: Mike Dubman (mike.ompi_at_[hidden])
Date: 2009-07-16 02:01:44


Hello Ralph,

It seems that Option 2 is preferred, because it is more intuitive for the
end user to create a rankfile for an MPI job that is described by an -app
command line.

All host definitions used inside -app <file> will be treated as a single
global hostlist, combined from all hosts appearing inside the appfile, and
the rankfile will be able to refer to any host appearing inside the
-app <file> directive. Is that correct?

regards

Mike

P.S. man mpirun claims that:

-app <appfile> Provide an appfile, ignoring all other command line options.

but it seems that it does not actually ignore all other command line options.
Even more, it seems very convenient to specify per-job parameters on the
mpirun command line just before -app appfile, while putting per-host
parameters inside the appfile. What do you think?
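For instance, a hypothetical split along those lines (the host and program names are illustrative, borrowed from the examples later in this thread):

```
$ cat appfile
-np 1 -host witch1 ./hello_world
-np 1 -host witch2 ./hello_world
$ mpirun -np 2 -rf rankfile -app appfile
```

Here -np 2 and -rf rankfile are the per-job parameters on the mpirun line, while each -host stays per line in the appfile.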

On Thu, Jul 16, 2009 at 2:00 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> Hmmm...well actually, there isn't a bug in the code. This is an interesting
> question!
> Here is the problem. It has to do with how -host is processed. Remember, in
> the new scheme (as of 1.3.0), in the absence of any other info (e.g., an RM
> allocation or hostfile), we cycle across -all- the -host specifications to
> create a global pool of allocated nodes. Hence, you got the following:
>
> ====================== ALLOCATED NODES ======================
>>
>> Data for node: Name: dellix7 Num slots: 0 Max slots: 0
>> Data for node: Name: witch1 Num slots: 1 Max slots: 0
>> Data for node: Name: witch2 Num slots: 1 Max slots: 0
>>
>> =================================================================
>>
>
> When we start mapping, we call the base function to get the available nodes
> for this particular app_context. The function starts with the entire
> allocation. It then checks for a hostfile, which in this case it won't find.
>
> Subsequently, it looks at the -host spec and removes -all- nodes in the
> list that were not included in -host. In the case of app_context=0, the
> "-host witch1" causes us to remove dellix7 and witch2 from the list -
> leaving only witch1.
>
> This list is passed back to the rank_file mapper. The rf mapper then looks
> at your rankfile, which tells it to put rank=0 on the +n1 node on the list.
>
> But the list only has ONE node, which would correspond to +n0! Hence the
> error message.
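A minimal sketch of the failure mode Ralph describes (this is illustrative Python, not Open MPI code; node names follow the thread's example): get_target_nodes filters the global pool down to the app_context's -host list, and the relative "+nX" index is then resolved against that shortened list.

```python
allocation = ["dellix7", "witch1", "witch2"]   # the global allocated pool

def target_nodes(allocation, host_spec):
    # keep only the nodes named in this app_context's -host option
    return [n for n in allocation if n in host_spec]

def resolve_relative(node_list, spec):
    # resolve a '+nX' rankfile entry against a node list
    idx = int(spec[2:])
    if idx >= len(node_list):
        raise IndexError("Rankfile claimed host %s by index that is bigger "
                         "than number of allocated hosts" % spec)
    return node_list[idx]

# app_context 0 was given "-host witch1", so the list shrinks to one node:
nodes = target_nodes(allocation, ["witch1"])   # -> ["witch1"]
try:
    resolve_relative(nodes, "+n1")             # rank 0=+n1 from the rankfile
except IndexError as err:
    print(err)                                 # the error from the thread
```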
>
> We have two potential solutions I can see:
>
> Option 1. we can leave things as they are, and you adjust your rankfile to:
>
> rank 0=+n0 slot=0
> rank 1=+n0 slot=0
>
> Since you specified -host witch2 for the second app_context, this will work
> to put rank0 on witch1 and rank1 on witch2. However, I admit that it looks a
> little weird.
>
> Alternatively, you could adjust your appfile to:
>
> -np 1 -host witch1,witch2 ./hello_world
> -np 1 ./hello_world
>
> Note you could have -host witch1,witch2 on the second line too, if you
> wanted. Now your current rankfile would put rank0 on witch2 and rank1 on
> witch1.
>
> Option 2. we could modify your relative node syntax to be based on the
> eventual total allocation. In this case, we would not use the base function
> to give us a list, but instead would construct it from the allocated node
> pool.
> Your current rankfile would give you what you wanted since we wouldn't count the HNP's node in the pool as it wasn't included in the allocation.
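A sketch of Option 2's semantics as described above (this is an assumption drawn from the description, not actual Open MPI code): "+nX" would index the eventual total allocation, skipping the HNP's node since it holds no allocated slots.

```python
allocation = [("dellix7", 0), ("witch1", 1), ("witch2", 1)]  # (name, slots)

def relative_pool(allocation):
    # drop nodes with no allocated slots (here: the HNP's own node)
    return [name for name, slots in allocation if slots > 0]

pool = relative_pool(allocation)      # ["witch1", "witch2"]
rankfile = {0: "+n1", 1: "+n0"}       # the rankfile from the thread
placement = {rank: pool[int(spec[2:])] for rank, spec in rankfile.items()}
print(placement)                      # rank 0 -> witch2, rank 1 -> witch1
```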
>
>
> Any thoughts on how you'd like to do this? I can make it work either way, but have no personal preference.
> Ralph
>
> On Jul 15, 2009, at 7:38 AM, Ralph Castain wrote:
>
> Okay, I'll dig into it - must be a bug in my code.
>
> Sorry for the problem! Thanks for your patience in tracking it down...
> Ralph
>
> On Wed, Jul 15, 2009 at 7:28 AM, Lenny Verkhovsky <
> lenny.verkhovsky_at_[hidden]> wrote:
>
>> Thanks, Ralph,
>> I guess your guess was correct, here is the display map.
>>
>>
>> $cat rankfile
>> rank 0=+n1 slot=0
>> rank 1=+n0 slot=0
>> $cat appfile
>> -np 1 -host witch1 ./hello_world
>> -np 1 -host witch2 ./hello_world
>> $mpirun -np 2 -rf rankfile --display-allocation -app appfile
>>
>> ====================== ALLOCATED NODES ======================
>>
>> Data for node: Name: dellix7 Num slots: 0 Max slots: 0
>> Data for node: Name: witch1 Num slots: 1 Max slots: 0
>> Data for node: Name: witch2 Num slots: 1 Max slots: 0
>>
>> =================================================================
>>
>> --------------------------------------------------------------------------
>> Rankfile claimed host +n1 by index that is bigger than number of allocated
>> hosts.
>>
>>
>> On Wed, Jul 15, 2009 at 4:10 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> What is supposed to happen is this:
>>>
>>> 1. each line of the appfile causes us to create a new app_context. We
>>> store the provided -host info in that object.
>>>
>>> 2. when we create the "allocation", we cycle through -all- the
>>> app_contexts and add -all- of their -host info into the list of allocated
>>> nodes
>>>
>>> 3. when we get_target_nodes, we start with the entire list of allocated
>>> nodes, and then use -host for that app_context to filter down to the hosts
>>> allowed for that specific app_context
>>>
>>> So you should have to only provide -np 1 and 1 host on each line. My
>>> guess is that the rankfile mapper isn't correctly behaving for multiple
>>> app_contexts.
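The three steps above can be sketched as follows (assumed structure, not ORTE code; the HNP's own node, which also appears in the real allocation, is omitted for brevity):

```python
appfile = [("./hello_world", ["witch1"]),   # step 1: one app_context per
           ("./hello_world", ["witch2"])]   # line, each storing its -host

# step 2: the allocation is the union of every app_context's -host list
allocation = []
for _cmd, hosts in appfile:
    for h in hosts:
        if h not in allocation:
            allocation.append(h)

# step 3: per app_context, filter the allocation down to its own -host list
def get_target_nodes(allocation, hosts):
    return [n for n in allocation if n in hosts]

print(allocation)                                # ['witch1', 'witch2']
print(get_target_nodes(allocation, ["witch1"]))  # ['witch1']
```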
>>>
>>> Add --display-allocation to your mpirun cmd line for the "not working"
>>> case and let's see what mpirun thinks the total allocation is - I'll bet that
>>> both nodes show up, which would tell us that my "guess" is correct. Then
>>> I'll know what needs to be fixed.
>>>
>>> Thanks
>>> Ralph
>>>
>>>
>>>
>>> On Wed, Jul 15, 2009 at 6:08 AM, Lenny Verkhovsky <
>>> lenny.verkhovsky_at_[hidden]> wrote:
>>>
>>>> Same result.
>>>> I still suspect that the rankfile looks for the node in the small hostlist
>>>> provided by the appfile line, and not in the hostlist provided to mpirun
>>>> on the HNP node.
>>>> According to my suspicion, your proposal should not work (and it does
>>>> not), since in the appfile line I provide np=1 and 1 host, while the
>>>> rankfile tries to allocate all ranks (np=2).
>>>>
>>>> $orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 338
>>>>
>>>> if (ORTE_SUCCESS != (rc = orte_rmaps_base_get_target_nodes(&node_list,
>>>>                                 &num_slots, app,
>>>>                                 map->policy))) {
>>>>
>>>> node_list will be partial, according to app, and not the full list
>>>> provided on the mpirun cmd line. If I don't provide a hostlist in the
>>>> appfile line, mpirun uses the local host and not the hosts from the
>>>> hostfile.
>>>>
>>>>
>>>> Tell me if I am wrong in expecting the following behavior:
>>>>
>>>> I provide to mpirun: NP, the full hostlist, the full rankfile, and an appfile.
>>>> I provide in the appfile only a partial NP and a partial hostlist.
>>>> And it works.
>>>>
>>>> Currently, in order to get it working I need to provide the full hostlist
>>>> in the appfile, which is quite problematic.
>>>>
>>>>
>>>> $mpirun -np 2 -rf rankfile -app appfile
>>>>
>>>> --------------------------------------------------------------------------
>>>> Rankfile claimed host +n1 by index that is bigger than number of
>>>> allocated hosts.
>>>>
>>>> --------------------------------------------------------------------------
>>>> [dellix7:17277] [[23928,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 422
>>>> [dellix7:17277] [[23928,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> ../../../../orte/mca/rmaps/base/rmaps_base_map_job.c at line 85
>>>> [dellix7:17277] [[23928,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 103
>>>> [dellix7:17277] [[23928,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> ../../../../../orte/mca/plm/rsh/plm_rsh_module.c at line 1001
>>>>
>>>>
>>>> Thanks
>>>> Lenny.
>>>>
>>>>
>>>> On Wed, Jul 15, 2009 at 2:02 PM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>
>>>>> Try your "not working" example without the -H on the mpirun cmd line -
>>>>> i.e., just use "mpirun -np 2 -rf rankfile -app appfile". Does that work?
>>>>> Sorry to have to keep asking you to try things - I don't have a setup
>>>>> here where I can test this as everything is RM managed.
>>>>>
>>>>>
>>>>> On Jul 15, 2009, at 12:09 AM, Lenny Verkhovsky wrote:
>>>>>
>>>>>
>>>>> Thanks Ralph, after playing with prefixes it worked,
>>>>>
>>>>> I still have a problem running an appfile with a rankfile, when providing
>>>>> the full hostlist on the mpirun command line and not in the appfile.
>>>>> Is this planned behaviour, or can it be fixed?
>>>>>
>>>>> See Working example:
>>>>>
>>>>> $cat rankfile
>>>>> rank 0=+n1 slot=0
>>>>> rank 1=+n0 slot=0
>>>>> $cat appfile
>>>>> -np 1 -H witch1,witch2 ./hello_world
>>>>> -np 1 -H witch1,witch2 ./hello_world
>>>>>
>>>>> $mpirun -rf rankfile -app appfile
>>>>> Hello world! I'm 1 of 2 on witch1
>>>>> Hello world! I'm 0 of 2 on witch2
>>>>>
>>>>> See NOT working example:
>>>>>
>>>>> $cat appfile
>>>>> -np 1 -H witch1 ./hello_world
>>>>> -np 1 -H witch2 ./hello_world
>>>>> $mpirun -np 2 -H witch1,witch2 -rf rankfile -app appfile
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> Rankfile claimed host +n1 by index that is bigger than number of
>>>>> allocated hosts.
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> [dellix7:16405] [[24080,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>> ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 422
>>>>> [dellix7:16405] [[24080,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>> ../../../../orte/mca/rmaps/base/rmaps_base_map_job.c at line 85
>>>>> [dellix7:16405] [[24080,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>> ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 103
>>>>> [dellix7:16405] [[24080,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>> ../../../../../orte/mca/plm/rsh/plm_rsh_module.c at line 1001
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 15, 2009 at 6:58 AM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>>
>>>>>> Took a deeper look into this, and I think that your first guess was
>>>>>> correct.
>>>>>> When we changed hostfile and -host to be per-app-context options, it
>>>>>> became necessary for you to put that info in the appfile itself. So try
>>>>>> adding it there. What you would need in your appfile is the following:
>>>>>>
>>>>>> -np 1 -H witch1 hostname
>>>>>> -np 1 -H witch2 hostname
>>>>>>
>>>>>> That should get you what you want.
>>>>>> Ralph
>>>>>>
>>>>>> On Jul 14, 2009, at 10:29 AM, Lenny Verkhovsky wrote:
>>>>>>
>>>>>> No, it's not working as I expect, unless I expect something wrong.
>>>>>> ( sorry for the long PATH, I needed to provide it )
>>>>>>
>>>>>> $LD_LIBRARY_PATH=/hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/lib/
>>>>>> /hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun
>>>>>> -np 2 -H witch1,witch2 hostname
>>>>>> witch1
>>>>>> witch2
>>>>>>
>>>>>> $LD_LIBRARY_PATH=/hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/lib/
>>>>>> /hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun
>>>>>> -np 2 -H witch1,witch2 -app appfile
>>>>>> dellix7
>>>>>> dellix7
>>>>>> $cat appfile
>>>>>> -np 1 hostname
>>>>>> -np 1 hostname
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 14, 2009 at 7:08 PM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>>>
>>>>>>> Run it without the appfile, just putting the apps on the cmd line -
>>>>>>> does it work right then?
>>>>>>>
>>>>>>> On Jul 14, 2009, at 10:04 AM, Lenny Verkhovsky wrote:
>>>>>>>
>>>>>>> additional info
>>>>>>> I am running mpirun on hostA, and providing hostlist with hostB and
>>>>>>> hostC.
>>>>>>> I expect that each application would run on hostB and hostC, but I
>>>>>>> get all of them running on hostA.
>>>>>>> dellix7$cat appfile
>>>>>>> -np 1 hostname
>>>>>>> -np 1 hostname
>>>>>>> dellix7$mpirun -np 2 -H witch1,witch2 -app appfile
>>>>>>> dellix7
>>>>>>> dellix7
>>>>>>> Thanks
>>>>>>> Lenny.
>>>>>>>
>>>>>>> On Tue, Jul 14, 2009 at 4:59 PM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>>>>
>>>>>>>> Strange - let me have a look at it later today. Probably something
>>>>>>>> simple that another pair of eyes might spot.
>>>>>>>> On Jul 14, 2009, at 7:43 AM, Lenny Verkhovsky wrote:
>>>>>>>>
>>>>>>>> Seems like a related problem:
>>>>>>>> I can't use a rankfile with -app, even after all those fixes (working
>>>>>>>> with trunk 1.4a1r21657).
>>>>>>>> This is my case :
>>>>>>>>
>>>>>>>> $cat rankfile
>>>>>>>> rank 0=+n1 slot=0
>>>>>>>> rank 1=+n0 slot=0
>>>>>>>> $cat appfile
>>>>>>>> -np 1 hostname
>>>>>>>> -np 1 hostname
>>>>>>>> $mpirun -np 2 -H witch1,witch2 -rf rankfile -app appfile
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> Rankfile claimed host +n1 by index that is bigger than number of
>>>>>>>> allocated hosts.
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>>> ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 422
>>>>>>>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>>> ../../../../orte/mca/rmaps/base/rmaps_base_map_job.c at line 85
>>>>>>>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>>> ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 103
>>>>>>>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>>> ../../../../../orte/mca/plm/rsh/plm_rsh_module.c at line 1001
>>>>>>>>
>>>>>>>>
>>>>>>>> The problem is that the rankfile mapper tries to find an appropriate
>>>>>>>> host in the partial (and not the full) hostlist.
>>>>>>>>
>>>>>>>> Any suggestions how to fix it?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Lenny.
>>>>>>>>
>>>>>>>> On Wed, May 13, 2009 at 1:55 AM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>>>>>
>>>>>>>>> Okay, I fixed this today too....r21219
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On May 11, 2009, at 11:27 PM, Anton Starikov wrote:
>>>>>>>>>
>>>>>>>>> Now there is another problem :)
>>>>>>>>>>
>>>>>>>>>> You can oversubscribe a node, at least by 1 task.
>>>>>>>>>> If your hostfile and rankfile limit you to N procs, you can ask
>>>>>>>>>> mpirun for N+1 and it will not be rejected,
>>>>>>>>>> although in reality there will be N tasks.
>>>>>>>>>> So, if your hostfile limit is 4, then "mpirun -np 4" and "mpirun
>>>>>>>>>> -np 5" both work, but in both cases there are only 4 tasks. It isn't
>>>>>>>>>> crucial, because there is no real oversubscription, but there is still
>>>>>>>>>> a bug which can affect something in the future.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Anton Starikov.
>>>>>>>>>>
>>>>>>>>>> On May 12, 2009, at 1:45 AM, Ralph Castain wrote:
>>>>>>>>>>
>>>>>>>>>> This is fixed as of r21208.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for reporting it!
>>>>>>>>>>> Ralph
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On May 11, 2009, at 12:51 PM, Anton Starikov wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Although removing this check solves the problem of having more
>>>>>>>>>>>> slots in the rankfile than necessary, there is another problem.
>>>>>>>>>>>>
>>>>>>>>>>>> If I set rmaps_base_no_oversubscribe=1 then if, for example:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> hostfile:
>>>>>>>>>>>>
>>>>>>>>>>>> node01
>>>>>>>>>>>> node01
>>>>>>>>>>>> node02
>>>>>>>>>>>> node02
>>>>>>>>>>>>
>>>>>>>>>>>> rankfile:
>>>>>>>>>>>>
>>>>>>>>>>>> rank 0=node01 slot=1
>>>>>>>>>>>> rank 1=node01 slot=0
>>>>>>>>>>>> rank 2=node02 slot=1
>>>>>>>>>>>> rank 3=node02 slot=0
>>>>>>>>>>>>
>>>>>>>>>>>> mpirun -np 4 ./something
>>>>>>>>>>>>
>>>>>>>>>>>> complains with:
>>>>>>>>>>>>
>>>>>>>>>>>> "There are not enough slots available in the system to satisfy
>>>>>>>>>>>> the 4 slots
>>>>>>>>>>>> that were requested by the application"
>>>>>>>>>>>>
>>>>>>>>>>>> but "mpirun -np 3 ./something" works, i.e. when you ask for 1 CPU
>>>>>>>>>>>> less. And the behavior is the same in every case (shared nodes,
>>>>>>>>>>>> non-shared nodes, multi-node).
>>>>>>>>>>>>
>>>>>>>>>>>> If you switch off rmaps_base_no_oversubscribe, then it works and
>>>>>>>>>>>> all affinities are set as requested in the rankfile; there is no
>>>>>>>>>>>> oversubscription.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Anton.
>>>>>>>>>>>>
>>>>>>>>>>>> On May 5, 2009, at 3:08 PM, Ralph Castain wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Ah - thx for catching that, I'll remove that check. It no longer
>>>>>>>>>>>>> is required.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thx!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, May 5, 2009 at 7:04 AM, Lenny Verkhovsky <
>>>>>>>>>>>>> lenny.verkhovsky_at_[hidden]> wrote:
>>>>>>>>>>>>> According to the code, it does care.
>>>>>>>>>>>>>
>>>>>>>>>>>>> $vi orte/mca/rmaps/rank_file/rmaps_rank_file.c +572
>>>>>>>>>>>>>
>>>>>>>>>>>>> ival = orte_rmaps_rank_file_value.ival;
>>>>>>>>>>>>> if (ival > (np - 1)) {
>>>>>>>>>>>>>     orte_show_help("help-rmaps_rank_file.txt", "bad-rankfile",
>>>>>>>>>>>>>                    true, ival, rankfile);
>>>>>>>>>>>>>     rc = ORTE_ERR_BAD_PARAM;
>>>>>>>>>>>>>     goto unlock;
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> If I remember correctly, I used an array to map ranks, and
>>>>>>>>>>>>> since the length of the array is NP, the maximum index must be
>>>>>>>>>>>>> less than NP; so if a rank number is > NP, there is no place to
>>>>>>>>>>>>> put it inside the array.
>>>>>>>>>>>>>
>>>>>>>>>>>>> "Likewise, if you have more procs than the rankfile specifies,
>>>>>>>>>>>>> we map the additional procs either byslot (default) or bynode (if you
>>>>>>>>>>>>> specify that option). So the rankfile doesn't need to contain an entry for
>>>>>>>>>>>>> every proc." - Correct point.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Lenny.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/5/09, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry Lenny, but that isn't correct. The rankfile mapper doesn't care if the
>>>>>>>>>>>>> rankfile contains additional info - it only maps up to the number of
>>>>>>>>>>>>> processes, and ignores anything beyond that number. So there is no need to
>>>>>>>>>>>>> remove the additional info.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Likewise, if you have more procs than the rankfile specifies,
>>>>>>>>>>>>> we map the additional procs either byslot (default) or bynode (if you
>>>>>>>>>>>>> specify that option). So the rankfile doesn't need to contain an entry for
>>>>>>>>>>>>> every proc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just don't want to confuse folks.
>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, May 5, 2009 at 5:59 AM, Lenny Verkhovsky <
>>>>>>>>>>>>> lenny.verkhovsky_at_[hidden]> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> the maximum rank number must be less than np.
>>>>>>>>>>>>> If np=1 then there is only rank 0 in the system, so rank 1 is
>>>>>>>>>>>>> invalid.
>>>>>>>>>>>>> Please remove "rank 1=node2 slot=*" from the rankfile.
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Lenny.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, May 4, 2009 at 11:14 AM, Geoffroy Pignot <
>>>>>>>>>>>>> geopignot_at_[hidden]> wrote:
>>>>>>>>>>>>> Hi ,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately
>>>>>>>>>>>>> my command doesn't work
>>>>>>>>>>>>>
>>>>>>>>>>>>> cat rankf:
>>>>>>>>>>>>> rank 0=node1 slot=*
>>>>>>>>>>>>> rank 1=node2 slot=*
>>>>>>>>>>>>>
>>>>>>>>>>>>> cat hostf:
>>>>>>>>>>>>> node1 slots=2
>>>>>>>>>>>>> node2 slots=2
>>>>>>>>>>>>>
>>>>>>>>>>>>> mpirun --rankfile rankf --hostfile hostf --host node1 -n 1
>>>>>>>>>>>>> hostname : --host node2 -n 1 hostname
>>>>>>>>>>>>>
>>>>>>>>>>>>> Error, invalid rank (1) in the rankfile (rankf)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>>>>>>>>>> file rmaps_rank_file.c at line 403
>>>>>>>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>>>>>>>>>> file base/rmaps_base_map_job.c at line 86
>>>>>>>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>>>>>>>>>> file base/plm_base_launch_support.c at line 86
>>>>>>>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>>>>>>>>>> file plm_rsh_module.c at line 1016
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ralph, could you tell me if my command syntax is correct or not?
>>>>>>>>>>>>> If not, could you give me the expected one?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> Geoffroy
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2009/4/30 Geoffroy Pignot <geopignot_at_[hidden]>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Immediately Sir !!! :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks again Ralph
>>>>>>>>>>>>>
>>>>>>>>>>>>> Geoffroy
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> Message: 2
>>>>>>>>>>>>> Date: Thu, 30 Apr 2009 06:45:39 -0600
>>>>>>>>>>>>> From: Ralph Castain <rhc_at_[hidden]>
>>>>>>>>>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>>>>>>>>> To: Open MPI Users <users_at_[hidden]>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe this is fixed now in our development trunk - you can
>>>>>>>>>>>>> download any
>>>>>>>>>>>>> tarball starting from last night and give it a try, if you
>>>>>>>>>>>>> like. Any
>>>>>>>>>>>>> feedback would be appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ah now, I didn't say it -worked-, did I? :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Clearly a bug exists in the program. I'll try to take a look at
>>>>>>>>>>>>> it (if Lenny
>>>>>>>>>>>>> doesn't get to it first), but it won't be until later in the
>>>>>>>>>>>>> week.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree with you Ralph, and that's what I expect from openmpi,
>>>>>>>>>>>>> but my second example shows that it's not working.
>>>>>>>>>>>>>
>>>>>>>>>>>>> cat hostfile.0
>>>>>>>>>>>>> r011n002 slots=4
>>>>>>>>>>>>> r011n003 slots=4
>>>>>>>>>>>>>
>>>>>>>>>>>>> cat rankfile.0
>>>>>>>>>>>>> rank 0=r011n002 slot=0
>>>>>>>>>>>>> rank 1=r011n003 slot=1
>>>>>>>>>>>>>
>>>>>>>>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n
>>>>>>>>>>>>> 1 hostname
>>>>>>>>>>>>> ### CRASHED
>>>>>>>>>>>>>
>>>>>>>>>>>>> > > Error, invalid rank (1) in the rankfile (rankfile.0)
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad
>>>>>>>>>>>>> parameter in file
>>>>>>>>>>>>> > > rmaps_rank_file.c at line 404
>>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad
>>>>>>>>>>>>> parameter in file
>>>>>>>>>>>>> > > base/rmaps_base_map_job.c at line 87
>>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad
>>>>>>>>>>>>> parameter in file
>>>>>>>>>>>>> > > base/plm_base_launch_support.c at line 77
>>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad
>>>>>>>>>>>>> parameter in file
>>>>>>>>>>>>> > > plm_rsh_module.c at line 985
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > > A daemon (pid unknown) died unexpectedly on signal 1 while
>>>>>>>>>>>>> > attempting to
>>>>>>>>>>>>> > > launch so we are aborting.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > There may be more information reported by the environment
>>>>>>>>>>>>> (see
>>>>>>>>>>>>> > above).
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > This may be because the daemon was unable to find all the
>>>>>>>>>>>>> needed
>>>>>>>>>>>>> > shared
>>>>>>>>>>>>> > > libraries on the remote node. You may set your
>>>>>>>>>>>>> LD_LIBRARY_PATH to
>>>>>>>>>>>>> > have the
>>>>>>>>>>>>> > > location of the shared libraries on the remote nodes and
>>>>>>>>>>>>> this will
>>>>>>>>>>>>> > > automatically be forwarded to the remote nodes.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > > orterun noticed that the job aborted, but has no info as to
>>>>>>>>>>>>> the
>>>>>>>>>>>>> > process
>>>>>>>>>>>>> > > that caused that situation.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > > orterun: clean termination accomplished
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Message: 4
>>>>>>>>>>>>> Date: Tue, 14 Apr 2009 06:55:58 -0600
>>>>>>>>>>>>> From: Ralph Castain <rhc_at_[hidden]>
>>>>>>>>>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>>>>>>>>> To: Open MPI Users <users_at_[hidden]>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The rankfile cuts across the entire job - it isn't applied on
>>>>>>>>>>>>> an
>>>>>>>>>>>>> app_context basis. So the ranks in your rankfile must
>>>>>>>>>>>>> correspond to
>>>>>>>>>>>>> the eventual rank of each process in the cmd line.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, that means you have to count ranks. In your
>>>>>>>>>>>>> case, you
>>>>>>>>>>>>> only have four, so that makes life easier. Your rankfile would
>>>>>>>>>>>>> look
>>>>>>>>>>>>> something like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> rank 0=r001n001 slot=0
>>>>>>>>>>>>> rank 1=r001n002 slot=1
>>>>>>>>>>>>> rank 2=r001n001 slot=1
>>>>>>>>>>>>> rank 3=r001n002 slot=2
>>>>>>>>>>>>>
>>>>>>>>>>>>> HTH
>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I agree that my examples are not very clear. What I want to
>>>>>>>>>>>>> > do is to launch a multi-executable application (masters-slaves)
>>>>>>>>>>>>> > and benefit from processor affinity.
>>>>>>>>>>>>> > Could you show me how to convert this command, using the -rf
>>>>>>>>>>>>> > option (whatever the affinity is):
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > mpirun -n 1 -host r001n001 master.x options1 : -n 1 -host
>>>>>>>>>>>>> r001n002
>>>>>>>>>>>>> > master.x options2 : -n 1 -host r001n001 slave.x options3 : -n
>>>>>>>>>>>>> 1 -
>>>>>>>>>>>>> > host r001n002 slave.x options4
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thanks for your help
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Geoffroy
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Message: 2
>>>>>>>>>>>>> > Date: Sun, 12 Apr 2009 18:26:35 +0300
>>>>>>>>>>>>> > From: Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
>>>>>>>>>>>>> > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>>>>>>>>> > To: Open MPI Users <users_at_[hidden]>
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > The first "crash" is OK, since your rankfile has ranks 0 and
>>>>>>>>>>>>> 1
>>>>>>>>>>>>> > defined,
>>>>>>>>>>>>> > while n=1, which means only rank 0 is present and can be
>>>>>>>>>>>>> allocated.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > NP must be greater than the largest rank in the rankfile.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > What exactly are you trying to do ?
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I tried to recreate your segv but all I got was
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > ~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun
>>>>>>>>>>>>> --hostfile
>>>>>>>>>>>>> > hostfile.0
>>>>>>>>>>>>> > -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
>>>>>>>>>>>>> > [witch19:30798] mca: base: component_find: paffinity
>>>>>>>>>>>>> > "mca_paffinity_linux"
>>>>>>>>>>>>> > uses an MCA interface that is not recognized (component MCA
>>>>>>>>>>>>> v1.0.0 !=
>>>>>>>>>>>>> > supported MCA v2.0.0) -- ignored
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > It looks like opal_init failed for some reason; your parallel
>>>>>>>>>>>>> > process is
>>>>>>>>>>>>> > likely to abort. There are many reasons that a parallel
>>>>>>>>>>>>> process can
>>>>>>>>>>>>> > fail during opal_init; some of which are due to configuration
>>>>>>>>>>>>> or
>>>>>>>>>>>>> > environment problems. This failure appears to be an internal
>>>>>>>>>>>>> failure;
>>>>>>>>>>>>> > here's some additional information (which may only be
>>>>>>>>>>>>> relevant to an
>>>>>>>>>>>>> > Open MPI developer):
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > opal_carto_base_select failed
>>>>>>>>>>>>> > --> Returned value -13 instead of OPAL_SUCCESS
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found
>>>>>>>>>>>>> in file
>>>>>>>>>>>>> > ../../orte/runtime/orte_init.c at line 78
>>>>>>>>>>>>> > [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found
>>>>>>>>>>>>> in file
>>>>>>>>>>>>> > ../../orte/orted/orted_main.c at line 344
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > A daemon (pid 11629) died unexpectedly with status 243 while
>>>>>>>>>>>>> > attempting
>>>>>>>>>>>>> > to launch so we are aborting.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > There may be more information reported by the environment
>>>>>>>>>>>>> (see above).
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > This may be because the daemon was unable to find all the
>>>>>>>>>>>>> needed
>>>>>>>>>>>>> > shared
>>>>>>>>>>>>> > libraries on the remote node. You may set your
>>>>>>>>>>>>> LD_LIBRARY_PATH to
>>>>>>>>>>>>> > have the
>>>>>>>>>>>>> > location of the shared libraries on the remote nodes and this
>>>>>>>>>>>>> will
>>>>>>>>>>>>> > automatically be forwarded to the remote nodes.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > mpirun noticed that the job aborted, but has no info as to
>>>>>>>>>>>>> the process
>>>>>>>>>>>>> > that caused that situation.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > mpirun: clean termination accomplished
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Lenny.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > On 4/10/09, Geoffroy Pignot <geopignot_at_[hidden]> wrote:
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > Hi ,
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > I am currently testing the process affinity capabilities of
>>>>>>>>>>>>> > > openmpi and I would like to know if the rankfile behaviour I
>>>>>>>>>>>>> > > will describe below is normal or not.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > cat hostfile.0
>>>>>>>>>>>>> > > r011n002 slots=4
>>>>>>>>>>>>> > > r011n003 slots=4
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > cat rankfile.0
>>>>>>>>>>>>> > > rank 0=r011n002 slot=0
>>>>>>>>>>>>> > > rank 1=r011n003 slot=1
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>> ##################################################################################
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname
>>>>>>>>>>>>> ### OK
>>>>>>>>>>>>> > > r011n002
>>>>>>>>>>>>> > > r011n003
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>> ##################################################################################
>>>>>>>>>>>>> > > but
>>>>>>>>>>>>> > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname
>>>>>>>>>>>>> > > ### CRASHED
>>>>>>>>>>>>> > > *
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > > Error, invalid rank (1) in the rankfile (rankfile.0)
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 404
>>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
>>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
>>>>>>>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > > A daemon (pid unknown) died unexpectedly on signal 1 while
>>>>>>>>>>>>> > > attempting to launch so we are aborting.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > There may be more information reported by the environment (see above).
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > This may be because the daemon was unable to find all the needed
>>>>>>>>>>>>> > > shared libraries on the remote node. You may set your LD_LIBRARY_PATH
>>>>>>>>>>>>> > > to have the location of the shared libraries on the remote nodes and
>>>>>>>>>>>>> > > this will automatically be forwarded to the remote nodes.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > > orterun noticed that the job aborted, but has no info as to the
>>>>>>>>>>>>> > > process that caused that situation.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> > > orterun: clean termination accomplished
>>>>>>>>>>>>> > > *
>>>>>>>>>>>>> > > It seems that the rankfile option is not propagated to the second
>>>>>>>>>>>>> > > command line; there is no global understanding of the ranking
>>>>>>>>>>>>> > > inside a mpirun command.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>> ##################################################################################
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > Assuming that, I tried to provide a rankfile to each command line:
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > cat rankfile.0
>>>>>>>>>>>>> > > rank 0=r011n002 slot=0
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > cat rankfile.1
>>>>>>>>>>>>> > > rank 0=r011n003 slot=1
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
>>>>>>>>>>>>> > > ### CRASHED
>>>>>>>>>>>>> > > [r011n002:28778] *** Process received signal ***
>>>>>>>>>>>>> > > [r011n002:28778] Signal: Segmentation fault (11)
>>>>>>>>>>>>> > > [r011n002:28778] Signal code: Address not mapped (1)
>>>>>>>>>>>>> > > [r011n002:28778] Failing at address: 0x34
>>>>>>>>>>>>> > > [r011n002:28778] [ 0] [0xffffe600]
>>>>>>>>>>>>> > > [r011n002:28778] [ 1] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x55d) [0x5557decd]
>>>>>>>>>>>>> > > [r011n002:28778] [ 2] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x117) [0x555842a7]
>>>>>>>>>>>>> > > [r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/mca_plm_rsh.so [0x556098c0]
>>>>>>>>>>>>> > > [r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804aa27]
>>>>>>>>>>>>> > > [r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804a022]
>>>>>>>>>>>>> > > [r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc) [0x9f1dec]
>>>>>>>>>>>>> > > [r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x8049f71]
>>>>>>>>>>>>> > > [r011n002:28778] *** End of error message ***
>>>>>>>>>>>>> > > Segmentation fault (core dumped)
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > I hope that I've found a bug, because it would be very important
>>>>>>>>>>>>> > > for me to have this kind of capability: launch a multi-exe mpirun
>>>>>>>>>>>>> > > command line and be able to bind my exes and sockets together.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > Thanks in advance for your help
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > Geoffroy
>>>>>>>>>>>>> > _______________________________________________
>>>>>>>>>>>>> > users mailing list
>>>>>>>>>>>>> > users_at_[hidden]
>>>>>>>>>>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users