Having gone around in circles on hostfile-related issues for over five years now, I honestly have little motivation to re-open the entire discussion again. It doesn't seem to be that daunting a requirement for those who are using it, so I'm inclined to just leave well enough alone.
Ralph Castain wrote:
Agreed.
The two files have a slightly different format and completely different meaning.
Somewhat agreed. They're both related to mapping processes onto a cluster.
Agreed.
The hostfile specifies how many slots are on a node. The rankfile specifies a rank and what node/slot it is to be mapped onto.
This is the main part I'm concerned about. E.g.,
Rankfiles can use relative node indexing and refer to nodes received from a resource manager - i.e., without any hostfile.
% cat rankfile
rank 0=node0 slot=0
rank 1=node1 slot=0
% mpirun -np 2 -rf rankfile ./a.out
--------------------------------------------------------------------------
Rankfile claimed host node1 that was not allocated or oversubscribed it's slots:
--------------------------------------------------------------------------
[node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 107
[node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 86
[node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 86
[node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 1016
% mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
0 on node0
1 on node1
done
It seems to me that the rankfile has sufficient information to express what I want it to do. But mpirun won't accept this. To fix this, I have to, e.g., supply/maintain/specify redundant information in a hostfile or host list.
Right. And my issue is that I'm forced to use both when I only want rankfile functionality.
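Until mpirun accepts a rankfile on its own, the redundant host list can at least be generated from the rankfile rather than maintained by hand. The following is a minimal sketch, assuming every rankfile line has the form shown in the example above (`rank N=host slot=S`) and names hosts explicitly; it does not handle relative node indexing. The helper names `hosts_from_rankfile` and `mpirun_command` are hypothetical and not part of Open MPI.

```python
import re

# Matches lines such as "rank 0=node0 slot=0".
# The slot value is treated as an opaque string and is not interpreted.
RANK_LINE = re.compile(r"^\s*rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)\s*$")


def hosts_from_rankfile(text):
    """Return the distinct hosts named in a rankfile, in first-seen order."""
    hosts = []
    for line in text.splitlines():
        match = RANK_LINE.match(line)
        if match and match.group(2) not in hosts:
            hosts.append(match.group(2))
    return hosts


def mpirun_command(rankfile_path, rankfile_text, program):
    """Build the argument list for the mpirun call that succeeded above."""
    # One process per rank line; lines that do not match are ignored.
    rank_lines = [l for l in rankfile_text.splitlines() if RANK_LINE.match(l)]
    hosts = hosts_from_rankfile(rankfile_text)
    return [
        "mpirun",
        "-np", str(len(rank_lines)),
        "-host", ",".join(hosts),
        "-rf", rankfile_path,
        program,
    ]


if __name__ == "__main__":
    example = "rank 0=node0 slot=0\nrank 1=node1 slot=0\n"
    print(" ".join(mpirun_command("rankfile", example, "./a.out")))
```

For the two-line rankfile in the example, this reproduces the command line that worked, with `-host node0,node1` filled in from the rankfile itself. It only hides the redundancy from the user; mpirun is still given the same information twice.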
So the files are intentionally quite different. Trying to combine them would be rather ugly.
On Thu, Jun 18, 2009 at 1:52 PM, Eugene Loh <Eugene.Loh@sun.com> wrote:
In order to use "mpirun --rankfile", I also need to specify hosts/hostlist. But that information is redundant with what I provide in the rankfile. So, from a user's point of view, this strikes me as broken. Yes? Should I file a ticket, or am I missing something here about this functionality?
_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel