On Wed, 6 Feb 2008, Christian Bell wrote:
> Hi Daniel --
>
> PSM should determine your node setup and enable shared contexts
> accordingly, but it looks like something isn't working right. You
> can apply the patch I've attached to this e-mail and things should
> work again.
Alas, it doesn't compile (the patch was applied to Open MPI 1.2.5):
mtl_psm.c(109): error: struct "orte_proc_info_t" has no field "num_local_procs"
  if (orte_process_info.num_local_procs > 0) {
  ^
mtl_psm.c(111): error: struct "orte_proc_info_t" has no field "num_local_procs"
  snprintf(buf, sizeof buf - 1, "%d", orte_process_info.num_local_procs);
  ^
mtl_psm.c(113): error: struct "orte_proc_info_t" has no field "local_rank"
  snprintf(buf, sizeof buf - 1, "%d", orte_process_info.local_rank);
  ^
compilation aborted for mtl_psm.c (code 2)
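For anyone following along: judging only from the lines the compiler
quotes, the patch formats the local process count and the local rank
into a string buffer, presumably to hand them to PSM. Below is a minimal
standalone sketch of that pattern, not the patch itself. The helper
export_int() and the variable name EXAMPLE_LOCAL_RANK are hypothetical,
and exporting the value through the environment is an assumption on my
part; the quoted lines only show the snprintf calls. Note that snprintf
already reserves room for the terminating NUL, so "sizeof buf - 1" is
harmless but not needed.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the patch): format an integer as a
 * decimal string and export it under the given environment variable
 * name. Returns 0 on success, -1 on formatting or setenv failure. */
static int export_int(const char *name, int value)
{
    char buf[16];

    /* snprintf writes at most sizeof buf bytes including the NUL and
     * returns the length it would have written without truncation. */
    int n = snprintf(buf, sizeof buf, "%d", value);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;

    /* Third argument 1: overwrite any existing value. */
    return setenv(name, buf, 1);
}
```

Calling export_int("EXAMPLE_LOCAL_RANK", 3) would leave the string "3"
in that (made-up) variable. In the real patch the values come from
orte_process_info, which in 1.2.5 has neither field, hence the errors
above.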
> However, it would be useful to identify what's going wrong. Can
> you compile a hello world program and run it with the machinefile
> you're trying to use? Send me the output from:
>
> mpirun -machinefile .... env PSM_TRACEMASK=0x101 ./hello_world
>
> I understand your failure mode only if somehow the 8-core node is
> detected to be a 4-core node. The output should tell us this.
The output is attached. PSM does seem to try to enable context sharing,
but for some reason /dev/ipath still returns a busy code.
Daniël