From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-08-29 18:57:13


On 8/29/06 1:55 PM, "Josh Hursey" <jjhursey_at_[hidden]> wrote:

> So I'm having trouble getting tests to complete without timing out in
> MTT. It seems that the tests time out and hang in MTT, but complete
> normally outside of MTT.

Does this apply to *all* tests, or only some of the tests (like allgather)?
 
> Here are some details:
> Build:
> Open MPI Trunk (1.3a1r11481)
>
> Tests:
> Trivial
> ibm
>
> BTL:
> tcp
> self
>
> Nodes/processes:
> 16 nodes (32 processors) on the Odin Cluster at IU
>
>
> In MTT all of the tests time out:
> <mtt snip>
> Running command: mpirun -mca btl tcp,self -np 32 --prefix
> /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install
> collective/allgather
> Timeout: 1 - 1156872348 (vs. now: 1156872028)
> Past timeout! 1156872348 < 1156872349
> Past timeout! 1156872348 < 1156872349
[snipped]
> &or: returning 0
> String now: 0
> *** WARNING: Test: allgather, np=32, variant=1: TIMED OUT (failed)
> </mtt snip>
>
> Outside of MTT using the same build the test runs and completes
> normally:
> $ cd ~/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/tests/ibm/ibm/
> $ mpirun -mca btl tcp,self -np 32 --prefix \
>     /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install \
>     collective/allgather

Where is mpirun in your path?

MTT actually drops sourceable files in the top-level install dir (i.e., the
1.3a1r11481 directory) that you can source in your shell to set
PATH and LD_LIBRARY_PATH for that install. Can you source the one that
matches your shell and try the run again?
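
Something along these lines would do it. This is only a sketch: the
install path is pieced together from the paths in your message, and the
file name mpi_installed_vars.sh is an assumption on my part, so list the
directory and substitute whatever MTT actually wrote there.

```shell
# Top-level install dir, pieced together from the paths quoted above.
install_dir="$HOME/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481"

# ASSUMPTION: hypothetical file name; check the directory for the real one.
env_file="$install_dir/mpi_installed_vars.sh"

if [ -f "$env_file" ]; then
    # Pull PATH / LD_LIBRARY_PATH for this install into the current shell.
    . "$env_file"
else
    echo "no env file at $env_file; list $install_dir to find it" >&2
fi

# Show which mpirun the shell resolves, so we know which install is used.
mpirun_path=$(command -v mpirun || echo "not found")
echo "mpirun resolves to: $mpirun_path"
```

If the reported mpirun is not under that install dir, the manual run and
the MTT run are probably not using the same build.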

How long does it take to run manually -- just a few seconds, or a long time
(long enough that it could potentially time out)?
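
For comparison, if I'm reading the log above correctly, the deadline
1156872348 is 320 seconds after the "now" stamp 1156872028, so that would
be the budget MTT gave the test. A rough way to time the manual run
against it (time_cmd is just a helper name made up for this sketch):

```shell
# Timestamps copied from the "Timeout: ... (vs. now: ...)" log line.
deadline=1156872348
start=1156872028
budget=$((deadline - start))
echo "MTT timeout budget: ${budget} s"

# Run a command and print its wall-clock time in whole seconds.
# The command's own stdout goes to stderr so only the number is captured.
time_cmd() (
    t0=$(date +%s)
    "$@" >&2
    t1=$(date +%s)
    echo $((t1 - t0))
)

# Example (run from the tests/ibm/ibm directory, after sourcing the env file):
#   elapsed=$(time_cmd mpirun -mca btl tcp,self -np 32 collective/allgather)
#   echo "manual run: ${elapsed} s of ${budget} s"
```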

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems