From: Josh Hursey (jjhursey_at_[hidden])
Date: 2006-08-29 20:57:29


On Aug 29, 2006, at 6:57 PM, Jeff Squyres wrote:

> On 8/29/06 1:55 PM, "Josh Hursey" <jjhursey_at_[hidden]> wrote:
>
>> So I'm having trouble getting tests to complete without timing out in
>> MTT. It seems that the tests time out and hang in MTT, but complete
>> normally outside of MTT.
>
> Does this apply to *all* tests, or only some of the tests (like
> allgather)?

All of the tests: Trivial and ibm. They all time out :(

>
>> Here are some details:
>> Build:
>> Open MPI Trunk (1.3a1r11481)
>>
>> Tests:
>> Trivial
>> ibm
>>
>> BTL:
>> tcp
>> self
>>
>> Nodes/processes:
>> 16 nodes (32 processors) on the Odin Cluster at IU
>>
>>
>> In MTT all of the tests time out:
>> <mtt snip>
>> Running command: mpirun -mca btl tcp,self -np 32 --prefix
>> /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install
>> collective/allgather
>> Timeout: 1 - 1156872348 (vs. now: 1156872028)
>> Past timeout! 1156872348 < 1156872349
>> Past timeout! 1156872348 < 1156872349
> [snipped]
>> &or: returning 0
>> String now: 0
>> *** WARNING: Test: allgather, np=32, variant=1: TIMED OUT (failed)
>> </mtt snip>
>>
>> Outside of MTT using the same build the test runs and completes
>> normally:
>> $ cd ~/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/tests/ibm/ibm/
>> $ mpirun -mca btl tcp,self -np 32 --prefix /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install collective/allgather
>
> Where is mpirun in your path?
>
> MTT actually drops sourceable files in the top-level install dir (i.e.,
> the 1.3a1r11481 directory) that you can source in your shell to set
> PATH/LD_LIBRARY_PATH for that install. Can you source one and try to
> run again?

Yep, I exported PATH/LD_LIBRARY_PATH to match the install cited in the
--prefix argument before running manually.
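
For reference, a minimal sketch of that manual setup. The prefix is the one from the mpirun command line in this thread; the bin/ and lib/ subdirectories are an assumption about the install layout, not something confirmed here. The last step only reports which mpirun the shell resolves, so it is safe to run anywhere.

```shell
# Install prefix copied from the --prefix argument in this thread.
PREFIX=/san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install

# Assumption: binaries live in $PREFIX/bin and libraries in $PREFIX/lib.
export PATH="$PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Report which mpirun the shell would actually run.
MPIRUN_PATH=$(command -v mpirun || true)
case "$MPIRUN_PATH" in
  "$PREFIX/bin/mpirun") echo "mpirun matches the MTT install" ;;
  "")                   echo "mpirun not found in PATH" ;;
  *)                    echo "mpirun resolves elsewhere: $MPIRUN_PATH" ;;
esac
```

If the last line reports a different mpirun, the manual run and the MTT run may not be exercising the same build.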

>
> How long does it take to run manually -- just a few seconds, or a long
> time (that could potentially time out)?

Just a few seconds (say 5 or so).
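
For comparison, the timeout window can be read off the MTT log quoted above. This is a rough sketch that assumes the first number on the "Timeout:" line is the deadline and "now" is the launch time, both in epoch seconds; the variable names are only illustrative.

```shell
# Epoch values copied from the MTT log line
# "Timeout: 1 - 1156872348 (vs. now: 1156872028)".
DEADLINE=1156872348
STARTED=1156872028   # assumption: "now" is when the command was launched

# Difference between deadline and launch time, in seconds.
WINDOW=$((DEADLINE - STARTED))
echo "timeout window: ${WINDOW} seconds"
```

Under that reading the window is 320 seconds, far longer than the roughly 5 seconds the manual run takes.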

>
> --
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems

----
Josh Hursey
jjhursey_at_[hidden]
http://www.open-mpi.org/