MTT Users Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

The new web archives of this list include all the mails that are in this frozen archive plus all new mails that have been sent to the list since it was migrated.

From: Josh Hursey (jjhursey_at_[hidden])
Date: 2006-08-29 13:55:58


Hey all,

I'm having trouble getting tests to complete without timing out in
MTT. The tests hang and time out when run under MTT, but complete
normally when run outside of MTT.

Here are some details:
Build:
   Open MPI Trunk (1.3a1r11481)

Tests:
   Trivial
   ibm

BTL:
   tcp
   self

Nodes/processes:
   16 nodes (32 processors) on the Odin Cluster at IU

In MTT, all of the tests time out:
<mtt snip>
Running command: mpirun -mca btl tcp,self -np 32 --prefix /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install collective/allgather
Timeout: 1 - 1156872348 (vs. now: 1156872028)
Past timeout! 1156872348 < 1156872349
Past timeout! 1156872348 < 1156872349
Command complete, exit status: 72057594037927935
Evaluating: &or(&eq(&test_exit_status(), 0), &eq(&test_exit_status(), 77))
Got name: test_exit_status
Got args:
_do: $ret = MTT::Values::Functions::test_exit_status()
&test_exit_status returning: 72057594037927935
String now: &or(&eq(72057594037927935, 0), &eq(&test_exit_status(), 77))
Got name: eq
Got args: 72057594037927935, 0
_do: $ret = MTT::Values::Functions::eq(72057594037927935, 0)
&eq got: 72057594037927935 0
&eq: returning 0
String now: &or(0, &eq(&test_exit_status(), 77))
Got name: test_exit_status
Got args:
_do: $ret = MTT::Values::Functions::test_exit_status()
&test_exit_status returning: 72057594037927935
String now: &or(0, &eq(72057594037927935, 77))
Got name: eq
Got args: 72057594037927935, 77
_do: $ret = MTT::Values::Functions::eq(72057594037927935, 77)
&eq got: 72057594037927935 77
&eq: returning 0
String now: &or(0, 0)
Got name: or
Got args: 0, 0
_do: $ret = MTT::Values::Functions::or(0, 0)
&or got: 0 0
&or: returning 0
String now: 0
*** WARNING: Test: allgather, np=32, variant=1: TIMED OUT (failed)
</mtt snip>
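
For anyone reading the log above, here is a minimal Python sketch of the arithmetic it contains. The numbers are copied from the log; the function name `passed` is made up for illustration and is not part of MTT. The last check only observes that the reported exit status equals a 64-bit all-ones word shifted right by 8 bits, which is what a status of -1 would look like if it were decoded like a wait status. Whether that is what actually happened inside MTT is an assumption, not something the log confirms.

```python
# Values copied from the MTT log above.
DEADLINE = 1156872348            # "Timeout: 1 - 1156872348"
STARTED = 1156872028             # "(vs. now: 1156872028)"
EXIT_STATUS = 72057594037927935  # "Command complete, exit status: ..."


def passed(exit_status: int) -> bool:
    """Hypothetical mirror of &or(&eq(status, 0), &eq(status, 77))."""
    return exit_status == 0 or exit_status == 77


# Seconds the test was allowed to run before being declared timed out.
timeout_seconds = DEADLINE - STARTED

# Assumption being illustrated: -1 viewed as an unsigned 64-bit word,
# then shifted right by 8 bits, gives 2**56 - 1.
minus_one_shifted = ((-1) & (2**64 - 1)) >> 8
looks_like_minus_one = (EXIT_STATUS == minus_one_shifted)

print(timeout_seconds, passed(EXIT_STATUS), looks_like_minus_one)
```

Under these numbers the test had 320 seconds to finish, and the pass condition is false simply because the reported status is neither 0 nor 77.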

Outside of MTT, using the same build, the test runs and completes
normally:
  $ cd ~/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/tests/ibm/ibm/
  $ mpirun -mca btl tcp,self -np 32 --prefix /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install collective/allgather
  $
  $
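
One way to narrow down the difference between the two environments is to run the same command from a small script rather than from an interactive shell. The following is a rough Python sketch, not MTT's own mechanism: `run_with_timeout` is a hypothetical helper, and detaching stdin is only an assumption about how a non-interactive harness might differ from a terminal session. A harmless stand-in command is used so the sketch runs without an MPI install.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit_status, timed_out).

    exit_status is None when the timeout expired.
    """
    try:
        done = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # assumption: no terminal attached
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,          # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None, True
    return done.returncode, False


# Stand-in command; substitute the mpirun command line quoted above.
STAND_IN_CMD = [sys.executable, "-c", "print('ok')"]
status, timed_out = run_with_timeout(STAND_IN_CMD, timeout_s=320)
print(status, timed_out)
```

If the real command completes from the shell but hangs when launched this way, that would point at the launch environment rather than at the test itself.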

Any thoughts on why this might be happening in MTT but not outside of
it?

Cheers,
Josh