Open MPI User's Mailing List Archives

Subject: [OMPI users] MPI_WAITALL error when running between two virtual machines
From: Hoot Thompson (hoot_at_[hidden])
Date: 2011-08-15 22:57:53


I'm trying to run Open MPI between two 11.04 virtual machines, each on
its own physical node. Each VM has three network interfaces: the base
interface (eth0), which is in the same subnet as the hypervisor bridge,
and two Intel SR-IOV VFs. I can ping across all the interfaces.
The bottom line is that when I run the OSU benchmarks between the two
VMs, I get the error below. Running both processes on the same VM
works fine, as the later runs show.

hoot_at_u1-1104:~$ mpirun -host 10.10.10.1,10.10.10.2 osu_bw
[u2-1104:1946] *** An error occurred in MPI_Waitall
[u2-1104:1946] *** on communicator MPI_COMM_WORLD
[u2-1104:1946] *** MPI_ERR_TRUNCATE: message truncated
[u2-1104:1946] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
# OSU MPI Bandwidth Test v3.3
# Size Bandwidth (MB/s)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 1946 on
node 10.10.10.2 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
hoot_at_u1-1104:~$ mpirun -host 10.10.10.1,10.10.10.1 osu_bw
# OSU MPI Bandwidth Test v3.3
# Size Bandwidth (MB/s)
1 2.87
2 5.88
4 11.21
8 22.53
16 46.84
32 91.84
64 176.93
128 278.83
256 537.92
512 888.02
1024 1602.69
2048 2757.05
4096 2510.99
8192 3504.59
16384 4487.80
32768 4097.11
65536 4100.36
131072 4058.36
262144 4090.21
524288 7335.43
1048576 7523.41
2097152 7165.27
4194304 7548.46
hoot_at_u1-1104:~$ mpirun -host 10.10.10.2,10.10.10.2 osu_bw
# OSU MPI Bandwidth Test v3.3
# Size Bandwidth (MB/s)
1 4.54
2 9.20
4 18.70
8 37.40
16 74.68
32 144.03
64 262.93
128 523.46
256 977.52
512 1732.71
1024 2981.65
2048 4853.07
4096 5493.16
8192 7357.55
16384 9300.16
32768 4879.94
65536 4596.26
131072 4471.06
262144 4559.58
524288 4501.23
1048576 4541.63
2097152 4504.08
4194304 4493.76
hoot_at_u1-1104:~$ mpirun -host 10.10.10.2,10.10.10.2 osu_bw
# OSU MPI Bandwidth Test v3.3
# Size Bandwidth (MB/s)
1 4.50
2 9.14
4 18.51
8 36.47
16 74.05
32 142.71
64 256.99
128 516.84
256 972.40
512 1709.23
1024 2937.36
2048 4903.72
4096 5550.57
8192 7297.00
16384 8908.34
32768 8640.99
65536 8424.97
131072 8059.00
262144 4541.50
524288 4560.11
1048576 4554.80
2097152 4527.91
4194304 4493.71
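
A possible next step, not tried in the runs above: since each VM has
three interfaces, Open MPI's TCP transport can be restricted to one of
them to see whether the cross-VM failure depends on the interface. The
sketch below only builds and prints such a command line. The interface
name is an assumption (eth0 is the only interface named in this post;
the SR-IOV VF names and the interface carrying 10.10.10.x are not
given), and nothing here is confirmed to fix the error.

```shell
#!/bin/sh
# Sketch only: restrict Open MPI's TCP transport to a single interface.
# IFACE is an assumption; substitute the interface you want to test.
IFACE=eth0
HOSTS=10.10.10.1,10.10.10.2

# "--mca btl tcp,self" selects only the TCP and self (loopback) transports.
# "--mca btl_tcp_if_include" limits the TCP transport to the listed interface.
CMD="mpirun --mca btl tcp,self --mca btl_tcp_if_include $IFACE -host $HOSTS osu_bw"

# Print the command for inspection; to actually run it: eval "$CMD"
echo "$CMD"
```

Repeating this once per interface, changing only IFACE, would show
whether the truncation error follows a particular interface.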