Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_WAITALL error when running between two virtual machines
From: Hoot Thompson (hoot_at_[hidden])
Date: 2011-08-16 09:40:31


I will try NetPIPE or similar.
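For reference, a minimal NetPIPE TCP run between the two VMs might look like the following (assuming the TCP test binary is built as NPtcp on both hosts; the exact binary name and options depend on the NetPIPE package):

  # on the second VM (10.10.10.2): start the NPtcp receiver
  hoot_at_u2-1104:~$ NPtcp

  # on the first VM (10.10.10.1): connect to the receiver and run the sweep
  hoot_at_u1-1104:~$ NPtcp -h 10.10.10.2

If this TCP-only test also misbehaves between the two VMs, the problem would sit below MPI (bridge, VF, or MTU configuration) rather than in Open MPI itself.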

On 8/16/11 9:01 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:

> Are you able to run other TCP-based applications between the two VMs, such as
> the TCP version of NetPIPE?
>
>
> On Aug 15, 2011, at 10:57 PM, Hoot Thompson wrote:
>
>> I'm trying to run Open MPI between two Ubuntu 11.04 virtual machines, each on
>> its own physical node. Each VM has three network interfaces: one is the base
>> (eth0), which is in the same subnet as the hypervisor bridge, and the other two
>> are Intel SR-IOV VFs. I can ping across all the interfaces. Bottom line: when I
>> run the OSU benchmarks between the two VMs, I get the error below. As also
>> shown below, running both processes on the same VM works fine.
>>
>>
>> hoot_at_u1-1104:~$ mpirun -host 10.10.10.1,10.10.10.2 osu_bw
>> [u2-1104:1946] *** An error occurred in MPI_Waitall
>> [u2-1104:1946] *** on communicator MPI_COMM_WORLD
>> [u2-1104:1946] *** MPI_ERR_TRUNCATE: message truncated
>> [u2-1104:1946] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> # OSU MPI Bandwidth Test v3.3
>> # Size Bandwidth (MB/s)
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 1 with PID 1946 on
>> node 10.10.10.2 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> hoot_at_u1-1104:~$ mpirun -host 10.10.10.1,10.10.10.1 osu_bw
>> # OSU MPI Bandwidth Test v3.3
>> # Size Bandwidth (MB/s)
>> 1 2.87
>> 2 5.88
>> 4 11.21
>> 8 22.53
>> 16 46.84
>> 32 91.84
>> 64 176.93
>> 128 278.83
>> 256 537.92
>> 512 888.02
>> 1024 1602.69
>> 2048 2757.05
>> 4096 2510.99
>> 8192 3504.59
>> 16384 4487.80
>> 32768 4097.11
>> 65536 4100.36
>> 131072 4058.36
>> 262144 4090.21
>> 524288 7335.43
>> 1048576 7523.41
>> 2097152 7165.27
>> 4194304 7548.46
>> hoot_at_u1-1104:~$ mpirun -host 10.10.10.2,10.10.10.2 osu_bw
>> # OSU MPI Bandwidth Test v3.3
>> # Size Bandwidth (MB/s)
>> 1 4.54
>> 2 9.20
>> 4 18.70
>> 8 37.40
>> 16 74.68
>> 32 144.03
>> 64 262.93
>> 128 523.46
>> 256 977.52
>> 512 1732.71
>> 1024 2981.65
>> 2048 4853.07
>> 4096 5493.16
>> 8192 7357.55
>> 16384 9300.16
>> 32768 4879.94
>> 65536 4596.26
>> 131072 4471.06
>> 262144 4559.58
>> 524288 4501.23
>> 1048576 4541.63
>> 2097152 4504.08
>> 4194304 4493.76
>> hoot_at_u1-1104:~$ mpirun -host 10.10.10.2,10.10.10.2 osu_bw
>> # OSU MPI Bandwidth Test v3.3
>> # Size Bandwidth (MB/s)
>> 1 4.50
>> 2 9.14
>> 4 18.51
>> 8 36.47
>> 16 74.05
>> 32 142.71
>> 64 256.99
>> 128 516.84
>> 256 972.40
>> 512 1709.23
>> 1024 2937.36
>> 2048 4903.72
>> 4096 5550.57
>> 8192 7297.00
>> 16384 8908.34
>> 32768 8640.99
>> 65536 8424.97
>> 131072 8059.00
>> 262144 4541.50
>> 524288 4560.11
>> 1048576 4554.80
>> 2097152 4527.91
>> 4194304 4493.71
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
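
Since each VM has three interfaces, one thing worth trying (a sketch, not a confirmed fix) is to pin Open MPI's TCP transport to a single NIC, so traffic cannot be striped or mismatched across the base interface and the SR-IOV VFs. The interface name below is an assumption; use whichever device actually carries the 10.10.10.x subnet on both VMs:

  # force the TCP BTL and restrict it to one interface (eth0 is assumed here)
  hoot_at_u1-1104:~$ mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 \
      -host 10.10.10.1,10.10.10.2 osu_bw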