MPI Benchmarking on Google Compute Platform Revisit


Three years ago we visited the Google’s IaaS service – Google Compute Engine (GCE) for its networking performance and Ryan posted the results in his blog post. Back then, the conclusion was that GCE instances were more suitable for a typical workload of hosting web services but there was still performance tuning space for HPC applications. Recently, we revisited the GCE’s instances with their latest offering again.

Benchmark Tools
To make the results somewhat comparable with the old ones, we’re still using the OSU Micro Benchmarks but with the latest version 5.3.2. And among all the benchmarking tools being offered, we pick two most critical ones: osu_latency for latency test and osu_bibw for bidirectional bandwidth test.

Test Environment
Operating System: Debian GNU/Linux 8 (jessie)

MPI Flavor: MPICH3

Test Instances
Since we are testing the interconnection performance between VM instances, we want to make sure the VM instances we launched are actually sitting on different physical hosts so the traffic actually goes through the underlying network but not the host machine’s memory.

So we picked the biggest instance of each series:

n1-standard-32, n1-highmem-32 and n-highcpu-32

Test Results
For latency (in microseconds):

Instance Type Trial #1 Trial #2 Trial #3 Average
n1-standard-32 45.68 47.03 48.46 47.06
n1-highmem-32 43.17 43.08 36.87 41.04
n1-highcpu-32 47.11 48.51 48.17 47.93

(size: 0-bytes)

For bidirectional bandwidth: (MB/s)

Instance Type Trial #1 Trial #2 Trial #3 Average
n1-standard-32 808.28 864.91 872.36 848.52
n1-highmem-32 1096.35 1077.33 1055.2 1076.29
n1-highcpu-32 847.68 791.16 900.32 846.39

(size: 1,048,576-bytes)

Summary of Results
For the network latency, we can see the average is around 40 ~ 45 microseconds, which is 4x faster than the previous result – around 180 microseconds. And the new latency is fairly consistent among other smaller instance types.

For bandwidth, we don’t have a previous result to compare to but among all the GCE instance types, we found n1-highmem-32 has the best performance which can be as high as 1070 MB/s. This result aligns with GCE’s official document

This article was written by Irwen Song.