Three years ago we evaluated the networking performance of Google’s IaaS service, Google Compute Engine (GCE), and Ryan posted the results in his blog post. Back then, the conclusion was that GCE instances were well suited to typical web-hosting workloads but still left room for performance tuning in HPC applications. Recently, we revisited GCE with its latest instance offerings.

Benchmark Tools
To keep the results comparable with the old ones, we are still using the OSU Micro-Benchmarks, now at version 5.3.2. Of all the benchmarks in the suite, we picked the two most relevant: osu_latency for the latency test and osu_bibw for the bidirectional bandwidth test.

Test Environment
Operating System: Debian GNU/Linux 8 (jessie)

MPI Flavor: MPICH3
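For reference, each benchmark can be driven with a plain MPICH launch of the following shape (the hostnames below are placeholders; running one rank per VM ensures the traffic crosses the physical network):

```shell
# One MPI rank on each VM so messages traverse the real network.
# "vm-a" and "vm-b" are placeholder hostnames for the two instances.
mpiexec -n 2 -hosts vm-a,vm-b ./osu_latency
mpiexec -n 2 -hosts vm-a,vm-b ./osu_bibw
```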

Test Instances
Since we are testing the interconnect performance between VM instances, we want to make sure the instances we launch actually sit on different physical hosts, so that the traffic traverses the underlying network rather than the host machine’s memory.

So we picked the biggest instance of each series:

n1-standard-32, n1-highmem-32, and n1-highcpu-32

Test Results
For latency (in microseconds):

Instance Type Trial #1 Trial #2 Trial #3 Average
n1-standard-32 45.68 47.03 48.46 47.06
n1-highmem-32 43.17 43.08 36.87 41.04
n1-highcpu-32 47.11 48.51 48.17 47.93

(message size: 0 bytes)

For bidirectional bandwidth (in MB/s):

Instance Type Trial #1 Trial #2 Trial #3 Average
n1-standard-32 808.28 864.91 872.36 848.52
n1-highmem-32 1096.35 1077.33 1055.2 1076.29
n1-highcpu-32 847.68 791.16 900.32 846.39

(message size: 1,048,576 bytes)

Summary of Results
For network latency, the average is around 40–45 microseconds, roughly 4x faster than the previous result of around 180 microseconds. The new latency is also fairly consistent across the smaller instance types.

For bandwidth, we don’t have a previous result to compare against, but among all the GCE instance types tested, n1-highmem-32 has the best performance, as high as roughly 1,070 MB/s. This result aligns with GCE’s official documentation on egress throughput caps: https://cloud.google.com/compute/docs/networks-and-firewalls#egress_throughput_caps.

This article was written by Irwen Song.



Google released TensorFlow (www.tensorflow.org), an open-source machine learning library, last November, and it attracted huge attention in the field of AI. TensorFlow is also known as “Machine Learning for Everyone” because it is relatively easy to get hands-on with, even for those without much machine learning experience. Today we are excited to announce that TensorFlow is now available on Rescale’s platform. This means you can create and train your machine learning models using TensorFlow with just a web browser. I’ll walk you through how in this blog post.

Let’s Start With a Simple Case

We’ll start with the first official TensorFlow tutorial, MNIST for ML Beginners. It introduces the MNIST dataset and shows how to model and train on it with softmax regression, a basic machine learning method, in TensorFlow. Here we’ll focus on how to set up the job and run it on the Rescale platform.

You can create the Python script mnist_for_beginners.py in a local editor:
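The script below is the tutorial’s softmax-regression snippets assembled into one file (this follows the official tutorial; the input_data import path may vary slightly between TensorFlow versions):

```python
# mnist_for_beginners.py -- softmax regression on MNIST, assembled
# from the snippets in the "MNIST for ML Beginners" tutorial.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Download and load the MNIST data (labels are one-hot vectors).
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Model: y = softmax(x * W + b)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross-entropy loss against the true labels.
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

sess = tf.Session()
sess.run(tf.initialize_all_variables())  # initializer name in TF 0.x

# Train with mini-batches of 100 examples.
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate: fraction of test images whose argmax prediction is correct.
correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))
```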

The script above simply puts all the tutorial’s snippets together. Now we need to run it on Rescale’s GPU hardware.

First, you need an account; if you don’t have one yet, click here to create one.

If you want to skip the hassle of setting up the job step-by-step, you can also click here to view the tutorial job and clone it into your own account.

After account registration, login to Rescale and click “+ New Job” button on the top left to create a new job.


Click “upload from this computer” and upload your python script to Rescale.


Click “Next” to go to the Software Settings page and choose TensorFlow from the software list. Currently 0.7.1 is the only version supported on Rescale, so choose it and type “python ./mnist_for_beginners.py” in the Command field. Select “Next” to go to the Hardware Settings page.


In Hardware Settings, choose the Jade core type and select 4 cores. This job is not very compute-intensive, so we choose the minimum valid number of cores. We can skip post-processing for this example; click “Submit” on the Review page to submit the job.



It will take 4–5 minutes to launch the server and about a minute to run the job. While the job is running, you can use Rescale’s live tailing feature to monitor the files in the working directory.

After the job finishes, you can view the files on the results page. Let’s take a look at process_output.log, the output of the Python script we uploaded. At the third line from the bottom, we can verify that the accuracy is 91.45%.


A More Advanced Model

In the second TensorFlow tutorial, a more advanced model is built with a multilayer convolutional network to increase the accuracy to 99.32%.

To run this advanced model on Rescale, you can simply repeat the process of the first one and replace the python script with the new model from the tutorial.  You can also view and clone an existing job from here.

Single GPU vs. Multiple GPU Performance Speedup Test

If you have more than one GPU on your machine, TensorFlow can utilize all of them for better performance. In this section, we benchmark a machine with a single K520 GPU against a machine with four K520 GPUs to measure the speedup.

The CIFAR10 convolutional neural network example is used as our benchmark job. From the results below, we can see that with four times the number of GPUs, the number of examples processed per second is only 2.37 times the single-GPU rate.


Work Ahead

TensorFlow v0.8, released on April 13, 2016, adds distributed training, which can spread a workload across GPUs on multiple machines. It would be very interesting to see its performance on a multi-node, multi-GPU cluster. In the meantime, we’ll make the process of launching a multi-node, multi-GPU cluster with TensorFlow support on Rescale as simple as possible.




This article was written by Irwen Song.


When we start prototyping our first web application with Django, we tend to create one Django app and put all the models into it. The reason is simple: there are not that many models, and the business logic is straightforward. But as the business grows, more models and business logic get added, and one day we may find our application in an awkward position: it is harder and harder to locate a bug, and it takes longer and longer to add new features, even simple ones. In this blog post we’ll talk about how to use separate Django apps to reorganize models and business logic so that the code base scales with the business, and we’ll illustrate the change with a simple case study.

Prototyping stage – a simple case study

We start with a simple application called “Weblog” that lets users create and publish blog posts. We create an app called weblog with the following models.

In weblog/models.py:

Now assume the rest of the application is built on the models above. Users can now log in, create, and publish their blogs using our application.

Evolving approach I – keep adding new business logic into the same app

Say we have a new requirement: to attract more authors to create content, we’ll pay authors based on view counts, at $10 per 1,000 views, with payouts sent once a month.

Since the new requirement sounds simple, the usual approach of putting the new models and logic into the existing app seems good enough. First, we add a new model to the weblog app:

In weblog/models.py:

For each new view of a blog, we increment this month’s count, creating a new MonthlyViewCount record if it is the first view of the month. The code looks like this:

At the end of each month, a cron task aggregates the view counts for each author and sends payments accordingly. Here’s the pseudocode:

In weblog/tasks.py:
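The aggregation at the heart of that task looks roughly like this as plain Python (the function and variable names are made up for the sketch; the real task would read last month’s MonthlyViewCount rows and call a payment service):

```python
def compute_monthly_payments(view_counts, price_per_1000=10.0):
    """Aggregate per-author views and convert them to dollars.

    view_counts: iterable of (author, views) pairs for the last month.
    Returns a {author: payment} mapping at $10 per 1,000 views.
    """
    totals = {}
    for author, views in view_counts:
        totals[author] = totals.get(author, 0) + views
    return {author: views / 1000.0 * price_per_1000
            for author, views in totals.items()}
```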

The approach above seems fine for handling a simple change request like this one, but new requirements will keep coming in.

Evolving approach II – organize the business logic into different Django apps

Now we have another requirement. To encourage authors to create content in certain categories, the business team wants a category-based award strategy, with a different award price per category. Say the price table looks like the following:

Category Price (per 1,000 views)
Tech $15
Sports $10
Fashion $5

This new requirement also looks simple enough: we can create a new model in the existing app to store the per-category price and update the cron task to look up the category-based price while aggregating the total. The whole change takes less than 30 minutes, and everything is good to go.

But cranking more and more new models and business logic into the main weblog app causes two major problems:

  1. The main app becomes responsible for business logic from multiple domains, and its files grow large and unmaintainable.
  2. Agility suffers: it becomes harder to debug issues and slower to add new features.

In Django, we can use separate apps to organize the business logic of different domains and use Signals to handle the communication between apps. In our example, we’ll move all the billing-related models and methods into a new app called billing.

First we move all the billing-related models into the new billing app.

In billing/models.py:

Now, for each new view of a blog article, the billing app needs to be informed so it can record the view. To do so, we define a signal in the weblog app and create a signal handler in the billing app to process it.

We move Blog.increase_view_count() into billing/receivers.py as a signal handler:

Then a new signal is created in weblog/signals.py:

And we also inject a signal-sending snippet into one of the view methods in weblog/views.py:

Finally, we move the billing-related cron task send_viewcount_payment_to_authors from weblog/tasks.py to billing/tasks.py and add new logic to handle the category-based pricing.
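The category-aware aggregation then looks roughly like this, again as plain Python with made-up names (the real task would read CategoryPrice and MonthlyViewCount from the database):

```python
# Assumed award prices per 1,000 views, keyed by category.
CATEGORY_PRICE = {"Tech": 15.0, "Sports": 10.0, "Fashion": 5.0}


def compute_category_payments(view_counts):
    """Aggregate payments with per-category pricing.

    view_counts: iterable of (author, category, views) triples.
    Returns a {author: payment} mapping.
    """
    totals = {}
    for author, category, views in view_counts:
        price = CATEGORY_PRICE.get(category, 0.0)
        totals[author] = totals.get(author, 0.0) + views / 1000.0 * price
    return totals
```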

Compared with the regular approach of simply putting everything new into the main app, this approach requires more code changes and refactoring, but it has several merits that make it worthwhile.

  1. The business logic from a specific domain is segregated from the other domains, which makes the code base easier to maintain.
  2. If an issue occurs during the runtime, the cause can be promptly located in the scope of an app based on the symptom. This shortens the debugging time.
  3. When a new developer onboards, they can start working on a single app first, which will moderate the learning curve.
  4. If we decide to deprecate an entire domain’s business logic (e.g. all the billing features are no longer needed), we can simply remove that app, and everything else will continue to run normally.


A lot of startups use Django to prototype their product or service, and Django can handle the growth of their business quite well. An important practice is to rethink and reorganize the business logic into separate apps from time to time, keeping the responsibility of each app as simple as possible.

This article was written by Irwen Song.


The Budget feature
When you run simulations on Rescale, you may want to control your budgets and spending. For example, an individual user might want to set a budget cap on their jobs, or a company administrator may want to control the total spending for the whole company or a specific project.

Set budgets for yourself
As an individual user, you can set your own budget on your Settings page (click on your email in the top right corner of your Rescale account and select Settings). The currency is based on your account currency setting.


Set budgets for your company
As a company administrator, you can also specify your company’s budget on the Settings page of your company administration portal (if you are a company admin, click on your email in the top right corner of your Rescale account and select {Company} Administration). Once the budget is set, the remaining budget will also be shown at the bottom right.


Set budgets for a company project
You can also set the budget for a specific project on the Project settings page if you’re a company administrator.


To make the project selection appear on the job setup page for a user, you need to:
1. Add the user to a group on the Group settings page


2. Attach the group to the project on the Project settings page


After that, the user will be able to select the project on the job setup page. (Note: company administrators can select any company project by default, without any setup.)


How budgets work
Once a budget is set, all cost types (i.e. hardware, software, data transfer, storage, and license proxy) at that budget’s level (i.e. user, company, or project) are monitored to ensure the budget is not exceeded. There is currently no timeframe, meaning all costs from day one until now count toward the budget. When you are running low on budget, new jobs will be queued, and running jobs that exceed the budget will be terminated.


When a job is queued because of a low budget, you can adjust your own budget or contact your company administrator if the project or company level budget needs to be adjusted. The queued jobs will start once the budget is sufficient.


It should be noted that running jobs will be terminated if they exceed the budget.


Budgets were introduced to help control your simulation costs. A budget can be set at the user, company, and/or project level. Running low on budget may result in new jobs being queued or running jobs being terminated, so choose your budgets wisely.

For questions about setting budgets or for more information, please contact Rescale at info@rescale.com.

This article was written by Irwen Song.