Background and Challenge
Spike Aerospace is a fast-growing engineering firm developing the world’s first supersonic business jet with Quiet Supersonic Flight technology.  We run hundreds of complex CFD simulations to understand the aerodynamic performance of our aircraft. CFD simulations are computationally intensive and traditionally require significant investment in on-premise infrastructure, tens of thousands of hours of runtime, expensive software license fees, and a team of hardware experts to optimize the HPC.  The fixed capital costs for such compute- and time-intensive simulations would have been prohibitive under the traditional paradigm.  To quickly get up to speed without investing in significant infrastructure, we partnered with Rescale and migrated 100% of our CFD process to the cloud.  We have realized massive cost and time savings as a result.  

The Rescale Solution
Spike Aerospace is a lean, agile, and innovative organization.  Rescale’s platform was closely aligned with our needs and gave us a cost-effective, turn-key, and secure way to meet our demanding HPC needs.

On the Rescale platform, we used the natively integrated STAR-CCM+ software and Rescale’s Nickel hardware configuration on 64 cores to analyze the aerodynamics of Spike’s CD1 aircraft at various angles of attack under cruising-altitude conditions.  Our model had 32 million cells with a domain size of 1,000 million.  The entire simulation process was conducted on Rescale’s cloud, including CAD preparation and CFD domain creation, surface and volume grid preparation, setup and execution of HPC simulations, and post-processing.


Results and Benefits
Rescale was the perfect solution for our HPC needs.  Specifically, it enabled us to:

  • Reduce capital expenditure on fixed infrastructure costs and expensive software licenses.  With Rescale’s pay-as-you-go scheme for server hardware and software licenses, we paid only for what we used.
  • Accelerate our product time to market with instant, scalable access to HPC resources.  Rescale’s turn-key cloud solution enabled us to get up and running in weeks rather than months.  Additionally, jobs never waited in queues or schedulers for HPC resources, and job runtimes dropped dramatically on Rescale’s scalable hardware.
  • Collaborate in real-time with team members around the world.  Rescale’s cloud-based platform allowed our team to view and share all simulation files and results in real-time.

In addition, we trusted that our data was secure with Rescale, which complies with the strictest industry standards for security.  Measures such as end-to-end data encryption and tight administrative controls ensured the highest level of security for our jobs.  Rescale also provided excellent product support at every step of the process.  Their engineers worked closely with Spike’s engineers to set up our CFD simulations on the cloud, and the support team was very prompt in answering our queries and resolving all our issues.

Spike Aerospace
Spike Aerospace is leading a global collaboration to develop the world’s first supersonic business jet, the Spike S-512 Supersonic Jet.  This advanced next-generation aircraft, with Quiet Supersonic Flight technology, will save travelers up to 50% of their flight time.  A world-class team of senior engineers with backgrounds from leading aerospace companies is developing the high-level conceptual design of the supersonic aircraft.  Top aerospace firms, such as Maya, Siemens, Aernnova, and Quartus Engineering, are providing their expertise in aircraft design, engineering, manufacturing, and testing. Flying Faster, Do More. http://www.spikeaerospace.com/

Rescale is the world’s leading cloud platform provider of simulation software and high performance computing (HPC) solutions.  Rescale’s platform solutions are deployed securely and seamlessly to enterprises via a web-based application environment powered by preeminent simulation software providers and backed by the largest commercially available HPC infrastructure.  Headquartered in San Francisco, CA, Rescale’s customers include global Fortune 500 companies in the aerospace, automotive, life sciences, marine, consumer products, and energy sectors.  For more information on Rescale products and services, visit www.rescale.com.

This article was written by Spike Aerospace.


I grew up playing Magic: the Gathering.  As a kid I noticed something interesting about the card names – there were no generic names.  There were no cards named “Zombie” or “Elf” or “Wizard”.  There were cards named “Fugitive Wizard”, “Llanowar Elves”, “Gravebane Zombie”, and even “Storm Crow” but no “Crow”.  Modern card names are even more specific and evocative; witness “Crow of Dark Tidings” and “Flameheart Werewolf”.  Why?  Because the designers need to leave space open for new cards.  If there were a card named “Zombie”, that’s it.  That card shows what a zombie is.  If you want to make another zombie card, it will live in the shadow of the original “Zombie”.

This has applications in software engineering.  The names we choose for classes frame how we’ll think about them, and what sort of responsibilities we’ll assign to them.  If you have a class named User, then it makes sense to put things related to the concept of “a user” on that class.  That’s a problem though.  It makes sense to put anything related to that concept on the User class.  You’ll end up with login information, billing preferences, email settings, and permissions.  It’s long been known that large classes are problems.  They’re more difficult to read because they have more logic in the same place, and they’re more difficult to change because the logic is more likely to be entangled.  Both of those make them more likely to be buggy.

Class names set the stage for the logic the class develops over time.  We have to remember that we don’t just write code once and then it’s done.  Code is continually evolving, continually being changed to meet new needs.  At each stage, developers will ask themselves, “where does this logic make sense?”   As Stephen Wolfram noted, “the names of functions … directly determine how people will think about a function”.  Developers will look to class names as one sign for where logic belongs.  They’ll look to the concept embodied by the class as another.  If the name and concept are broad, developers will put lots of pieces of logic in the class.

Another point to make about generic class names is that they aren’t descriptive.  If you open up a class named User, you don’t have an immediate idea of the data it might contain.  There are a lot of things it might contain.  You’ll have to read over it to find out, and remember for next time.  That imposes a lot of cognitive burden on working developers.  They have to keep in mind the details of large classes or else read over them to be sure.  If the name were something like LoginCredentials, it would be pretty obvious what it contains.  The name guides the reader by bounding the role of the class.

We should look for our code to provide a rich set of clues to make itself understood.  Names are an important piece of the puzzle.  Taking a cue from Magic: the Gathering, if we try to rename User with a more evocative name, we’ll quickly realize we need to break it up.  We’ll probably end up with smaller, more focused classes, which is also a boon.
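As a hypothetical sketch of that breakup in Python: apart from LoginCredentials, every class and field name below is an illustrative assumption, not taken from any real codebase. The four responsibilities that tend to pile up on User (login information, billing preferences, email settings, and permissions) might each get a narrowly named class of their own:

```python
from dataclasses import dataclass, field


@dataclass
class LoginCredentials:
    """Only what is needed to authenticate."""
    username: str
    password_hash: str


@dataclass
class BillingPreferences:
    """Only how the account pays."""
    plan: str
    invoice_email: str


@dataclass
class EmailSettings:
    """Only which emails the account receives."""
    newsletter: bool = False
    product_updates: bool = True


@dataclass
class AccountPermissions:
    """Only what the account is allowed to do."""
    roles: set = field(default_factory=set)

    def allows(self, role: str) -> bool:
        # A permission check has an obvious home here,
        # rather than on a catch-all User class.
        return role in self.roles


credentials = LoginCredentials("alice", "not-a-real-hash")
permissions = AccountPermissions(roles={"viewer"})
```

With names this specific, the question “where does this logic make sense?” mostly answers itself: a password check belongs with LoginCredentials, and a role check with AccountPermissions.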

This article was written by Alex Kudlick.


Today we will discuss how to make use of multiple GPUs to train a single neural network using the Torch machine learning library. This is the first in a series of articles on techniques for scaling up deep neural network (DNN) training workloads to use multiple GPUs and multiple nodes.

In this series, we will be focusing on parallelizing the training of a single network. For more about the embarrassingly parallel problem of training multiple networks efficiently to optimize configuration parameters, see this earlier post on hyper-parameter optimization.

About Torch
Torch is a lightweight, flexible tensor library built on top of the Lua programming language. Torch is popular with machine learning researchers, so many new deep neural network ideas are first implemented in Torch and made available as open source extensions. Thus, the state of the art in deep learning is often first available to use in Torch.

The downside of this is that Torch documentation often falls behind implementation, so unless you find an example on GitHub for exactly what you want to do, it can be a challenge to figure out which Torch modules you should be using and how to use them.

One example of this is how to get Torch to train your neural networks using multiple GPUs. Searching for “multi gpu torch” on the internet yields this GitHub issue as one of the top results. From this, we know we can access more than one GPU from the Torch environment, but how do we use this low-level construct to train a complex network?

Data vs. Model Parallelism
When parallelizing the work to train a single neural network, we have two choices for how to split up the work: Model Parallelism and Data Parallelism.


With Model Parallelism, each GPU runs a chunk of the nodes in the network for a given batch of data.


With Data Parallelism, each GPU runs the entire network for different batches of data.

This distinction is discussed in detail in this paper. The choice between the two determines what kind of synchronization is required between GPUs: data parallelism requires synchronizing model parameters, while model parallelism requires synchronizing input and output values between the chunks.
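To make the data-parallel synchronization step concrete, here is a minimal sketch in plain Python. It is not Torch code and uses no GPU. The toy model y = w * x, the function names, and the sample data are all assumptions chosen for illustration. Each simulated worker computes a gradient on its own shard of the batch, and the per-worker gradients are then averaged, which is the synchronization that keeps every copy of the model identical:

```python
def gradient(w, batch):
    """Gradient of mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_gradient(w, batch, num_workers):
    """Split the batch into equal shards, compute one gradient per
    simulated worker, then average them (the synchronization step).
    Assumes len(batch) is divisible by num_workers."""
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(num_workers)]
    local_grads = [gradient(w, shard) for shard in shards]
    return sum(local_grads) / num_workers


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5
single = gradient(w, batch)                                  # one device, whole batch
parallel = data_parallel_gradient(w, batch, num_workers=2)   # two devices, half each
```

With equal-sized shards, the averaged gradient matches the gradient computed on the whole batch, which is why data-parallel training can take the same update step as single-device training.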

Simple Torch Example
We will now look at a simple example of training a convolutional neural network based on a unit test in Torch itself. This network has two convolution layers and two rectifier layers. We do a simple forward and backward pass over the network. Instead of actually computing error gradients for training, we just set them to a random vector to keep things simple.

Now let’s convert it to run on a GPU (this example will only run if you have a CUDA-compatible GPU):

To run this on a GPU, we call cuda() on the network and then make the input a CudaTensor.

Now let’s distribute the model across 2 GPUs (as an example of the model parallel paradigm). We iterate over the GPU device IDs and use cutorch.withDevice to place each layer on a particular GPU.

This puts a convolutional layer and a ReLU layer on each GPU. The forward and backward passes must propagate the outputs between GPU 1 and GPU 2.
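The hand-off between the two devices can be illustrated with a small plain-Python sketch. This is not the Torch code from this article and nothing here runs on a GPU. The Stage class, its scalar weight, and the device labels are hypothetical stand-ins for a layer pair placed on one GPU. Activations flow forward from stage to stage, and gradients flow back in reverse order:

```python
class Stage:
    """One chunk of the network: a scalar weight followed by a ReLU.
    The `device` label is only illustrative; nothing runs on a GPU."""

    def __init__(self, weight, device):
        self.weight = weight
        self.device = device
        self.grad_weight = 0.0

    def forward(self, x):
        self.x = x                      # remember the input for backward
        self.pre = self.weight * x      # pre-activation value
        return max(self.pre, 0.0)       # ReLU

    def backward(self, grad_out):
        grad_pre = grad_out if self.pre > 0 else 0.0
        self.grad_weight = grad_pre * self.x
        return grad_pre * self.weight   # gradient w.r.t. this stage's input


stages = [Stage(2.0, device=1), Stage(3.0, device=2)]

# Forward: each stage's output is handed to the next device.
activation = 1.5
for stage in stages:
    activation = stage.forward(activation)

# Backward: the gradient travels across the devices in reverse order.
grad = 1.0
for stage in reversed(stages):
    grad = stage.backward(grad)
```

The only values that cross the device boundary are the activation on the way forward and the input gradient on the way back, which is the synchronization cost of the model parallel approach.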

Next, we use nn.DataParallelTable to distribute batches of data to copies of the whole network running on multiple GPUs. DataParallelTable is a Torch Container that wraps multiple Containers and distributes the input across them.


So instead of running the forward and backward passes over the original Sequential container, we now run it on the DataParallelTable container and the data is distributed to copies of the network on each GPU.

Here is a job on Rescale you can clone and run yourself with all the above code.

A Larger Example
Let’s now look at DataParallelTable in action when training a real DNN. We will be using Sergey Zagoruyko’s implementation of Wide Residual Networks on CIFAR10, available on GitHub.

In train.lua, we see all the parallelization of the base neural network is applied by a helper function:

Delving into makeDataParallelTable, we see a similar structure to our last example above using nn.DataParallelTable:add

You can clone these jobs and run the training yourself on Rescale:

CIFAR10 Wide ResNet, 1 GPU

CIFAR10 Wide ResNet, 4 GPUs

After running the training for 10 epochs, we see the 4 GPU job runs about 3.33 times faster than the single GPU job. Pretty good scale up!
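One way to put that number in context is parallel efficiency, the measured speedup divided by the ideal linear speedup. The helper below is a small illustrative sketch (the function name is our own); it uses only the 3.33x figure and the GPU count reported above:

```python
def parallel_efficiency(speedup, num_gpus):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / num_gpus


efficiency = parallel_efficiency(3.33, 4)
print(f"Parallel efficiency on 4 GPUs: {efficiency:.2f}")
```

A speedup of 3.33 on 4 GPUs works out to roughly 83% of ideal linear scaling, with the remainder going to overhead such as the synchronization between GPUs.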

In this article, we have given example implementations of model and data parallel DNN training using Torch. In future posts, we will cover multi-GPU training with other neural network libraries as well as multi-node scaling.

This article was written by Mark Whitney.