
Let’s say you want to run your simulation against a variety of configurations. Rescale makes it easy to run a design of experiments (DOE), also known as a “Parallel Process” in the Rescale interface. To set up a DOE, it is easiest to start from an existing Rescale job, consisting of the following:

1. Selected hardware type
2. Selected software
3. Input files
4. Analysis command
5. (Optional) Pre/Post process scripts

We will need to make some small adjustments to the setup to turn it into a DOE:

1. Change your existing input file (or files) that you wish to parameterize into a Rescale “template file”
2. Specify the unique combinations of variables that you would like to use; we refer to each unique combination as a “run”
3. (Optional) Add more cores to increase the number of parallel executions

Then, when your job executes, we do the following for every run:

1. Create a unique directory for the run
2. Populate each template with the run’s variable values to produce the processed input files
3. Execute the analysis command in the run directory

Let’s start with an example. Suppose our existing job uses one input file, baseline.in. We would like to modify two values in that file: the x and y velocity rates, currently set to 12.3 and 2.1 respectively. The relevant lines in baseline.in might look like this (a hypothetical input format, used for illustration throughout):
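
x_velocity = 12.3
y_velocity = 2.1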

First we need to specify the different combinations of variables that we would like to use. Variable combinations can be defined directly in your browser.


You can also supply a comma-separated values (CSV) file, where each row is a combination that you would like to use. We’re going to use that option here, since we want to have a denser distribution in one area of the design space. Here are the contents of the CSV file:

x_velocity,y_velocity
11,2
11,4
11.5,2.5
11.5,3
11.5,3.5
12,2.5
12,3
12,3.5
12.5,2.5
12.5,3
12.5,3.5
13,2
13,4

This will result in 13 runs in this job, one for each row in the CSV file (excluding the header). When Rescale executes each run, the platform will replace the placeholders in the template with the values for each run. This way you don’t need to change any input file arguments in the execution command or in any of the referenced files.

Next we need to convert baseline.in into a template. We’ll instruct the platform to replace the values that we would like to change with placeholders that reference variables by name; here we assume placeholders of the following form:
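
${variable_name}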

So to turn baseline.in into a template, we update the two hypothetical lines as follows:
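
x_velocity = ${x_velocity}
y_velocity = ${y_velocity}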

To make it easier to recognize as a template, we’ll save it as baseline.in.template. Instead of uploading it as an input file, we will upload it in the template section. The “Processed Filename” is the filename used when the platform populates the template with the variables of the current run; this is typically the name of the file used as the baseline for the template, baseline.in for this example. Note that each Rescale “run” is performed in a unique directory, so there won’t be any naming conflicts.

Here is what the lines we modified in baseline.in will look like for a few select runs (again using the hypothetical format above, with values taken from the CSV):

Run 1:
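x_velocity = 11
y_velocity = 2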

Run 2:
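x_velocity = 11
y_velocity = 4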

Run 3:
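x_velocity = 11.5
y_velocity = 2.5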

At this point, you may wish to increase the number of cores so that more runs can be performed in parallel. We will bump it up to thirteen so all of the runs execute concurrently, taking about as long as the original single-run job.


The only step left is to submit the job. While the job is running, you can use Rescale’s plotting tools to review its results, along with any post-processing output. You can find out more about generating results for use in our plotting tools in the “Basic DOE with Post-Processing” tutorial. Here is a plot of the design space we are exploring:

[Plot: the thirteen (x_velocity, y_velocity) combinations in the design space]

We hope these features make it easier for you to explore the design space of your specific problem. If you would like to learn more about templating or any other feature of the Rescale platform, please contact us at info@rescale.com

This article was written by Rescale Engineering.


One of the key criticisms leveled against HPC in the cloud is the relatively slow interconnect speed between nodes when compared to on-premise clusters. While a number of niche providers offer InfiniBand connectivity to address this gap, Microsoft is the first major provider to offer this type of high-bandwidth, low-latency interconnect with its new Big Compute solution. This is exciting news because there are relatively few companies that have the resources necessary to manage data centers on a large scale while also dealing with the security compliance issues and certifications needed for enterprises to move their workloads over to the cloud. Fair or not, having the backing of a Microsoft, Amazon, or Google can make a big difference in obtaining corporate IT buy-in.

According to the specs, the new A8 and A9 instance sizes provide InfiniBand connectivity with RDMA. This last bit is especially important because, as this blog post correctly points out, having InfiniBand alone is not enough: the transport being used makes a critical difference, and TCP performs very poorly. The Big Compute instances support virtualized RDMA that, according to Microsoft, provides near-bare-metal performance. This announcement should be a boon for users looking to run tightly coupled simulations in the cloud, since this type of “chatty” MPI application is highly sensitive to the latency of the underlying network. However, after taking the platform out for a spin, I think there are a few barriers to entry in its current incarnation.

First, the RDMA capabilities are exposed through an interface called Network Direct, which is currently only supported by MS-MPI, Microsoft’s MPI implementation. Applications will need to be recompiled against these libraries. This is not too big a hurdle, since MPI is a well-defined standard and MS-MPI is based on MPICH, which is widely supported. A bigger issue, however, is that applications will need to be written to run on Windows. Thankfully, many of the popular engineering applications in use today already have Windows versions that support MS-MPI. Anecdotally at least, it seems that applications can be recompiled with little effort.
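
As a rough sketch, recompiling an MPI application against MS-MPI from a Visual Studio developer command prompt might look like the following; this assumes the MS-MPI SDK is installed, which defines the MSMPI_INC and MSMPI_LIB64 environment variables, and the source file name is a stand-in (exact flags will vary per application):

cl /I"%MSMPI_INC%" my_mpi_app.c /link /LIBPATH:"%MSMPI_LIB64%" msmpi.lib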

Second, configuring an MPI cluster is very different in the Windows world compared to Linux. While Windows is certainly capable of putting up impressive MPI benchmark numbers, the vast majority of HPC practitioners are currently running on Linux. Configuring an MPI cluster in the cloud for Linux generally boils down to: “Launch your instances, use the package manager to install the MPI flavor of your choosing, set up passwordless SSH amongst all the nodes in your cluster, and create a machinefile”. On Windows, the recommended approach is to install and configure HPC Pack on a Windows Server box (either on premise or in the cloud). This can be difficult for someone familiar with Linux and not versed in the nuances of Windows server administration. While the HPC Pack solution is robust and full-featured, it does feel a bit heavyweight if you just want to run a few benchmarks or a simple one-off simulation. What would be nice is a tool like StarCluster to get people up and running as quickly as possible without having to configure Active Directory, install SQL Server, or figure out PowerShell and REST APIs.
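
For comparison, here is a minimal sketch of that Linux workflow with MPICH; hostnames and rank counts are placeholders:

# machinefile: one line per node in the cluster
node0
node1

# launch 32 ranks across the nodes listed in the machinefile
mpiexec -f machinefile -n 32 ./my_solver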

It turns out that you can install MS-MPI on Azure without HPC Pack, but there doesn’t seem to be much guidance out there on how to do this. Further, there are a number of SSH servers and UNIX utilities that have been ported to Windows. We wanted an easier way to launch an MPI cluster on Windows without having to install, configure, and manage a separate HPC Pack instance. What we ended up experimenting with was using the PaaS offering to deploy a Cloud Service containing a set of startup tasks that perform the following operations on each node (a sketch of the service definition follows the list):

  1. Install MS-MPI (a standalone installer is available from Microsoft)
  2. Launch SMPD
  3. Install and configure an OpenSSH server and a standard set of UNIX command line utilities
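
A minimal sketch of the ServiceDefinition.csdef for such a Cloud Service is below; the role name, script names, and VM size are illustrative stand-ins rather than our exact definition:

<ServiceDefinition name="MpiCluster" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WorkerRole name="MpiNode" vmsize="A9">
    <Startup>
      <!-- hypothetical scripts bundled with the package -->
      <Task commandLine="install_msmpi.cmd" executionContext="elevated" taskType="simple" />
      <Task commandLine="start_smpd.cmd" executionContext="elevated" taskType="background" />
      <Task commandLine="install_openssh.cmd" executionContext="elevated" taskType="simple" />
    </Startup>
  </WorkerRole>
</ServiceDefinition>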


Each Cloud Service has a single virtual IP (VIP) assigned to it, so individual nodes are not directly addressable from the outside. To work around this, we used instance input endpoints to allow users to SSH into individual nodes through different public ports on the VIP. Internal endpoints are opened up so that each role instance can connect to the SMPD daemon running on the others. The end result of all this is an easy-to-deploy .cspkg file and accompanying XML configuration. Users can SSH into role instances and use the UNIX commands they already know.
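
The corresponding endpoint declarations in the service definition might look roughly like this (again illustrative: the public port range is arbitrary, and the internal port assumes the port SMPD is configured to listen on):

    <Endpoints>
      <!-- forward a distinct public port on the VIP to port 22 on each instance -->
      <InstanceInputEndpoint name="SSH" protocol="tcp" localPort="22">
        <AllocatePublicPortFrom>
          <FixedPortRange min="10022" max="10121" />
        </AllocatePublicPortFrom>
      </InstanceInputEndpoint>
      <!-- instance-to-instance endpoint so SMPD daemons can reach each other -->
      <InternalEndpoint name="SMPD" protocol="tcp" port="8677" />
    </Endpoints>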

We wanted to run a couple of latency and bandwidth benchmarks across two A9 instances. First, we recompiled the osu_latency and osu_bibw benchmarks from the OSU Micro-Benchmarks suite against MS-MPI. Then we deployed the Cloud Service described above and copied the benchmark executables to each machine with SCP (note that SCP is not a viable solution if you have large files to move around, but it works fine for small files like these benchmark executables). Finally, we SSHed into one of the nodes and launched the executables:
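
A representative pair of invocations, assuming MS-MPI’s mpiexec and two role instances at the hypothetical internal addresses 10.0.0.4 and 10.0.0.5, one rank per host:

mpiexec -hosts 2 10.0.0.4 1 10.0.0.5 1 osu_latency.exe
mpiexec -hosts 2 10.0.0.4 1 10.0.0.5 1 osu_bibw.exe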


The results of the benchmarks are below. As you can see, the 0-byte latency numbers are ~3 µs, and we are seeing ~7.5 GB/s in the bidirectional bandwidth test for larger message sizes, which is pretty close to full saturation.


# OSU MPI Latency Test
# Size         Latency (us)
0                      3.28
1                      3.69
2                      3.70
4                      3.67
8                      3.69
16                     4.11
32                     4.53
64                     5.35
128                    6.60
256                    2.85
512                    3.06
1024                   3.44
2048                   4.19
4096                   5.96
8192                   7.60
16384                 10.64
32768                 15.31
65536                 23.32
131072                53.65
262144                85.02
524288               156.81
1048576              299.23
2097152              567.89
4194304             1098.55

# OSU MPI Bi-Directional Bandwidth Test
# Size Bi-Bandwidth (MB/s)
1                      0.43
2                      0.87
4                      1.69
8                      3.35
16                     6.82
32                    13.69
64                    18.64
128                   29.12
256                  486.75
512                 1174.69
1024                2170.21
2048                3844.66
4096                5982.22
8192                2873.87
16384               7078.87
32768               6669.85
65536               4926.26
131072              4878.30
262144              5853.30
524288              6674.26
1048576             7066.08
2097152             7344.74
4194304             7479.30

These are very impressive performance numbers. I suspect that the real tipping point for Big Compute usage, however, will come once Microsoft adds support for Linux VMs with its IaaS solution. It is not clear from the documentation available online what the timeline for this is (IaaS support for Windows Server was only recently added). It will be interesting to see how the new low-latency interconnect wars play out in 2014. As always, Rescale intends to remain provider-agnostic and offer our customers the best available hardware.

This article was written by Ryan Kaneshiro.


At Rescale, we’ve been quite busy lately with a few exciting features in the works. One of the items at the top of our customers’ wish list has been the ability to programmatically burst jobs onto the platform. To enable that, we’ve been building a public-facing API.

Currently, our web service is a Django app whose endpoints our JavaScript client calls to create and monitor jobs. Fleshing out our internal data models and converting our existing Django views into a public-facing RESTful API was quite an undertaking. We ended up using Django REST Framework for this purpose.

The Django REST Framework (DRF) is a powerful toolkit for Django. It provides a clear and easily extensible interface for handling serialization, deserialization, authentication, and routing for your API. DRF worked quite nicely with our multilevel relational model, and migrating our Django forms to DRF serializers was quite easy. In fact, using DRF made our code clearer and more concise because a lot of the features of its serializers and viewsets made our custom validation redundant.

Anatomy of a Rescale Job

Before going into the details of the API endpoints, let us first take a look at how a Rescale job is defined. A Rescale job is composed of two types of primary objects: jobanalyses and jobvariables. A job can contain one or more jobvariables; these are input parameters that a user may specify to run a parameter sweep. Jobanalyses refers to the simulation software packages (analysis codes, in Rescale terms) that you want to run for that job. Each jobanalysis object contains the analysis code, input command, hardware selection, and the files associated with that particular jobanalysis. A job requires at least one jobanalysis in order to run.

The JSON for an example job performing a parameter sweep (sometimes referred to as a design of experiments, or DOE) using a user-uploaded binary “run_doe” would look something like the following sketch (the field names are representative rather than the exact schema):
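
{
  "name": "run_doe parameter sweep",
  "jobvariables": [
    {"name": "x_velocity", "values": [11, 11.5, 12, 12.5, 13]},
    {"name": "y_velocity", "values": [2, 2.5, 3, 3.5, 4]}
  ],
  "jobanalyses": [
    {
      "analysis": {"code": "user_included"},
      "command": "./run_doe",
      "hardware": {"coreType": "standard", "coresPerSlot": 4},
      "inputFiles": [{"id": "<id-of-uploaded-run_doe>"}]
    }
  ]
}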

API endpoints

You can use the API to upload files to the Rescale platform and then construct the JSON blob for the job. You can then create, submit, and monitor your job using the relevant API endpoints. While all of the API endpoints can be accessed over the web, we have built a Python client which currently supports a reduced set of API calls: upload_file, download_file, create_job, submit_job, get_status, and get_files. A sketch of a typical session is shown below. The API is currently in a private beta; if you’re interested in getting a token for accessing the API, or simply in learning how it could best suit your needs, contact us at support@rescale.com
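
For illustration only, a session with the Python client might look roughly like this; the function names come from the list above, but the module name, signatures, and return values are assumptions:

import time
import rescale_client  # hypothetical module name for the Python client

# Upload the solver binary; assume the returned metadata includes an id.
uploaded = rescale_client.upload_file('run_doe')

# Create and submit the job, using the kind of JSON blob shown earlier.
job_json = {"name": "run_doe parameter sweep"}  # abbreviated; see the example above
job = rescale_client.create_job(job_json)
rescale_client.submit_job(job['id'])

# Poll until the job completes, then download the output files.
while rescale_client.get_status(job['id']) != 'Completed':
    time.sleep(60)

for output in rescale_client.get_files(job['id']):
    rescale_client.download_file(output['id'])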

This article was written by Rescale.

Cost-Effective, Scalable Model for Hardware and Premier FEA Software

Improves Efficiency and Product Quality; Speeds Time-to-Market

Siemens’ NX™ Nastran® software is now available on Rescale, an on-demand, dynamically scalable cloud environment delivered through a Software as a Service (SaaS) model. The new joint solution, which integrates on-demand high-performance computing (HPC) hardware with the industry’s leading finite element analysis (FEA) software, will be delivered by Rescale, Inc., a leading cloud simulation platform provider. Engineers can now seamlessly customize compute capacity, based on individual simulation requirements, to perform virtual product simulations. Rescale’s web-based platform allows engineers to easily run numerous instances of NX Nastran, resulting in cost-effective large scale simulations that more thoroughly evaluate design options. This results in improved efficiency, product quality and faster time-to-market.

Using the Rescale platform, NX Nastran users can run hundreds of simulations simultaneously, leveraging a pay-per-use operating expense model. This can be a more cost effective option for new customers to quickly gain the benefits of NX Nastran, a product of Siemens’ product lifecycle management (PLM) software business. Existing customers can significantly enhance flexibility by maintaining a basic level of in-house analysis capability for ongoing activities and leveraging the Rescale simulation platform on an hourly basis for peak demand. This new deployment option further expands the wide variety of platform choices available to Siemens PLM Software customers.

“NX Nastran in the cloud is a perfect solution for customers who want to avoid investing in in-house IT infrastructure for high-performance computing, or those who wish to augment their existing capabilities with additional capacity on demand,” said Jim Rusk, Senior Vice President, Product Engineering Software, Siemens PLM Software. “The Rescale platform helps enhance the value of NX Nastran by enabling users to efficiently perform large-scale simulations in the cloud – including volume priced Designs of Experiment simulations – via a robust, secure, online service.”

Rescale’s simulation platform seamlessly integrates simulation software with a customizable HPC infrastructure. This helps engineers and scientists develop more innovative products by performing research and development much faster. Rescale, a Siemens PLM Software solutions partner, offers users numerous workflow options including executing one job at a time, running multiple jobs in parallel, and performing Designs of Experiment (DoE) simulations that execute hundreds of individual runs for varying parameters across the design space. Engineers run DoE simulations to better understand the effects of parameter variations on the robustness of their designs or to evaluate a broader design solution space. The new on-demand platform makes large DoE simulations significantly more affordable and practical due to a volume pricing model that provides higher discounts with an increasing number of runs.

“We are excited to partner with Siemens PLM Software to provide NX Nastran on our simulation platform,” said Joris Poort, Chief Executive Officer, Rescale. “By offering flexible hourly pricing models for its leading CAE solution, Siemens PLM Software reinforces its position as a technological and thought leader in the industry.”

For further information on NX Nastran, please see www.siemens.com/plm/nxcae.

Follow Siemens on Twitter at www.twitter.com/siemens_press.

Follow Rescale on Twitter at: www.twitter.com/rescaleinc.

About Rescale

Rescale is a secure, cloud-based, high-performance computing platform for engineering and scientific simulations. The platform allows engineers and scientists to quickly build, compute, and analyze large simulations on demand. Rescale partners with industry-leading software vendors to provide instant access to a variety of simulation packages while simultaneously offering customizable HPC hardware. Headquartered in San Francisco, CA, Rescale’s customers include global Fortune 500 companies in the aerospace, automotive, life sciences, and energy sectors. For more information on Rescale products and services, visit www.rescale.com.

About Siemens PLM Software

Siemens PLM Software, a business unit of the Siemens Industry Automation Division, is a world-leading provider of product lifecycle management (PLM) software, systems and services with nine million licensed seats and 77,000 customers worldwide. Headquartered in Plano, Texas, Siemens PLM Software helps thousands of companies make great products by optimizing their lifecycle processes, from planning and development through manufacturing and support. Our HD-PLM vision is to give everyone involved in making a product the information they need, when they need it, to make the smartest decision. For more information on Siemens PLM Software products and services, visit www.siemens.com/plm.

About Siemens Industry Automation Division

The Siemens Industry Automation Division (Nuremberg, Germany) supports the entire value chain of its industrial customers – from product design to production and services – with an unmatched combination of automation technology, industrial control technology, and industrial software. With its software solutions, the Division can shorten the time-to-market of new products by up to 50 percent. Industry Automation comprises five Business Units: Industrial Automation Systems, Control Components and Systems Engineering, Sensors and Communications, Siemens PLM Software, and Water Technologies. For more information, visit www.siemens.com/industryautomation.

Note: Siemens and the Siemens logo are registered trademarks of Siemens AG. NX is a trademark or registered trademark of Siemens Product Lifecycle Management Software Inc. or its subsidiaries in the United States and in other countries. NASTRAN is a registered trademark of the National Aeronautics and Space Administration. Rescale and the Rescale logo are registered trademarks of Rescale, Inc. All other trademarks, registered trademarks or service marks belong to their respective holders.

This article was written by Rescale Engineering.