
It has been about a year and a half since we released a reusable Azure Cloud Service for provisioning a simple Windows MS-MPI cluster without having to install HPC Pack. Azure has undergone a lot of changes since that time and we thought it would be worth revisiting this topic to see what the current landscape looks like for running Windows MPI applications in the cloud.

First, Cloud Services and the so-called “IaaS v1” Virtual Machines have been relegated to “Classic” status in the Azure portal. Microsoft now recommends that all new deployments use Azure Resource Manager (ARM). ARM allows clients to submit a declarative template, written in JSON, that defines all of the cloud resources (VMs, load balancers, network interfaces, and so on) that need to be created as part of an application or cluster. Dependencies can be defined between resources, and the Resource Manager is smart enough to parallelize resource deployment where it can. This can make deploying a new cluster or application much faster than the old model. Azure Resource Manager is essentially the equivalent of CloudFormation on AWS. There are some additional niceties, though, such as being able to specify loops in the template. Dealing with conditional resource deployment, however, is clunkier in ARM templates than in CloudFormation, and both services suffer from trying to support programming logic from within JSON. All in all, ARM deployments are much easier to manage than Classic ones.
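As an illustration of the loop support, a template resource can be stamped out N times with a copy block. The fragment below is a hypothetical sketch, not taken from our template; the resource name, API version, and parameter name are all made up:

```json
{
  "type": "Microsoft.Compute/virtualMachines",
  "apiVersion": "2015-06-15",
  "name": "[concat('node', copyIndex())]",
  "location": "[resourceGroup().location]",
  "copy": {
    "name": "vmLoop",
    "count": "[parameters('nodeCount')]"
  }
}
```

Here copyIndex() gives each iteration a distinct resource name, so a single resource definition can provision every compute node in the cluster.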

The Azure Quickstart Templates project on Github is a great resource for finding ARM templates. Deploying an application is literally as simple as clicking a Deploy to Azure button and filling in a few template parameter values. On the HPC front, there is a handy HPC Pack example available that can be used to provision and setup the scheduler.

However, as we touched on in our original blog post, HPC Pack may not be the best choice if you are getting started with MPI and simply want to spin up a new MPI cluster, test your application, and then shut everything back down again. While HPC Pack provides the capabilities of a full-blown HPC scheduler, that additional power comes at the cost of some resource overhead on the submit node (setting up Active Directory, installing SQL Server, etc.). This can be overkill if you just want a one-off cluster to run an MPI application.

Another, potentially lighter-weight option for running Windows MPI applications in the cloud is the Azure Batch service. Recently, Microsoft announced support for running multi-instance MPI tasks on a pool of VMs. This looks to be a useful option for those who are interested in automating the execution of MPI jobs; however, it does require some investment of developer resources to become familiar with the service before MPI jobs can be run.

We feel there is still room for an Azure Resource Manager template that 1) launches a bare-bones Windows MPI cluster without the overhead of HPC Pack and 2) allows MPI jobs to be run from the command line or a batch script from any operating system.

On that second point above, another interesting development since our original post is that Microsoft has decided to officially support SSH for remote access. Since that announcement, the pre-release version of the code has been made available on GitHub.

So, given those pieces, we decided to put together a simple ARM template to accomplish both of those goals. For someone getting started with MS-MPI, we feel this is a simpler path to getting your code running on a Windows cluster in Azure.

Here is a basic usage example:

  1. Click the Deploy to Azure button from the Github project and fill in the template parameters. Here, a 2-node Standard_D2 cluster is being provisioned.
  2. Make a note of the public IP address assigned to the cluster when the deployment completes.
  3. The template will enable SSH and SFTP on all of the nodes. Upload your application to the first VM in the cluster (N0). Here we are using the hello world application from this blog post.
  4. SSH into N0, copy the MPI binary into the shared SMB directory (C:\shared), and run it. Enter your password as the argument to the -pwd switch (redacted below). The -savecreds command line argument will securely save your credentials on the compute nodes so you don’t have to specify the password in future mpiexec calls. See here for more details.

And that’s it! For those that are more GUI-inclined, RDP is also opened up to all of the instances in the MPI cluster. Head on over to the Github project page for more details.

This article was written by Ryan Kaneshiro.


The Rescale Transfer Manager is a native Windows application that can be used to download output files from jobs. This is a more robust and faster alternative to downloading large files through the browser.

Getting Started

First, you’ll need to provision an API key for your account by navigating to Settings > API (direct link). Click the Generate button in the API section to create a new key if you do not already have one.

Then, select the “Click to Install” button in the Rescale Transfer Manager section to download and install the application on your desktop.
When the Transfer Manager launches, it will prompt you to enter your API key. Copy the API key that was provisioned above, paste it into the text box, and click the OK button to continue.


How do I download files?

Once the Transfer Manager is installed, there are several approaches that can be used to start your download depending on what you are trying to transfer.

If you simply want to download all job output files…
The easiest way to download all files is to navigate to a job’s result page in the browser and click the “Download with Rescale Transfer Manager” button in the upper right. This will launch the Transfer Manager if it is not already running, prompt you for a download location, and then start the download.


If you want to download a subset of the job output files…

For many jobs, there are only a small number of output files that need to be transferred to a local workstation. Transferring only the files that you need will save a lot of time and bandwidth.
First, you’ll need to make a note of the ID of the job that you want to download. The easiest way to obtain the job ID is to select the job in the browser and look at the address bar: the URL contains a short code consisting of a series of upper- and lowercase letters. Paste this code into the Job ID text box.


Next, you’ll need to provide a Search Query to restrict the files that are downloaded. Note that this is currently limited to a simple substring match; no globbing or regular expressions are allowed. For example, with a query of “d3dump”, any file that contains “d3dump” in its name will be downloaded.
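Conceptually, the filter behaves like Python’s `in` operator over the output file names (the file names below are made up for illustration):

```python
# The Search Query is a plain substring match -- no globbing, no regexes.
output_files = ["run1.d3dump01", "run1.d3dump02", "run1.d3plot", "summary.txt"]
query = "d3dump"

to_download = [name for name in output_files if query in name]
print(to_download)  # only the two d3dump files match
```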


If you want to download output files automatically when jobs complete…

The Transfer Manager has a simple background download feature that can be used to monitor your jobs and automatically start downloading output files when the job finishes. This can be a useful feature to enable if you are submitting a number of jobs at the same time and don’t want to manually download the results from each one individually.

To enable this feature, click on the gear icon in the upper right to open the Settings page.
On the settings page, you will need to first enable Automatic Downloads by checking the Enabled box.


This will reveal additional settings that control which jobs and files will be automatically downloaded:

  • Max job age (in days) specifies the oldest completed job that will be downloaded. With a value of 30, for example, all completed jobs that were created in the last 30 days will be downloaded.
  • Destination indicates the directory that jobs will be saved to.
  • Search Query will restrict downloads to the files that have names which contain the specified value.

Click the Save button to commit your changes. After a few moments, a job download should begin automatically. Note that the Transfer Manager application must remain open for automatic downloads to work.

The Rescale Transfer Manager is available for download today. Please email support@rescale.com if you have any questions or feature suggestions.

This article was written by Ryan Kaneshiro.


One of the key challenges with cloud HPC is minimizing the amount of data that needs to be transferred between on-premise machines and machines in the cloud. Unlike traditional on-premise systems, this transfer occurs over a much slower and less reliable Wide Area Network. As we’ve touched on previously, the best thing to do is perform post-processing remotely and avoid transferring data unnecessarily.

That said, a common scenario for many users is to run a simulation and then transfer all of the output files from the job back to their workstation.

After a job has completed, each file in the working directory is encrypted and uploaded to cloud storage. This provides flexibility for users that only need to download a small subset of the output files to their machine. However, the tradeoff is that each file introduces additional overhead in the transfer. When transferring data over a network, the more data that can be packed into a single file, the better. Further, many engineering codes will emit files that are highly compressible. Although compressing a file takes extra time, this can still be a net win if the time spent compressing plus transferring the smaller file is less than the time spent uploading the larger, uncompressed data. Even if the compression and ensuing transfer take longer overall, the real bottleneck in the overall transfer process is going to be the last hop between cloud storage and the user’s workstation. Having a smaller compressed file to transfer here can make an enormous difference depending on the user’s Internet connection speed.
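A back-of-envelope model makes the tradeoff concrete. All of the throughput numbers below are assumptions for illustration; only the roughly 5x compression ratio comes from the OpenFOAM test later in this post:

```python
# Compare raw transfer vs. compress-then-transfer, end to end.
data_mb = 2150.0            # ~2.1 GB of job output
compression_ratio = 5.0     # bz2 achieved roughly 5x on our test data
cloud_upload_mbps = 100.0   # cluster -> cloud storage (MB/s, assumed)
compress_mbps = 50.0        # aggregate compression throughput (MB/s, assumed)
last_hop_mbps = 5.0         # cloud storage -> user's workstation (MB/s, assumed)

raw_total = data_mb / cloud_upload_mbps + data_mb / last_hop_mbps

compressed_size = data_mb / compression_ratio
compressed_total = (data_mb / compress_mbps          # compress on the cluster
                    + compressed_size / cloud_upload_mbps
                    + compressed_size / last_hop_mbps)

print(round(raw_total), round(compressed_total))  # seconds
```

Under these assumptions the compressed path wins handily, because the slow last hop moves five times less data; with a faster workstation connection the gap narrows.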

If you know beforehand that you will need to download all of the output files for a job, then in general it is best to generate a single compressed archive file first instead of transferring each file individually. The Linux tar command provides an easy way to create a compressed archive; however, it does not utilize the extra computing power available on the MPI cluster to generate the archive.

Jeff Gilchrist has developed an easy-to-use bz2 compressor that runs on MPI clusters (http://compression.ca/mpibzip2/). We compiled a Linux binary with a static bzip2 library reference and have made it available here for download to make it easier to incorporate into your own jobs. The binary was built with the OpenMPI 1.6.4 mpic++ wrapper compiler. Please note that it may need to be recompiled depending on the MPI flavor that you are using.

To use it, upload the mpibzip2 executable as an additional input file on your job. Then, the following commands should be appended to the end of the analysis command on the job settings page.

tar cf files.tar --exclude=mpibzip2 *

mpirun -np 16 mpibzip2 -v files.tar

find ! -name 'files.tar.bz2' -type f -exec rm -f {} +

First, a tar file called files.tar is created that contains everything except the parallel bzip utility. Then, we launch the mpibzip2 executable to generate a compressed archive called files.tar.bz2. Finally, all files except files.tar.bz2 are deleted. This prevents both the individual files and the intermediate, uncompressed tar from being uploaded to cloud storage; only the compressed archive is uploaded.

Note that the -np argument on the mpirun call should reflect the number of cores in the cluster. Here, the commands are being run on a 16 Nickel core cluster.

One additional thing to be aware of is that Windows does not support bz2 or tar files by default. 7-Zip can be installed to add support for this format along with many others.

As a quick test we built compressed archives from an OpenFOAM job that contained 2.1 GB worth of output data spread over 369 files and uploaded the resulting file to cloud storage.


As a baseline, we built an uncompressed tar file. We also tried creating a gzip compressed tar file using the -z flag with the tar command. Finally, we tried building a bz2 compressed archive with 8, 16, and 32 Nickel cores.

Not surprisingly, in the baseline case, building the archive takes a negligible amount of time and the majority of the overall time is spent uploading the larger file. When compressing the file, the overall time breakdown is flipped: The majority of the time is spent compressing the file instead. Also unsurprisingly, leveraging multiple cores provides a nice speedup over using the single-core gzip support that comes with the tar command. At around 16 cores, the overall time is roughly the same as the baseline case.

The real payoff for the compression step however will become evident when a user attempts to download the output to his or her local workstation as the compressed bz2 file is almost 5 times smaller than the uncompressed tar (439 MB vs 2.1 GB).

To reiterate, we believe that pushing as much of your post-processing and visualization as possible into the cloud is the best way to minimize data transfer. However, for those cases where a large number of output files are needed, you can dramatically reduce your transfer times in many cases by spending a little bit of time preparing a compressed archive in advance. We plan on automating many of the manual steps described in this post and making this a more seamless process in the future. Stay tuned!

This article was written by Ryan Kaneshiro.

A quick Google search for “REST API versioning” turns up lots of discussion around how a version should be specified in the request. There is no shortage of passionate debate around RESTful principles and the pros and cons of embedding version numbers in URLs, using custom request headers, or leveraging the existing Accept header. Unfortunately, there isn’t any one correct answer that will satisfy everyone. Regardless of which approach you end up using, some camp will proclaim that you are doing it wrong. Irrespective of the approach that is used, once the version is parsed out of the request, there is another, larger question around how you should manage your API code to support the different schemas used across different versions. Surprisingly, there is little discussion out there around how best to accomplish this.

At Rescale, we have built our API with the Django Rest Framework. The upcoming 3.1 release will offer some API versioning support. The project lead has wisely decided to sidestep the versioning debate by providing a framework that allows API builders to select between different strategies for specifying the version in the request. However, there isn’t much official guidance around what the best practices are for dealing with this version in your code. Indeed, the documentation mostly just punts on the issue by stating “how you vary your behavior is up to you”.

One of Stripe’s engineers posted a nice high-level summary of how Stripe deals with backwards compatibility with their API. The author describes a transformation pipeline that requests and responses are passed through. One of the main appeals of this approach is that the core API logic is always dealing with the current version of the API and there are separate “compatibility” layers to deal with the different versions. Requests from the client pass through a “request compatibility” layer to transform them into the current schema before being handed off to the core API logic. Responses from the core API logic are passed through a “response compatibility” layer to downgrade the response into the schema format of the requested version.

In this post, I want to explore a potential approach for supporting this type of transformation pipeline with DRF. The natural place to inject the transform logic in DRF is within the Serializer, as it is involved both in validating request data (Serializer.to_internal_value) and in preparing response data to return from an APIView method (Serializer.to_representation).

The general idea is to create an ordered series of transformations that will be applied to request data to convert it into the current schema. Similarly, response data will be passed through the transformations in the reverse order to convert data in the current schema to the version requested by the client. This ends up looking very similar to the forwards and backwards methods on database migrations.

As a basic example of how a response transform would be used, the following is a simple serializer that returns back a mailing list name and list of subscriber emails:
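A plain-Python sketch of the idea follows; in the actual code this would be a rest_framework.serializers.Serializer subclass, and the field names are illustrative:

```python
# Stand-in for the v1 DRF serializer: a mailing list is rendered as its
# name plus a flat list of subscriber email strings.
class MailingListSerializer:
    def to_representation(self, mailing_list):
        return {
            "name": mailing_list["name"],
            "subscribers": [s["email"] for s in mailing_list["subscribers"]],
        }
```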

The payload from an endpoint that uses this serializer might return JSON formatted as:
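Presumably something along these lines (names and values are illustrative):

```json
{
  "name": "product-announcements",
  "subscribers": ["alice@example.com", "bob@example.com"]
}
```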

Some time later, we decide that this endpoint also needs to return the date that each subscriber signed up for the mailing list. This is going to be a breaking change for any client that is using the original version of the API as each element in the subscribers array is now going to be an object instead of a simple string:
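For example (the signup_date field name and all values here are illustrative):

```json
{
  "name": "product-announcements",
  "subscribers": [
    {"email": "alice@example.com", "signup_date": "2015-01-15"},
    {"email": "bob@example.com", "signup_date": "2015-02-03"}
  ]
}
```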

The serializer needs to be updated to return data in this new format. To support backwards compatibility with the original version of the API, it will also need to be modified to derive from a VersioningMixin class and specify the location of the Transform classes (more on this in a bit):
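Sketched with rest_framework stubbed out so the snippet stands alone; the class and module names are illustrative:

```python
class VersioningMixin:          # stand-in for the mixin described below
    pass

class Serializer:               # stand-in for rest_framework's Serializer
    pass

# The updated serializer opts into versioning by mixing in VersioningMixin
# and pointing transform_base at the module that holds its Transforms.
class MailingListSerializer(VersioningMixin, Serializer):
    transform_base = "api.transforms.mailinglist"
```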

Whenever a new API version is introduced for this serializer, a new numbered Transform class needs to be added to the api.transforms.mailinglist module. Each Transform handles the downgrade of version N to version N-1 by munging the response data dict:
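A hypothetical Transform for the mailing-list change above, following that scheme (the class and method names are my guesses, not an established DRF API):

```python
# Lives in api/transforms/mailinglist.py. The numeric suffix marks the
# version this Transform downgrades *from*: v2 responses become v1 by
# collapsing subscriber objects back into plain email strings.
class MailingListTransform0002:
    def backwards_response(self, data):
        data["subscribers"] = [s["email"] for s in data["subscribers"]]
        return data

    def forwards_request(self, data):
        # v1 requests carry no signup dates, so nothing needs to be
        # upgraded on the way in for this particular change.
        return data
```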

The Transform class is the analogue of a schema migration and contains methods to transform request and response data. Each Transform class name needs to have a numerical value as a suffix. The VersioningMixin class uses this to determine the order in which Transforms should be applied to the request or response data.

The VersioningMixin class provides the Serializer.to_internal_value and Serializer.to_representation overrides that will look up the Transforms pointed to by the transform_base property on the serializer and apply them in order to convert requests into the current API version or downgrade responses from the current API version to the requested version. In the following code snippet, settings.API_VERSION refers to the latest, current API version number and the request.version field is set to the requested API version from the client:
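A simplified, self-contained reconstruction of the downgrade pass: in the real mixin the Transform classes would be imported from transform_base and the requested version read off request.version, whereas here they are passed in directly, and the backwards_response method name is illustrative:

```python
import re

class Transform0002:
    # Example: downgrade v2 data to v1 by dropping a hypothetical field.
    def backwards_response(self, data):
        data.pop("added_in_v2", None)
        return data

class Transform0003:
    # Example: downgrade v3 data to v2.
    def backwards_response(self, data):
        data.pop("added_in_v3", None)
        return data

def version_of(transform_cls):
    # Each Transform encodes its version as a numeric class-name suffix.
    return int(re.search(r"(\d+)$", transform_cls.__name__).group(1))

def downgrade(data, transforms, requested_version):
    # Walk the Transforms newest-first, applying each one until the data
    # is in the schema of the requested version.
    for cls in sorted(transforms, key=version_of, reverse=True):
        if version_of(cls) > requested_version:
            data = cls().backwards_response(data)
    return data
```

A to_representation override would then just return downgrade(data, transforms, request.version); to_internal_value would run the symmetric forwards pass in ascending order.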

The main benefit of this approach is that the APIView classes (which generally implement the core API logic and use Serializers for request/response processing) only need to worry about the latest schema version. In addition, writing a Transform requires knowledge of only the current and previous versions of the API. When creating version 10 of a particular response, there is just a single Transform between v10 and v9 that needs to be created. A request asking for v7 will first be transformed from v10 to v9 by the new Transform; the existing v9-to-v8 and v8-to-v7 Transforms will handle the rest.

We certainly do not believe that this is a panacea for all backwards compatibility issues that will crop up. There are performance implications to consider in constantly running requests and responses through a series of potentially expensive transformations. Further, in the same way that it is sometimes impossible to create backwards migrations for database schema changes, some more complex breaking API changes will not be easily resolvable by this approach. However, for basic API changes, this seems like a nice way to isolate concerns and avoid embedding conditional goop inside the core API logic for versioning purposes.

This article was written by Ryan Kaneshiro.