
Rescale’s Design-of-Experiments (DOE) framework is an easy way to optimize the performance of machine learning models.  This article will discuss a workflow for doing hyper-parameter optimization on deep neural networks.  For an introduction to DOEs on Rescale, see this webinar.

Deep neural networks (DNNs) are a popular machine learning model used today in many applications including robotics, self-driving cars, image search, facial recognition, and speech recognition.  In this article we will train some neural networks to do image classification and show how to use Rescale to maximize the performance of your DNN models.


For an introduction on training an image classification DNN on Rescale, please see this previous article.  This article will expand on the basic model training example and show how to improve the performance of your network through a technique called hyper-parameter optimization.

Hyper-Parameter Optimization

A typical starting point for a task using DNNs is to select a model published in the literature and implement it in the neural network training framework of your choice, or even easier, download an already-implemented model from the Caffe Model Zoo.  You might then train the model on your training dataset and find that the performance (classification accuracy, training time, etc.) is not quite good enough.  At this point, you could go back and look for a whole new model to train, or you can try tweaking your current model to get the additional performance you desire.  The process of tweaking parameters for a given neural network architecture is known as hyper-parameter optimization.  Here is a brief list of hyper-parameters people often vary:

  • Learning rates
  • Batch size
  • Training epochs
  • Image processing parameters
  • Number of layers
  • Convolutional filters
  • Convolutional kernel size
  • Dropout fractions


Given the long list of hyper-parameter choices above, even once we set the model architecture, there are still a large number of similar neural network variants.  Our task is to find a variant that performs well enough for our needs.

Randomized Hyper-Parameter Search DOE

Using the Rescale platform, we are now going to create a Design-of-Experiments job to sample hyper-parameters and then train these different network variants using GPUs.

For our first example, we will start with one of the example convolutional networks from the Keras github repository to train an MNIST digit classifier.  Using this network, we will vary the following network parameters:

  • nb_filters: Number of filters in our convolutional layer
  • nb_conv_size: Size of convolutional kernel

We modified the example to have the template values above and here is an excerpt from the script with the templated variables:
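A sketch of what the templated excerpt might look like (the full script follows the Keras mnist_cnn.py example; the layer calls shown here are illustrative):

    # template placeholders that Rescale fills in for each run
    nb_filters = ${nb_filters}
    nb_conv_size = ${nb_conv_size}

    model.add(Convolution2D(nb_filters, nb_conv_size, nb_conv_size,
                            border_mode='valid', input_shape=(1, 28, 28)))
    model.add(Activation('relu'))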

Note the nb_filters and nb_conv_size parameters surrounded by ${}.  We are now ready to create a DOE job on the Rescale platform that uses this template.

If you want to follow along, you can clone the example job:

Keras MNIST DOE


First, we select the DOE job type in the top right corner and upload the mnist.pkl.gz dataset from the Keras github repository.


Next, we specify that we are running a Monte Carlo DOE (lower left), that we want to do 60 different training runs, and we specify our template variables.  We somewhat arbitrarily choose both parameters to be sampled from a uniform distribution.  The convolutional kernel size ranges from 1 to 10 (each image is 28×28 pixels, so a kernel larger than 28 would not work), and the number of convolutional filters ranges from 16 to 256.

Note, we could specify our template variable ranges with a CSV instead.  For this example, we would manually sample random values for our variables and the CSV would look like this:
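For example, with a few manually sampled rows (the values here are illustrative):

    nb_filters,nb_conv_size
    98,2
    174,5
    41,9
    212,7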


Next, we need to upload the Keras script template that we created above.  At this point, we can also specify a new name for the script with the values inserted.  It is also fine if you just want to keep the name the same, as we have done here.  The materialized version will not overwrite the template.


Time to select the training library we want to use. Search for “Keras” in the search box and select it. Then, we select the K520/Theano compatible version.  Finally, we enter the command line to run the training script.  The first part of the command is just copying the MNIST dataset archive into a place where Keras can find it.  Then, we are just calling the python script.


Since we selected a version of Keras configured to work with K520s, our hardware type is already narrowed down to Jade.  We can now size each training cluster on the left.  The count here is in number of CPU cores.  For the Jade hardware type, there are 4 CPU cores for every GPU.  On the right, we set the number of separate training clusters we will provision.  In this case we are using 2 training clusters with a single GPU per cluster.


The last step is to specify the post-processing script.  This script is used to parse the output of our training jobs and make any metrics available to Rescale to show in the results page we will see later.  The expected output format is a line for each value:
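For example, if the metric of interest is accuracy, the script might print a name and value on a single line (the exact separator Rescale expects is an assumption here):

    accuracy 0.9938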

Since the training script already prints the properly formatted accuracy as the last line of its output, result.out, our post-processing script just needs to parse out that last line.
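A minimal sketch of such a script, assuming the metric sits on the last non-empty line of result.out:

    # post-processing sketch: re-emit the last line of result.out,
    # which already contains the formatted accuracy value
    with open('result.out') as f:
        lines = [line.strip() for line in f if line.strip()]
    print(lines[-1])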

You can now submit the job in the upper right and we move to the real-time cluster status page.


Once the clusters are provisioned and the job starts, we can view the progress of our training jobs.  In the live tailing window, select a run and select process_output.log.  If the training has already started, you can view the training progress and the current training accuracy.  Individual runs can be terminated manually by selecting the “x” next to the run number.  This enables the user to stop a run early if the parameters are clearly yielding inferior accuracy.


Once our job completes, the results page summarizes the hyper-parameters used for each run and the accuracy results.  In the above case, the initial model we took from the Keras examples had an accuracy of 99.1% and the best result we obtain has an accuracy of about 99.4%, a small improvement.  If we want to download the weights for the most accurate model, we can sort by accuracy and then select the run details.


Among the results files are mnist_model.json and mnist_model.h5, the model architecture and model weights files needed to reload the model back into Keras.  We can also download all the data from all runs as one big archive or download the results table as a CSV.


Our accuracy results can also be visualized in the charting tab.

Bring Your Own Hyper-Parameter Optimizer

Rescale supports use of 3rd party optimization software as detailed here.  We will now discuss creating a Rescale optimization job to run a black-box optimizer from the machine learning literature.

Using the Rescale optimization SDK, we choose to plug in the Sequential Model-based Algorithm Configuration (SMAC) optimizer from the University of British Columbia.  The SMAC optimizer builds a random forest model of our neural network performance, based on the hyper-parameter choices.  We use version 2.10.03, available here.

The optimizer is a Java application which takes the following configuration:

  • “scenario” file: specifies the command line for the optimizer to run; in this case, our network training script
  • parameter file: specifies the names of our hyper-parameters and ranges of values

When SMAC runs the training script, it passes the current hyper-parameter selections to evaluate as command line flags.  It expects to receive results from the run in stdout, formatted as a string like this:

Result of this algorithm run: <status>, <runtime>, <runlength>, <quality>, <seed>, <additional data>

To start, we create a parameter file for the hyper-parameters we will vary in this experiment:
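A sketch of such a file in SMAC's pcs format, one line per parameter giving its range and default (the names, ranges, and defaults here are illustrative):

    nb_filters [16, 256] [32]i
    nb_conv_size [1, 10] [3]i
    dropout1 [0.0, 0.9] [0.25]
    dropout2 [0.0, 0.9] [0.5]
    nb_pool_size [1, 4] [2]i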

Now, in addition to varying the convolutional filter count and convolutional kernel size, we are also varying the dropout fractions and the size of the pooling layer.

Now we look at modifications made to our previous training script to accommodate SMAC input and results:

Rather than injecting hyper-parameter values in as a template, we now just use argparse to parse the flags that are provided by SMAC.
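A sketch of that flag parsing, assuming the flag names mirror the parameter file:

    import argparse

    # SMAC passes each hyper-parameter as a "-name value" pair,
    # which argparse accepts directly
    parser = argparse.ArgumentParser()
    parser.add_argument('-nb_filters', type=int, default=32)
    parser.add_argument('-nb_conv_size', type=int, default=3)
    parser.add_argument('-dropout1', type=float, default=0.25)
    parser.add_argument('-dropout2', type=float, default=0.5)
    parser.add_argument('-nb_pool_size', type=int, default=2)
    args = parser.parse_args()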

Since we are feeding errors into an optimizer that then selects new parameters based on that error, we now keep separate validation and test datasets.  Above we are splitting the original training data into a training and validation set.  It is important to hold out a separate test dataset so we have an error metric to evaluate that is not in danger of being overfit by the optimizer.  The optimization algorithm only gets to see the validation error.
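A sketch of that split, holding out the last 10000 MNIST training examples for validation (the split size is an assumption):

    # the optimizer only ever sees errors computed on (X_val, Y_val);
    # (X_test, Y_test) is reserved for final reporting
    X_val, Y_val = X_train[50000:], Y_train[50000:]
    X_train, Y_train = X_train[:50000], Y_train[:50000]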

Here we train the model, evaluate it on the validation and test datasets, and then output the results in the SMAC-specific format (“Result of algorithm…”).  Note, we also catch any errors in training and validation and mark that run as “UNSAT” to SMAC so that SMAC knows that it is an invalid combination of parameters.
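A sketch of that final section, assuming Keras 1.x-style calls (the epoch count is illustrative, and a full script would echo back the seed value SMAC supplied):

    import time

    start = time.time()
    try:
        model.fit(X_train, Y_train, batch_size=128, nb_epoch=12, verbose=1)
        val_loss, val_acc = model.evaluate(X_val, Y_val, verbose=0)
        test_loss, test_acc = model.evaluate(X_test, Y_test, verbose=0)
        # the held-out test error is reported separately; SMAC never sees it
        print('Test error: %f' % (1.0 - test_acc))
        # SMAC minimizes the quality field, so report validation error
        print('Result of this algorithm run: SAT, %f, 0, %f, 0'
              % (time.time() - start, 1.0 - val_acc))
    except Exception:
        # mark invalid parameter combinations as infeasible
        print('Result of this algorithm run: UNSAT, 0, 0, 0, 0')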


In order for SMAC to call the Rescale python SDK, we write a wrapper script, which we will call smac_opt.py, and specify for SMAC to call it in the scenario file.  The wrapper then submits the training script to be run.

Some important excerpts from the wrapper script:

To start, we take all the command line flags passed into the script and pass them directly to our objective function.  This is the objective function that will be called for each set of hyper-parameters.
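A sketch of that entry point; the positional arguments follow SMAC's wrapper calling convention, and objective is our name for the function built up in the following excerpts:

    import sys

    def objective(X):
        # filled in below: package inputs, submit the training run,
        # then parse and report its results
        pass

    if __name__ == '__main__':
        # SMAC invokes the wrapper as:
        #   smac_opt.py <instance> <instance_info> <cutoff> <runlength> <seed> -param value ...
        X = sys.argv[6:]   # the hyper-parameter flags, forwarded untouched
        objective(X)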

To start the objective function, we package up the input files for this training run into a .zip file.
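A sketch of the packaging step using the standard library (the bundled file names match the job inputs described later):

    import zipfile

    def package_inputs(zip_name='train_inputs.zip'):
        # bundle the training and post-processing scripts for this run
        with zipfile.ZipFile(zip_name, 'w') as zf:
            zf.write('mnist_cnn_smac.py')
            zf.write('format_results.py')
        return zip_name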

Here, we format the training script command we will run.  Note that we are just passing the flags from SMAC (the variable X) through to the training script.  Then, we call the submit command, which sends the input files to the training cluster and starts training.  We also now call the format_results script ourselves.
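A sketch of those steps; the submit and wait calls below are placeholders for the optimization SDK's real interface, not its verbatim API:

    # hypothetical SDK calls; consult the Rescale optimization SDK
    # examples for the actual names and signatures
    command = 'python mnist_cnn_smac.py %s' % ' '.join(X)
    run = rescale.submit(command, input_files=[package_inputs()])  # placeholder
    run.wait()                                                     # placeholder
    # ...then invoke format_results.py on the run output ourselves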

We parse the output file from the training script to get the expected SMAC results line (“Result of algorithm run…”), as well as the error on the test dataset.
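A sketch of that parsing, assuming the run output lands in a local file (the file name is an assumption):

    def parse_results(path='process_output.log'):
        result_line, test_error = None, None
        with open(path) as f:
            for line in f:
                if line.startswith('Result of this algorithm run:'):
                    result_line = line.strip()
                elif line.startswith('Test error:'):
                    test_error = float(line.split(':')[1])
        print(result_line)   # SMAC reads this line from the wrapper's stdout
        return test_error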

Finally, we need to specify the scenario file that tells SMAC to call our wrapper script; a sketch follows the list of important options below.

The important parts here are:

  • pcs-file: specifies the parameter file
  • algo: the wrapper script to run
  • numberOfRunsLimit: sets the number of training runs
  • check-sat-consistency: tells the optimizer that, for the same training dataset, different parameter selections might lead to a feasible or infeasible model
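Putting those together, a minimal scenario file might look like this (the values are illustrative):

    pcs-file = params.pcs
    algo = python smac_opt.py
    numberOfRunsLimit = 60
    check-sat-consistency = false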

So now that we have all of our input files, we are ready to create a job.

You can clone the job to run yourself here:

Keras MNIST SMAC Optimizer


The input.tar.gz archive consists of an input/ directory with our mnist_cnn_smac.py training script and the format_results.py post-processing script.

We select the same software as before, Keras configured for Theano on a K520.


Hardware selection is roughly the same as well. We again select 2 task slots to train 2 networks in parallel.


For the optimizer, we select “Custom Optimization” and then enter the command line to run SMAC.  This command is made complicated by the fact that each SMAC process only runs one iteration at a time.  To run multiple trainings in parallel, we must use SMAC’s “shared model mode”.  Turning on this mode tells SMAC to periodically check its current directory for optimizer results from other SMAC processes and incorporate those results.  This mode requires we set the “-seed” to a different value for each SMAC process.

Since we need to run multiple optimizer processes at once, we background all the calls and then sleep for the maximum amount of time we would want to run this optimizer for.  In this case, we are waiting for 4 hours.

This job is now ready to submit.  Once running, you can live tail current iterations, and at the end, view results just as in the previous randomized case.  The results will now show both the validation and test errors for each set of hyper-parameters.

Conclusion

In this article, we showed 2 ways to search the space of hyper-parameters for a neural network.  One used randomized search and the other used a more sophisticated optimization tool, making use of the Rescale optimization SDK.

This article was written by Mark Whitney.


Rescale now supports running a number of neural network software packages including the Theano-based Keras. Keras is a Python package that enables a user to define a neural network layer-by-layer, train, validate, and then use it to label new images. In this article, we will train a convolutional neural network (CNN) to classify images based on the CIFAR10 dataset. We will then use this trained model to classify new images.

We will be using a modified version of the Keras CIFAR10 CNN training example and will start by going step-by-step through our modified version of the training script.

CIFAR10 Dataset

The CIFAR10 image classification dataset can be downloaded here. It consists of 60000 32×32 pixel color images, each assigned one of 10 categories. You can either download the python version of the dataset directly or use Keras’ built-in dataset downloader (more on this later).

We will load this dataset with the following code:
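A sketch of that loading code, assuming a Keras 1.x-style API:

    from keras.datasets import cifar10
    from keras.utils import np_utils

    nb_classes = 10

    (X_train, y_train), (X_test, y_test) = cifar10.load_data()

    # convert the class labels (0-9) to one-hot encoded vectors
    Y_train = np_utils.to_categorical(y_train, nb_classes)
    Y_test = np_utils.to_categorical(y_test, nb_classes)

    # scale the 8-bit RGB values into the 0-1.0 range
    X_train = X_train.astype('float32') / 255
    X_test = X_test.astype('float32') / 255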

 

We are using the cifar10 data loader here, converting the category labels to a one-hot encoding, then scaling the 8-bit RGB values to a 0-1.0 range.

The X_train and X_test outputs are numpy matrices of RGB pixel values for every image in the training and test set. Since there are 50000 training images and 10000 test images and each image is 32×32 pixels with 3 color channels, the shape of each matrix is as follows:

X_train (50000, 3, 32, 32)
Y_train (50000)
X_test (10000, 3, 32, 32)
Y_test (10000)

The Y matrices correspond to an ordinal value representing one of the 10 image classes for the 50000 and 10000 image groups:

airplane 0
automobile 1
bird 2
cat 3
deer 4
dog 5
frog 6
horse 7
ship 8
truck 9

For the sake of simplicity, we do no further pre-processing on the correctly sized images in this example. In a real image recognition problem, we would do some sort of normalization, ZCA whitening, and/or jittering. Keras integrates some of this pre-processing with the ImageDataGenerator class.

Defining the Network

The next step is to define the neural network architecture we wish to train:
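A sketch of make_network() in the style of the Keras CIFAR10 example, assuming a Keras 1.x-style API and Theano channel ordering (the filter counts and dropout rates are illustrative):

    from keras.models import Sequential
    from keras.layers.core import Dense, Dropout, Activation, Flatten
    from keras.layers.convolutional import Convolution2D, MaxPooling2D

    def make_network():
        model = Sequential()
        # the first layer must match the input image shape (3, 32, 32)
        model.add(Convolution2D(32, 3, 3, border_mode='same',
                                input_shape=(3, 32, 32)))
        model.add(Activation('relu'))
        model.add(Convolution2D(32, 3, 3))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        model.add(Convolution2D(64, 3, 3, border_mode='same'))
        model.add(Activation('relu'))
        model.add(Convolution2D(64, 3, 3))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        model.add(Flatten())
        model.add(Dense(512))
        model.add(Activation('relu'))
        model.add(Dropout(0.5))
        # the final dense layer must match the number of classes
        model.add(Dense(10))
        model.add(Activation('softmax'))
        return model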

This network has 4 convolutional layers followed by 2 dense layers. Additional layers can be added and layers can be removed or changed, but the first layer must have the same size as an input image (3, 32, 32) and the last dense layer must have the same number of outputs as number of classes we are using as labels (10). After the final dense layer is a softmax layer that squashes the output to a (0, 1) range that sums to 1.

Training and Testing

Next we train and test the network:
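A sketch, again assuming a Keras 1.x-style API (the epoch count is illustrative):

    from keras.optimizers import SGD

    model = make_network()

    # stochastic gradient descent with a cross entropy loss
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd,
                  metrics=['accuracy'])

    # nb_epoch: passes through the training data;
    # batch_size: inputs evaluated on the network at once
    model.fit(X_train, Y_train, batch_size=32, nb_epoch=20,
              validation_data=(X_test, Y_test))

    loss, accuracy = model.evaluate(X_test, Y_test, verbose=0)
    print('Test accuracy: %f' % accuracy)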

Here we have chosen stochastic gradient descent as our optimization method with a cross entropy loss. Then we train the model using the fit() method. We specify the number of training epochs (times we iterate through the data) and the size of our batches (the number of inputs to evaluate on the network at once). Larger batch sizes correspond to more memory usage while training. After our network is trained, we evaluate the model against the test data set and print the accuracy.

Saving the Model

Finally, we save our trained model to files so that we can later re-use it:
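A sketch, with illustrative file names:

    # save the architecture as JSON and the learned weights as HDF5
    json_string = model.to_json()
    with open('cifar10_model.json', 'w') as f:
        f.write(json_string)
    model.save_weights('cifar10_model.h5')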

Keras distinguishes between saving the model architecture (in our case, what is output from make_network()) and the trained weights. The weights are saved in HDF5 format.

Note that Keras does not guarantee that the saved model is compatible across different versions of Keras and Theano. We recommend you try to load saved models with the same version of Keras and Theano if possible.

Rescale Training Job

Now that we have explained the contents of the cifar10_cnn.py training script we will be running, we will create a Rescale job to train it on a GPU node, using a build of Keras already optimized to run on NVIDIA GPUs. This job is publicly available on Rescale. First, we upload the training script and CIFAR10 dataset:


Here we are uploading the pre-processed version of the CIFAR10 images as downloaded by Keras to avoid re-downloading the dataset from the CIFAR site every time the job is run. This step is optional and we could instead just upload the cifar10_cnn.py script.

Next, we select Keras and specify the command line:


We select Keras from the software picker and then the Theano-backed K520 GPU version of Keras. The command line re-packs the dataset we uploaded and then moves the archive into the default Keras dataset location at ~/.keras/datasets. Then it calls the training script. If we opted to not upload the CIFAR10 set ourselves, we could omit all the archive manipulation commands and just run the training script. The dataset would then be automatically downloaded to the job cluster.

In the last step, we select the GPU hardware we want to run on:


Here we have selected the Jade core type and the minimum 4 cores for this type. Finally, submit the job.

It will take about 15 minutes to provision the cluster and compile the network before training begins. Once it starts, you can view the progress by selecting process_output.log.


Once the job completes, you can use your trained model files. You can either download them from the job results page or use them in a new job, as we will show now.

Classifying New Images

We used a pre-processed numpy-formatted dataset for our training job, so what do we do if we want to take some real images off the internet and classify those? Since dog and cat are 2 of the 10 classes of images represented in CIFAR10, we pick a dog and cat image off the internet and try to classify them:

[Images: a standing cat and a dog’s face]

We start by loading and scaling down the images:
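A sketch using scipy (the image file names are hypothetical):

    import numpy as np
    from scipy.misc import imread, imresize

    names = ['standing-cat.jpg', 'dog-face.jpg']

    imgs = []
    for name in names:
        img = imresize(imread(name), (32, 32))   # (32, 32, 3)
        imgs.append(img.transpose((2, 0, 1)))    # -> (3, 32, 32)

    # stack into a single (2, 3, 32, 32) tensor and normalize
    X = np.array(imgs).astype('float32') / 255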

We use scipy’s imread to load the JPGs and then resize the images to 32×32 pixels. The resulting image tensor has dimensions of (32, 32, 3) and we want the color dimension to be first instead of last, so we take the transpose. Finally, combine the list of image tensors into a single tensor and normalize the levels to be between 0-1.0 as we did before. After processing, the images are smaller:

[Images: the same cat and dog scaled down to 32×32 pixels]

Note that we performed the simplest resizing here, which does not even preserve the aspect ratio of the original image. If we had done any normalization on the training images, we would need to apply those same transformations to these images as well.

Loading the Model and Labeling

Assembling the saved model is a 2 step process shown here:
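A sketch of those two steps, assuming the model file names used during training:

    from keras.models import model_from_json

    # step 1: rebuild the architecture from its JSON description
    model = model_from_json(open('cifar10_model.json').read())
    # step 2: load the trained weights into that architecture
    model.load_weights('cifar10_model.h5')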

Putting it together, we take the model we loaded and call predict_classes to get the class ordinal values for our 2 images.
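A one-line sketch of that labeling step:

    classes = model.predict_classes(X)
    print(classes)   # prints the class ordinals, e.g. [3 5]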

Rescale Labeling Job

Now to put our labeling script in a job and label our example images. This job is publicly available on Rescale. We start by selecting the trained model files we created: choose “Use files from cloud storage” and then select the JSON and HDF5 model files created by the training job:


Then upload our new labeling script dog_cat.py and dog and cat images.


Select the Keras GPU software and run the labeling script. In this case, the dog and cat images are loaded from the current directory where the job is run from, so no files need to be moved around.


The labels will then be shown in process_output.log when the job completes.


The output is [3, 5] which corresponds to cat and dog from our image class table above.

That wraps up this tutorial. We successfully trained an image recognition convolutional neural network on Rescale and then used that network to label additional images. Coming soon in another post, we will talk about using more complex Rescale workflows to optimize network training.

This article was written by Mark Whitney.


Rescale is a valuable regression and performance testing resource for software vendors. Using our API and command line tools, we will discuss how you can run all or a subset of your in-house regression tests on Rescale. The advantages of testing on Rescale are as follows:

  1. Compute resources are on-demand, so you only pay when you are actually running tests
  2. Compute resources are also scalable, enabling you to run a large test suite in parallel and get feedback sooner
  3. Heterogeneous resources are available to test software performance on various hardware configurations (e.g. Infiniband, 10GigE, Tesla and Phi coprocessors)
  4. Testing your software on Rescale can then optionally enable you to provide private beta releases to other customers on Rescale

Test Setup

For the remainder of this article, we will assume you have the following sets of files:

  1. Full build tree of your software package, in a commonly supported archive format (tar.gz, zip, etc.)
  2. Archived set of reference test inputs and expected outputs
  3. Script or command line to run your software build against one or more test inputs
  4. Script to evaluate actual test output with expected output
  5. (Optional) Smaller set of incremental build products to overlay on top of the full build tree

In the examples below, we will be using our python SDK. A selection of the examples below is available in the SDK repo here. The SDK just wraps our REST API, so you can port these examples to other languages by using the endpoints referenced in https://github.com/rescale/python-sdk/blob/master/rescale/client.py.

Note that all these examples require:

  1. An account on platform.rescale.com
  2. A local RESCALE_API_KEY environment variable set to your API key found in Settings->API from the main platform page

Running tests from a single job

We will start with the simplest example, uploading a full build and test reference data as job input files, running the tests serially, and comparing the results. Let’s start with some example “software” which we will upload and run. Here is a list of the software package and test files:
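For example, the working directory might contain something like this (the names are illustrative):

    my_software_build.tar.gz
    all_short_tests.tar.gz
    test_case_1.tar.gz
    compare_results.py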

Each software build and test case is archived separately. Here are the steps to prepare and run our test suite job:

  1. Upload build, reference test data, and results comparison script using the Rescale python SDK:
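A sketch using the SDK's RescaleFile class (the constructor argument shown is an assumption; the SDK examples show the exact signature):

    from rescale.client import RescaleFile

    # each upload returns metadata used to reference the file later
    build_file = RescaleFile(file_path='my_software_build.tar.gz')
    test_file = RescaleFile(file_path='test_case_1.tar.gz')
    compare_script = RescaleFile(file_path='compare_results.py')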

RescaleFile uploads the local file contents to Rescale and returns metadata to reference that file. At this point, you can view these files at  https://platform.rescale.com/files/.

  2. Create the test suite job:
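A sketch of the job definition; coresPerSlot, command, and postProcessScriptCommand are fields discussed below, while the other names and the constructor signature are assumptions:

    from rescale.client import RescaleJob

    TEST_COMMAND = 'tar xzf my_software_build.tar.gz && ./run_all_tests.sh'  # hypothetical
    POST_RUN_COMPARE_COMMAND = 'python compare_results.py'

    job = RescaleJob(
        name='regression-test-suite',
        coreType='marble',        # a code from RescaleConnect.get_core_types()
        coresPerSlot=1,
        inputFiles=[build_file, test_file, compare_script],
        command=TEST_COMMAND,
        postProcessScriptCommand=POST_RUN_COMPARE_COMMAND,
    )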

RescaleJob creates a new job which you can now view at https://platform.rescale.com/jobs/. Note here we are running the job on a single Marble core. You can opt to run more cores by increasing coresPerSlot or change the core type by selecting a different core type code from RescaleConnect.get_core_types().

Note that the command and postProcessScriptCommand fields can be any valid bash script, so there is quite a bit of flexibility in how you run your test and evaluate the results. In our very simple example, the post-test command comparison just diffs the out and expected_out files in each test case directory.

  3. Submit the job for execution and wait for it to complete:
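A sketch (the method names are assumptions; see the SDK examples for the real calls):

    job.submit()                 # start provisioning and execution
    job.wait_for_completion()    # poll until the job finishes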

Once the job cluster is provisioned, the input files are transferred to the cluster, decrypted, and then uncompressed in the work directory. Next, the TEST_COMMAND is run, followed by the POST_RUN_COMPARE_COMMAND.

  4. Download the test results. All Rescale job commands have stdout redirected to process_output.log, so let’s just download that one file to get the test result summary:
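A sketch, again with an assumed helper name:

    # grab only the consolidated log rather than all output files
    job.download_file('process_output.log')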

It is important to note here that by doing our test result comparison as a post-processing step in our job, we avoid downloading potentially large output files, which would delay how long it takes to get test results.  This doesn’t address the issue that we still need to upload the test reference outputs, which will often be of similar size to the actual outputs.  The key is that we only need to upload files to Rescale when they change, not for every test job we launch.  Assuming our reference test cases do not change very often, we can now reuse the files we just uploaded to Rescale in later test runs.

You can find this example in full here.

Reusing reference test data

We will now modify the above procedure to avoid uploading reference test data for every submitted test job.

  1. (modified) Find test file metadata on Rescale and use as input file to subsequent jobs:
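A sketch of the lookup:

    from rescale.client import RescaleFile

    # reuse the archive already in Rescale cloud storage; this picks
    # the most recently uploaded file with the given name
    test_file = RescaleFile.get_newest_by_name('test_case_1.tar.gz')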

RescaleFile.get_newest_by_name just retrieves metadata for the test file that was already uploaded to Rescale. Note that if you uploaded multiple test archives with the same name, this will pick the most recently uploaded one.

Steps 2 through 4 are the same as the previous example.

Parallelize long running tests

The previous examples just run all your tests sequentially; let’s now run some in parallel. For this example, we assume our tests are partitioned into “short” and “long” tests. The short tests are in an archive called all_short_tests.tar.gz and each long test is in a separate archive called long_test_.tar.gz.

We will now launch a single job for all the short tests and a job per test for the long tests. We assume these test files have already been uploaded to Rescale, as was done in the first example.
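A sketch; make_test_job is a hypothetical helper wrapping the RescaleJob construction from the first example, and the archive names are illustrative:

    short_tests = RescaleFile.get_newest_by_name('all_short_tests.tar.gz')
    jobs = [make_test_job(short_tests, coreType='marble', coresPerSlot=1)]

    long_test_archives = ['long_test_a.tar.gz', 'long_test_b.tar.gz']  # hypothetical
    for name in long_test_archives:
        long_test = RescaleFile.get_newest_by_name(name)
        jobs.append(make_test_job(long_test, coreType='nickel', coresPerSlot=32))

    for job in jobs:
        job.submit()   # assumed method name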

In this example, we launched our short test job with a single Marble core and each of our long tests with a 32 core (2 nodes) Nickel MPI cluster.

This test job configuration is particularly appropriate for performance tests. To test that a particular build + test case combination scales, you might launch 4 jobs with 1, 2, 4, and 8 nodes respectively.

This example can be found here.

Incremental builds

In the above, we avoided re-uploading test data for each test run by reusing the same data already stored on Rescale. If we have a large software build we need to test, we would like to also reuse already uploaded data, but each build tested will generally be different. In many cases though, only a small subset of files from the whole package will change from build to build.

To leverage the similarity in builds, we can supply an incremental build delta that will be uncompressed on top of the base build tree we uploaded in the first job. There are just 2 requirements:

  1. The build delta must have the same directory structure as the base build
  2. We need to specify the build delta archive as an input file AFTER the base build archive
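A sketch of the input ordering (the variable names match the discussion below; the RescaleFile calls carry the same assumed signatures as earlier):

    # the base build is reused from Rescale storage; the small delta
    # is uploaded fresh for each build under test
    base_build_input = RescaleFile.get_newest_by_name('my_software_build.tar.gz')
    incremental_build_input = RescaleFile(file_path='build_delta.tar.gz')

    # order matters: the delta must come AFTER the base so that it is
    # extracted on top of the base build tree
    input_files = [base_build_input, incremental_build_input, test_file]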

In the above, base_build_input comes from the file already on Rescale and incremental_build_input is uploaded each time.

Parallelism in Design-of-Experiment (DOE) jobs

Another way to run tests is to group multiple tests into a single DOE job. The number of tests that can run in parallel is then defined by the number of task slots you configure for your job. You would then structure your test runs so that they can be parameterized by a templated configuration file, as described in https://www.rescale.com/resources/getting-started/doe/.

This method has the advantage of eliminating job cluster setup time, compared to the multi-job case. The disadvantage is that each test run is limited to the same hardware configuration you define for a task slot. For an example on how to set up a DOE job with the python SDK, see https://github.com/rescale/python-sdk/tree/master/examples/doe.

Large file uploads

In the above examples, we have uploaded our input files with a simple PUT request. This will be slow and/or often not work for multi-gigabyte files. An alternative is to use the Rescale CLI tool, which provides bandwidth optimized file uploads and downloads and can resume transfers if they are interrupted.

For more info on the Rescale CLI, contact support@rescale.com.

Running tests on Rescale is a great way to reduce testing time and strain on internal compute resources for large regression and performance test suites. The Rescale API provides a very flexible way to launch groups of tests on diverse hardware configurations, including high-memory, high-storage, infiniband, and GPU-enabled clusters. If you are interested in doing your own testing on Rescale, check out our SDK and example scripts at https://github.com/rescale/python-sdk or contact us at support@rescale.com.

This article was written by Mark Whitney.


Rescale recently rolled out SAML Single Sign-On login support for our ScaleX Enterprise users. This post will discuss how to set up Rescale as a SAML Service Provider, using Azure Active Directory as the Identity Provider.

Prerequisites to follow the tutorial below are that you have ScaleX Enterprise and Microsoft Azure accounts.

SAML Background

SAML is an authentication protocol used for web single sign-on (SSO). There are 2 entities involved in a SAML deployment:
  • Identity Provider (IdP): the entity that manages your users’ credentials. Its role is to authenticate users with passwords or other types of keys.
  • Service Provider (SP): the entity that provides end user applications. It relies on the Identity Provider to authenticate users.

This Wikipedia page has far more detail, but the SAML protocol roughly consists of the following steps:

  1. User tries to access a resource from the Service Provider and the provider needs to authenticate that user
  2. Service Provider redirects the user to the IdP’s SSO endpoint with a user identifier
  3. IdP redirects the user to its own login page if the user is not currently authenticated with the IdP
  4. IdP responds with authentication and redirects user back to the SP’s Assertion Consumer Service (ACS)
  5. SP ACS verifies the authentication came from the IdP and logs the user in on the SP
  6. SP redirects the user to the originally requested resource if they are authorized to access it

Note that while the Identity Provider manages the authentication of the user, the Service Provider still imposes its own authorization rules on that user. Also note that the Identity Provider and Service Provider never exchange the users’ secret key information; instead, the Service Provider is allowed to ask the Identity Provider about the identity of the user currently logged into the web client.

In this tutorial, the Identity Provider will be an Azure Active Directory and the Service Provider will be Rescale.

The configuration we will outline has 3 high-level chunks:

  1. Set up a new Azure Active Directory as a test IdP
  2. Add your Rescale as an authorized SP to access your new IdP
  3. Set up your ScaleX Enterprise account to authorize your new IdP

Creating a test Azure Active Directory SAML Identity Provider

For completeness, we will start by setting up a test Azure Active Directory. If you already have an AD set up for your organization, you can probably skip ahead to the “Authorizing Rescale as a Service Provider” section.

Start by logging into the Azure management portal. Go to the Active Directory section and create a new directory with the +NEW button in the bottom left corner.


Select DIRECTORY and then CUSTOM CREATE.


Fill in the fields to create your new directory. The NAME and DOMAIN NAME can be the same but should be unique across directories in your organization.


At this point, you have just created a new Active Directory. You can now add users to your directory. Select the directory you just created and choose the USERS tab at the top.


Your Azure user should already be in the user list for this directory. Add another user by selecting ADD USER at the bottom.


Select “User with an existing Microsoft account” and add an email of the user from your domain you wish to add. Fill out additional profile information on the next page and then save the new user.


At this point, we have an Active Directory with one or more users. The next step is to tell our AD to allow Rescale to query for users to authenticate.

Authorizing Rescale as a Service Provider

The Identity Provider must now be told to allow Rescale to query it for logged-in user information.
Select the APPLICATIONS tab at the top and then select ADD at the bottom. Pick “Add an application my organization is developing.”


Name your application whatever you like and set it to be a “WEB APPLICATION AND/OR WEB API”.


On the next page, you start to configure the important bits of your Service Provider application. You should set your SIGN-ON URL to “https://platform.rescale.com/saml2/company ID/sso/” and your APP ID URI to “https://platform.rescale.com/saml2/company ID/”. Your company ID is generally just the name of your company but you can contact support to verify this. In this example, we are adding users using the company code “rescale”.


Next, we need to set a few additional properties and get the IdP endpoints the Rescale Service Provider will use. Go to the CONFIGURE tab and scroll down to the REPLY URL. Set the URL to “https://platform.rescale.com/saml2/company ID/acs/” and then SAVE the change.


Next, let’s note the endpoints we will need to configure our Rescale account to use this IdP. You should copy these endpoints:

  • SAML-P SIGN-ON ENDPOINT
  • SAML-P SIGN-OUT ENDPOINT
  • FEDERATION METADATA DOCUMENT

Next, we need to retrieve the Azure entity ID it uses to identify itself with the Service Provider and the X509 certificate it uses to sign SAML responses.
Open up another tab in your browser and go to the FEDERATION METADATA DOCUMENT URL you just copied. Copy both the “entityID” attribute and the first “X509Certificate” element in the document.


At this point, we have everything we need from the Azure console. It is time to move over to the Rescale platform to configure your Service Provider.

Authorize your Identity Provider in ScaleX Enterprise

Open up a new tab and log into https://platform.rescale.com as a company administrator. Select Company Administration in the top right user drop-down menu, then select the Settings tab and scroll down to the “SSO (Single Sign-On)” section.


You can now fill in the relevant fields here:

  • Service Provider saml:NameID Format attribute: for Azure, this should be set to “urn:oasis:names:tc:SAML:2.0:nameid-format:persistent”
  • Name of Identity Provider email field in ACS response: for Azure, this should be set to “http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress”
  • Identity Provider EntityDescriptor Entity ID: set this to the “entityID” value copied from the Azure metadata.
  • Identity Provider SingleSignOnService URL: set this to the “SAML-P SIGN-ON ENDPOINT” from the Azure endpoint list.
  • Identity Provider public X509 certificate: fill this with the X509 certificate copied from the Azure metadata. Ensure there are no line breaks in the middle of the certificate contents.

Note that the first two fields are the same regardless of your particular AD deployment, but the last 3 are based on the values you captured above and will differ from what is shown in the screenshot.

Check the following checkboxes:

  • Active
  • Encrypt NameID
  • Sign AuthnRequests

For your initial test, you should also select “Create any user who can authenticate with SSO”. More on this in the next section.

Then click Update SSO Settings. With that, you should be able to log your user in using your AD IdP! To test it, log out of Rescale via the upper right user drop down. Go to the Rescale SSO login page and try to log in as one of the users you authorized by your Active Directory provider.

SP vs. IdP initiated login and invites

Users of your ScaleX Enterprise account can log in with SSO through either of these endpoints:

  • IdP-initiated logins: https://platform.rescale.com/saml2/company ID/sso/
  • SP-initiated logins: https://platform.rescale.com/login/sso/

The former is the more “direct” link. It redirects straight to your IdP SSO URL and does not require the user to enter any credentials on Rescale’s side. The second page takes a user email and then tries to route the user to the correct Identity Provider. This page is meant for users that start by going directly to https://platform.rescale.com but still want to log in with SSO.

As mentioned in the previous section, you can choose to either allow any user authenticated by your IdP to log into Rescale under your organization, or only allow users who have been explicitly invited to create Rescale accounts. In order to enforce the invite-only restriction, go back to the SSO settings section in the Rescale Company Administration panel, select “Only create invited users”, and save those settings. To invite users, go to the Members tab at the top and click “Invite Members”.


You can then enter a list of emails of the IdP authenticated users you want to enable access for on Rescale. These users will then be sent invitation emails with the IdP initiated login mentioned above.

At this point, you should have a secure Active Directory configured to allow Single Sign-On on Rescale. You can now manage your users’ access to Rescale either by controlling user authorization in the Rescale Company Administration portal or by controlling user authentication via your Active Directory.

This article was written by Mark Whitney.