Rescale released a new feature with its latest deploy: persistent clusters. When enabled, this feature lets users submit multiple jobs to the same cluster through the Rescale workflow (web UI) without launching and shutting down a separate cluster for each one. Previously, each job had to spin up its own cluster, which shut down automatically once the job completed, and the resulting delays added up when running many small jobs. Persistent clusters allow faster iteration, which is particularly useful for testing or for running multiple jobs that require the same hardware configuration.
Saving time and money
Generally, it takes a few minutes for each cluster to spin up and shut down. By keeping a persistent cluster alive, you save time and money for each additional job that you submit to your cluster.
Why is that? A standard cluster shuts down automatically once its job is complete, and each subsequent job spins up, runs on, and is charged for a separate cluster of its own. With persistent clusters, however, the cluster is immediately available for the next job submission, so you don’t waste time shutting down and spinning up another cluster between jobs. For customers who launch many similar jobs, the result is significant time and cost savings.
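To make the time savings concrete, here is a rough back-of-the-envelope sketch in Python. The function name and all of the numbers are hypothetical, chosen only for illustration; actual spin-up and shut-down times vary by cluster.

```python
def total_wall_time(num_jobs, run_minutes, overhead_minutes, persistent):
    """Estimate total wall-clock minutes to run num_jobs one after another.

    overhead_minutes is the combined spin-up plus shut-down time of one
    cluster (an assumed figure, not a measured Rescale value).
    """
    if num_jobs == 0:
        return 0
    # A persistent cluster pays the overhead once; standard clusters pay it per job.
    cluster_launches = 1 if persistent else num_jobs
    return cluster_launches * overhead_minutes + num_jobs * run_minutes


# Hypothetical workload: 10 jobs of 5 minutes each, 4 minutes of overhead per cluster.
standard = total_wall_time(10, 5, 4, persistent=False)
persistent = total_wall_time(10, 5, 4, persistent=True)
print(standard, persistent, standard - persistent)  # prints: 90 54 36
```

Under these assumed numbers, the overhead is paid once instead of ten times, so the shorter each job is relative to the overhead, the larger the relative saving.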
Persistent clusters are also useful for a testing environment: to test that new script that you set up, or to debug issues with your simulation. Normally, an error that causes the software to exit will mark the job as complete, resulting in a premature shutdown of the cluster. However, with persistent clusters, you can continue submitting jobs onto the same cluster, modifying and iterating your code as you go.
A beneficial byproduct of persistent clusters is the ability to queue jobs. Users can submit multiple jobs to the same cluster, and the Rescale backend will run them in the order they were submitted as the cluster frees up. This can be a useful workflow for customers who want to line up several jobs in advance.
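The queueing behavior described here is first-in, first-out. The toy model below illustrates that ordering only; the class and method names are invented for this sketch and are not part of the Rescale backend or any Rescale API.

```python
from collections import deque


class PersistentClusterQueue:
    """Toy model of one cluster that runs submitted jobs in submission order."""

    def __init__(self):
        self._pending = deque()  # jobs waiting for the cluster to free up
        self.completed = []      # jobs in the order they actually ran

    def submit(self, job_name):
        # New jobs always go to the back of the line.
        self._pending.append(job_name)

    def run_all(self):
        # The cluster takes the oldest waiting job each time it frees up.
        while self._pending:
            job = self._pending.popleft()
            self.completed.append(job)
        return self.completed


queue = PersistentClusterQueue()
for name in ["mesh-test", "solver-run", "post-process"]:
    queue.submit(name)
print(queue.run_all())  # prints: ['mesh-test', 'solver-run', 'post-process']
```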
A few pro-tips
1. Attach all your software first: Since the attached software is installed onto the VM when the cluster is initialized, users are not able to change the software configuration of a persistent cluster once it has been spun up. If you need more than one software package, we recommend attaching everything you need when you first launch the cluster. Because the software only checks out licenses when the program runs, you are charged for software only while it runs, not while the cluster is idle.
2. Start your cluster with the max core count needed: For now, we recommend that you launch the persistent cluster with the maximum number of cores you will need. If you want the core count to vary from job to job, you can use command line flags (refer to the Software Examples/FAQs section on the Resources Page) to limit the number of cores used for a particular job. However, note that users are charged for the entire cluster, regardless of whether all of its cores are utilized. The ability to expand and shrink clusters in real time is on the roadmap. Watch for future updates on the Rescale platform!
3. Don’t forget to shut down your cluster: Lastly, don’t forget to manually terminate the persistent cluster once you are done. You will be billed for usage until the cluster shuts down, even if the cluster is idle.
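The billing points in tips 2 and 3 can be sketched as simple arithmetic. The function below and its rate are hypothetical (the rate is an invented figure in cents, not a Rescale price); it only illustrates that hardware charges follow the cluster's size and lifetime, not how many cores a job uses or whether the cluster is busy.

```python
def cluster_charge_cents(cluster_cores, hours_alive, cents_per_core_hour):
    """Hardware charge for a cluster, billed on every core for every hour it is up."""
    return cluster_cores * hours_alive * cents_per_core_hour


RATE = 10  # hypothetical price: 10 cents per core-hour

# A 16-core cluster runs a job for 2 hours. Even if the job is limited to 8 cores
# with a command line flag, all 16 cores are billed.
busy = cluster_charge_cents(16, 2, RATE)

# The cluster is then left idle for 1 more hour before being terminated manually.
idle = cluster_charge_cents(16, 1, RATE)

print(busy, idle, busy + idle)  # prints: 320 160 480
```

In this made-up case, a third of the total charge comes from the idle hour, which is why terminating the cluster promptly matters.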
This article was written by Rahul Verghese.