
CONVERGE™ from Convergent Science is one of the most popular computational fluid dynamics (CFD) simulation programs in the field of engine design and simulation. Its parallel processing feature leverages MPI, which effectively increases job running speed in multicore and many-core environments. You can run CONVERGE™ jobs on demand on Rescale’s cloud simulation platform using all of the available core types. In this blog post, I’ll make a performance and cost comparison of the Nickel and Iron HPC core types. Hopefully this can serve as a core selection guide for running your own CONVERGE™ simulations on Rescale.

Test Environment

                      Nickel (HPC+)               Iron (HPC InfiniBand)
Application           CONVERGE™ 2.2 for Linux     CONVERGE™ 2.2 for Windows
MPI Flavor            HP-MPI for Linux            Microsoft MPI 4.2
Core Type             Nickel                      Iron
Compute               6.75 CU                     6.75 CU
Memory (GB/core)      3.8                         3.8
Storage (GB/core)     32                          32
Network               10 Gb/s                     RDMA InfiniBand (40 Gb/s)
Price (per core-hour) $0.15                       $0.30

The first two rows of the table show the software environment we chose, and the remaining rows show the hardware specifications. Although most of the hardware specifications are similar for Nickel and Iron, one noteworthy difference is the network. While Nickel has only 10 Gb/s of network bandwidth, Iron has a 40 Gb/s RDMA InfiniBand interconnect, which is a significant advantage for jobs running across multiple nodes, and I believe this is the primary reason for the twofold price difference between the two core types.

Benchmarking Job

The benchmark job we chose is provided by Convergent Science. It models the phenomenon of a curving shot on the soccer field, and is intended to show us what it takes to get the “bending” of the ball in mid-air (detailed description). The simulation models 0.1 seconds of the soccer ball’s movement using a time step of 0.001 seconds. The model initially consists of 81,576 nodes.

Small Cluster Performance-Cost Comparison

In the first round, we tested cluster performance on 16, 32, and 64 cores for both core types. The results are shown in the table below.

                                        16 cores    32 cores    64 cores
Nickel (HPC+)           Time (s)        2611.39     2086.38     1857.28
                        Price ($/hour)  2.40        4.80        9.60
Iron (HPC InfiniBand)   Time (s)        3671.33     2709.02     2020.85
                        Price ($/hour)  4.80        9.60        19.20

From the table, we see that a Nickel cluster with up to 64 cores has better performance than Iron, and is less expensive. So if you need to run a small job on a small cluster, Nickel is probably a better choice.
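For example, if we assume billing that is proportional to runtime at the listed hourly rates, the 64-core Nickel job costs roughly 1857.28 / 3600 × $9.60 ≈ $4.95, while the 64-core Iron job costs roughly 2020.85 / 3600 × $19.20 ≈ $10.78.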

Mid-sized Cluster Performance-Cost Comparison

In the second round, we tested mid-sized cluster performance on 128- and 256-core clusters for both core types. The results are shown in the table below.

                                        128 cores   256 cores
Nickel (HPC+)           Time (s)        2973.34     n/a (terminated)
                        Price ($/hour)  19.20       38.40
Iron (HPC InfiniBand)   Time (s)        1434.00     1277.43
                        Price ($/hour)  38.40       76.80

We can see that the runtime on the Nickel cluster increased drastically when the number of cores reached 128. For the 256-core case, I terminated the job after finding that it was taking even longer than the 128-core case. This is caused by both communication overhead and the slower interconnect. The performance of the Iron cluster, on the other hand, improves steadily as more cores are added, so Iron outperforms Nickel when running on 128 cores or more.
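Under the same runtime-proportional billing assumption, the per-job cost at 128 cores works out to roughly 2973.34 / 3600 × $19.20 ≈ $15.86 on Nickel versus 1434.00 / 3600 × $38.40 ≈ $15.30 on Iron, so at this scale Iron is both faster and slightly cheaper per job.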

Conclusion

From the results above, we can tell that for clusters of up to 64 cores, Nickel is faster for CONVERGE™ jobs, while for mid-sized clusters of 128 cores or more, Iron is the better choice. More importantly, running a job faster can also save you money on on-demand license costs.

This article was written by Irwen Song.


For a CIO, the ultimate goal is creating a lean and agile IT structure that meets both the current and unanticipated future needs of the various internal teams, without creating a cost structure that is unsustainable or disconnected from justifiable activities. Properly allocating resources within the company, without wasted expenditure or unmet requirements, is the true challenge. When it comes to purchasing and maintaining an on-site HPC cluster, this goal is increasingly difficult to achieve: typically, either demand goes unsatisfied or costs grow disproportionately in order to meet unexpected demand. A cloud solution can effectively address these seemingly opposing goals.

Reduce Capital Expenditures

As a CIO, your job is not only to meet the needs of the various internal teams who want increased capability, but also to adhere to corporate initiatives that require IT to control capital expenditures and conform to a budget. Your company wants more financial agility without making long-term investments in items that are either underutilized or obsolete before their end of life. An on-premise cluster makes this challenging. Predicting engineering needs over the life of a cluster is extremely difficult, and constraints on capital frequently mean that the engineering teams’ needs will not be met at some point. For larger companies with a wide variety of use cases, maintaining a cluster that can accommodate all needs is extremely challenging, especially over the life of the asset. On-premise systems are simply not elastic enough to handle engineering organizations’ constantly changing demands. A cloud solution can eliminate these issues while providing a pay-as-you-go model that consumes no capital. You will significantly reduce capital expenditure and procurement expenses by eliminating new HPC hardware purchases, allowing an improved return on assets. For companies that currently maintain a cluster, a hybrid HPC solution lets you leverage existing on-premise assets and burst to the cloud for overflow.

Shorten the Hardware Procurement Cycle

Even if you have the budget to invest in an on-site system, the procurement process takes a minimum of six months. Given how quickly hardware evolves, this makes your system dated by the time it is fully implemented. Furthermore, such a long procurement cycle cannot keep up with the dynamic timing of product development. If a six-month delay in hardware procurement results in a six-month delay in product launch, the result is a dramatic erosion of profits due to the delayed time to market for that product.

Beyond rapidly shortening hardware release cycles, there is also a scarcity of knowledgeable procurement resources. With a cloud platform, procurement is instant, and with that you gain improved agility and reduced risk for IT planning. In addition to lower procurement costs, you also gain the ability to scale costs down easily across business cycles.

Smarter Allocation of Hardware and Software Resources

Typically, HPC costs are charged to overhead IT and distributed across the organization in a uniform manner. Usage is generally concentrated among specific groups, yet charging specific engineers or projects is often practically impossible. During a product’s development cycle, the scope of work outlined for high-fidelity analysis can be dynamic, and relying on an organization’s fixed computing resources often leads to a reduction in that scope, which can leave your organization at a disadvantage against competitors. Addressing varying demand with a fixed capacity will always result in periods of underutilized resources or, alternatively, schedule delays due to excess demand.

Through a cloud HPC environment, you gain better control over the allocation of your resources and the ability to direct funds toward priority needs for both hardware and software. This increased control allows a CIO not only to reduce expenses but also to better determine who needs, and can afford, which resources. You can eliminate thankless IT traffic-cop roles and allow individuals and teams instant access to a variety of the newest hardware options, provided they are willing to allocate the funds. You also shorten product development cycles by giving your engineers access to newer, higher-performance hardware. By enhancing the responsiveness of your IT department to the changing needs of your engineering teams, and by better tailoring resources to specific groups, the organization as a whole becomes more efficient.

Improved Security and Control over Data, Hardware and Software

The most common reason given for not adopting a cloud HPC environment is that a company is not comfortable using outside IT resources due to the sensitivity of proprietary data. Yet security vulnerability is often higher in a closed system relying on a single layer of firewall protection than in the multi-layered security protections deployed in leading cloud HPC solutions, with isolated networks, end-to-end data encryption, and data segmentation. Cloud HPC environments now have clear standards and policies, such as SOC 2 compliance, that allow enterprises to easily identify compliant providers.

Maintaining everything on internal systems is becoming increasingly difficult and expensive, and staying abreast of evolving security vulnerabilities only becomes more challenging. A cloud solution allows for a software-defined method of control and regulation by IT, rather than policies that you expect and hope employees will follow. By putting these programmatic controls in place, you reduce risk by narrowing the scope of access of any single individual while removing unnecessary barriers to collaboration between authorized users. For extended enterprises, these software-defined controls and policies allow for better supplier management and tightly scoped collaboration with third parties. Leading cloud HPC platforms can enable seamless data sharing with suppliers without granting them access to internal networks.

Eliminate the Need to Recruit IT Resources

It is becoming increasingly difficult to recruit the most talented individuals, and their cost is rising faster than that of the general labor pool. In addition, many companies find it challenging to provide a defined career path for these individuals when the IT organization is not considered a core competence of the overall enterprise. Often this results in sub-par staffing and costly outsourcing of non-critical roles. A cloud system that does not require on-site maintenance and monitoring will greatly reduce personnel costs, freeing funds to hire the best people for the limited number of positions actually needed. Through this narrowed focus, you get a clearer career path for these employees and improved morale overall.

The Cloud is Rolling In

Despite the hesitation of some to move toward the cloud, the fact remains that organizations, both big and small, are moving toward cloud-integrated HPC systems. Leading Fortune 500 CIOs are paving the way by demonstrating clear cost benefits and increased agility for their organizations. For some, this is purely to handle peak workloads and overflow; others are realizing the economic, efficiency, and productivity gains that come with downsizing or altogether eliminating on-site clusters. As a CIO working to meet demands from all levels of your organization, adopting a cloud HPC solution will allow you to begin to tackle these hurdles.

Interested in exploring what cloud HPC may be able to accomplish for your organization?

Please don’t hesitate to schedule a no-cost consultation with Rescale’s industry experts. You can contact me directly at sarah@rescale.com or +1.855.RESCALE (+1.855.737.2253) to schedule your consultation.

This article was written by Sarah Dietz.

A quick Google search for “REST API versioning” turns up lots of discussion around how a version should be specified in the request. There is no shortage of passionate debate around RESTful principles and the pros and cons of embedding version numbers in URLs, using custom request headers, or leveraging the existing Accept header. Unfortunately, there isn’t any one correct answer that will satisfy everyone; regardless of which approach you end up using, some camp will proclaim that you are doing it wrong. Irrespective of the approach used, once the version is parsed out of the request there is another, larger question: how should you manage your API code to support the different schemas used across different versions? Surprisingly, there is little discussion out there around how to best accomplish this.

At Rescale, we have built our API with the Django Rest Framework. The upcoming 3.1 release will offer some API versioning support. The project lead has wisely decided to sidestep the versioning debate by providing a framework that allows API builders to select between different strategies for specifying the version in the request. However, there isn’t much official guidance around what the best practices are for dealing with this version in your code. Indeed, the documentation mostly just punts on the issue by stating “how you vary your behavior is up to you”.

One of Stripe’s engineers posted a nice high-level summary of how Stripe deals with backwards compatibility with their API. The author describes a transformation pipeline that requests and responses are passed through. One of the main appeals of this approach is that the core API logic is always dealing with the current version of the API and there are separate “compatibility” layers to deal with the different versions. Requests from the client pass through a “request compatibility” layer to transform them into the current schema before being handed off to the core API logic. Responses from the core API logic are passed through a “response compatibility” layer to downgrade the response into the schema format of the requested version.

In this post, I want to explore a potential approach for supporting this type of transformation pipeline with DRF. The natural place to inject the transform logic in DRF is within the Serializer, as it is involved both in validating request data (Serializer.to_internal_value) and in preparing response data to return from an APIView method (Serializer.to_representation).

The general idea is to create an ordered series of transformations that will be applied to request data to convert it into the current schema. Similarly, response data will be passed through the transformations in the reverse order to convert data in the current schema to the version requested by the client. This ends up looking very similar to the forwards and backwards methods on database migrations.

As a basic example of how a response transform would be used, the following is a simple serializer that returns a mailing list name and a list of subscriber emails:
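A minimal version of such a serializer might look like this (the class and field names here are illustrative rather than the original code):

    # serializers.py -- a minimal sketch; the class and field names are
    # illustrative, not the original code.
    from rest_framework import serializers


    class MailingListSerializer(serializers.Serializer):
        name = serializers.CharField()
        # Each subscriber is represented simply as an email address string
        subscribers = serializers.ListField(child=serializers.EmailField())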

An endpoint that uses this serializer might return JSON formatted as:
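For example, with made-up values:

    {
        "name": "weekly-newsletter",
        "subscribers": [
            "alice@example.com",
            "bob@example.com"
        ]
    }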

Some time later, we decide that this endpoint also needs to return the date that each subscriber signed up for the mailing list. This is going to be a breaking change for any client using the original version of the API, as each element in the subscribers array is now going to be an object instead of a simple string:
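The updated payload might instead look like this (again, the values and the signup_date field name are illustrative):

    {
        "name": "weekly-newsletter",
        "subscribers": [
            {"email": "alice@example.com", "signup_date": "2015-02-10"},
            {"email": "bob@example.com", "signup_date": "2015-02-11"}
        ]
    }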

The serializer needs to be updated to return data in this new format. To support backwards compatibility with the original version of the API, it will also need to be modified to derive from a VersioningMixin class and specify the location of the Transform classes (more on this in a bit):
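A sketch of the updated serializer, assuming the VersioningMixin described below lives in a hypothetical api.versioning module:

    # serializers.py -- updated to the new (current) schema. The api.versioning
    # module path and the signup_date field name are assumptions; the
    # api.transforms.mailinglist path follows the convention described below.
    from rest_framework import serializers

    from api.versioning import VersioningMixin  # hypothetical location


    class SubscriberSerializer(serializers.Serializer):
        email = serializers.EmailField()
        signup_date = serializers.DateField()  # illustrative field name


    class MailingListSerializer(VersioningMixin, serializers.Serializer):
        # Dotted path to the module that holds the numbered Transform classes
        transform_base = 'api.transforms.mailinglist'

        name = serializers.CharField()
        subscribers = SubscriberSerializer(many=True)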

Whenever a new API version is introduced for this serializer, a new numbered Transform class needs to be added to the api.transforms.mailinglist module. Each Transform handles the downgrade from version N to version N-1 by munging the response data dict:
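A sketch of such a Transform might look like the following; the forwards/backwards method names mirror the database migration analogy mentioned earlier, but the exact signature here is an assumption:

    # api/transforms/mailinglist.py -- a sketch; the method names follow the
    # forwards/backwards analogy with database migrations, and the signature
    # (data dict plus request) is an assumption.


    class MailingListTransform0002(object):
        """Transforms between v1 and v2 of the mailing list schema."""

        def forwards(self, data, request):
            # Upgrade a v1 request body to v2: wrap each bare email string in
            # an object. Old clients cannot supply signup_date, so it is left
            # out and must be defaulted elsewhere.
            data['subscribers'] = [
                {'email': email} for email in data.get('subscribers', [])
            ]
            return data

        def backwards(self, data, request):
            # Downgrade a v2 response to v1: collapse each subscriber object
            # back down to its email address.
            data['subscribers'] = [
                subscriber['email'] for subscriber in data.get('subscribers', [])
            ]
            return data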

The Transform class is the analogue of a schema migration and contains methods to transform request and response data. Each Transform class name needs to have a numerical value as a suffix. The VersioningMixin class uses this suffix to determine the order in which Transforms should be applied to the request or response data.

The VersioningMixin class provides the Serializer.to_internal_value and Serializer.to_representation overrides that look up the Transforms pointed to by the transform_base property on the serializer and apply them in order, either to convert requests into the current API version or to downgrade responses from the current API version to the requested version. In the following code snippet, settings.API_VERSION refers to the latest, current API version number and the request.version field is set to the API version requested by the client:
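A rough sketch of what the mixin could look like under these assumptions (Transforms discovered by their numeric suffix, forwards/backwards methods as above, integer version numbers):

    # api/versioning.py -- a rough sketch, not the actual implementation.
    # Assumes integer version numbers and Transform classes named with a
    # numeric suffix (e.g. MailingListTransform0002).
    import importlib
    import inspect
    import re

    from django.conf import settings


    def get_transforms(transform_base):
        """Return (number, Transform instance) pairs from the given module,
        ordered by the numeric suffix on each class name."""
        module = importlib.import_module(transform_base)
        found = []
        for name, cls in inspect.getmembers(module, inspect.isclass):
            match = re.search(r'(\d+)$', name)
            if match:
                found.append((int(match.group(1)), cls()))
        return sorted(found, key=lambda pair: pair[0])


    class VersioningMixin(object):
        def to_internal_value(self, data):
            # Upgrade request data from the requested version up to the
            # current version before normal validation runs.
            request = self.context['request']
            requested = int(request.version)
            for number, transform in get_transforms(self.transform_base):
                if requested < number <= settings.API_VERSION:
                    data = transform.forwards(data, request)
            return super(VersioningMixin, self).to_internal_value(data)

        def to_representation(self, instance):
            # Serialize with the current schema, then downgrade the result to
            # the version the client asked for.
            data = super(VersioningMixin, self).to_representation(instance)
            request = self.context['request']
            requested = int(request.version)
            for number, transform in reversed(get_transforms(self.transform_base)):
                if requested < number <= settings.API_VERSION:
                    data = transform.backwards(data, request)
            return data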

The main benefit of this approach is that the APIViews (the classes that generally implement the core API logic and use Serializers for request/response processing) only need to worry about the latest schema version. In addition, writing a Transform requires knowledge of only the current and previous versions of the API. When creating version 10 of a particular response, there is just a single Transform between v10 and v9 that needs to be created. A request asking for v7 will first be transformed from v10 to v9 by the new Transform; the existing v9-to-v8 and v8-to-v7 Transforms will handle the rest.

We certainly do not believe that this is a panacea for all of the backwards compatibility issues that will crop up. There are performance implications to consider in constantly running requests and responses through a series of potentially expensive transformations. Further, in the same way that it is sometimes impossible to create backwards migrations for database schema changes, there are more complex breaking API changes that are not easily resolvable by this approach. However, for basic API changes, this seems like a nice way to isolate concerns and avoid embedding conditional goop inside the core API logic for versioning purposes.

This article was written by Ryan Kaneshiro.