The Real Cost of High Performance Computing


Buying the equipment accounts for only about 1/3 of the true cost of owning an HPC system. Everyone needs to understand that before making a cost comparison to Rescale pricing.

When I speak to customers, the conversation inevitably turns to how the cost of an internal HPC compares to Rescale. Companies commonly misunderstand their internal costs, so the comparison they make to cloud pricing is flawed. Several cost factors are ignored, not understood, or assumed to be fixed (i.e., they would be incurred regardless). Even when the costs are correctly tabulated, an "apples to oranges" comparison typically results.

So what are the estimated costs for an HPC in the 200-1000 core range? Below 200 cores, shortcuts can be made in terms of support labor and facilities, so the cost calculus is a bit different. The estimate below is based on conversations with clients. The numbers may be off by +/-10%, but they illustrate the general cost profile.

On-premise Cost ($/core-hour, HPC+ like system)

Costs included                              $/core-hour
Equipment only                              $0.04
Equipment+Electricity                       $0.06
Equipment+Electricity+Labor                 $0.09
Equipment+Electricity+Labor+Facilities      $0.12

The above costs assume roughly 100% utilization, approximately 10Kwh, and a 40% discount on hardware. Those assumptions are generous, so the true cost is probably higher. If your utilization is actually 80% of capacity (probably more realistic), then your actual cost is $0.12 / 0.80 = $0.15/core-hour rather than $0.12/core-hour, a big difference. It quickly becomes apparent that sizing an HPC for your peak usage is cost-prohibitive, because low utilization rapidly increases your cost per core-hour. Electricity also plays a non-trivial role, since it makes up roughly 15% of total costs.
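The utilization adjustment is a single division, and a short sketch makes it easy to try other values. The function name and the range check are illustrative choices, not part of any Rescale tool. The $0.12 figure is the fully burdened cost from the table above.

```python
def effective_cost(cost_at_full_utilization: float, utilization: float) -> float:
    """Cost per core-hour actually used, given the fraction of capacity in use."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return cost_at_full_utilization / utilization


# Fully burdened $/core-hour at 100% utilization, from the on-premise table.
FULLY_BURDENED = 0.12

for u in (1.0, 0.8):
    print(f"{u:.0%} utilization: ${effective_cost(FULLY_BURDENED, u):.2f}/core-hour")
```

At 80% utilization this reproduces the $0.15/core-hour figure quoted above.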

Labor is typically underestimated. Someone needs to support the HPC resources. Since most HPCs are Linux-based and IT departments tend to be more Windows-based, supporting an HPC can be challenging. In my experience, labor is usually one of the primary expenses of procuring and supporting these systems. I was previously at a major consulting company that had only one person who could support Linux: the Director of IT. Their support costs were very high because of the expensive labor used to support the system. Support covers handling failed resources, helping employees, applying software and system updates, and paying for specialty labor, and it can quickly become a significant expense.

It is common for companies to quote $0.04/core-hour as their internal cost. In reality, that figure covers only the cost of the equipment, which is about 1/3 of the true cost. At a minimum, the cost of labor and electricity needs to be added. What to add for facilities is more debatable. One question is whether you are building new facilities as part of an expansion. In many cases the answer is no, but there is also an opportunity cost: if you did not have an HPC, how would you repurpose the floor space? In general, you should add a facilities cost. Whether it is $0.03/core-hour, higher, or lower depends on your particular situation.
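To see what each component adds, the layered totals in the on-premise cost table can be turned into per-component increments. This is a minimal sketch using only the table's figures. The function and variable names are made up for illustration.

```python
# Cumulative $/core-hour totals from the on-premise cost table,
# listed in the order each component is added.
cumulative = [
    ("Equipment", 0.04),
    ("Electricity", 0.06),
    ("Labor", 0.09),
    ("Facilities", 0.12),
]


def incremental_costs(layers):
    """Turn cumulative totals into the cost added by each component."""
    result = {}
    previous = 0.0
    for name, total in layers:
        # Round to avoid floating-point noise such as 0.019999999999999997.
        result[name] = round(total - previous, 4)
        previous = total
    return result


components = incremental_costs(cumulative)
total = cumulative[-1][1]
for name, cost in components.items():
    print(f"{name:<12} ${cost:.2f}/core-hour  {cost / total:.0%} of total")
```

Run against the table, equipment comes out at about one third of the fully burdened figure, and facilities at the $0.03/core-hour mentioned above.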

The next error is the direct comparison to Rescale pricing. First, if we assume your cost is $0.15/core-hour (fully burdened cost at 80% utilization), we need to establish which Rescale price to compare it to. Many people compare internal cost to on-demand cost, which is spot pricing (pricing for hardware based on a specific time and availability) for high-priority runs. The on-demand cost is $0.25/core-hour for HPC+. This is a pure apples-to-oranges comparison. It is like comparing the daily rate for a rental car to the daily cost of a car you purchased. It is simply an inaccurate comparison.

The correct comparison is between internal costs and the three-year pre-paid cloud cost. A pre-paid plan buys a certain level of capacity (24/7) of a given hardware type for a set period. Since most purchased HPCs are used for at least three years, it is best to compare purchased costs to a three-year pre-paid cloud plan. For HPC+, that is $0.05/core-hour, a fantastic savings compared to internal costs.

The other valid comparison is to on-demand pricing. In that case, you adjust your internal cost by the utilization you would see if you sized the system to cover peak demand. Let's say that in order to support peak demand, your utilization rate would fall to 40% (not a far-fetched number). Dividing the $0.12/core-hour fully burdened cost by 0.40 gives an internal cost of $0.30/core-hour to support peak usage. On-demand pricing for Rescale's HPC+ core type is $0.25/core-hour, again, a bargain.

OK, to summarize, here is a basic guide:

  1.     Determine your true internal cost. You must include equipment, labor, and electricity. You should also include facilities. Using $0.12/core-hour for an HPC+ like system is not a bad estimate for fully burdened costs at 100% utilization.
  2.     Estimate your utilization. I would say 80% is probably not a bad estimate. Adjust your internal cost by your estimated utilization (simple division).
  3.     Compare your internal costs to Rescale pre-paid plans. That is a direct apples-to-apples comparison.
  4.     If you want to estimate what it would cost you to support your peaks, and compare that to Rescale's on-demand pricing, you need to factor down your utilization rate. Something in the 40-50% range is probably more realistic.
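The guide above can be sketched in a few lines. The prices are the ones quoted in this article, and the internal figure is the fully burdened $0.12/core-hour from the on-premise table. The `break_even_utilization` helper is an illustrative addition rather than a Rescale calculation: it gives the utilization at which owning and renting cost the same.

```python
# Figures quoted in this article, all in $/core-hour for an HPC+ like system.
INTERNAL_AT_FULL_UTILIZATION = 0.12  # fully burdened, from the on-premise table
PREPAID_3YR = 0.05                   # three-year pre-paid price
ON_DEMAND = 0.25                     # on-demand price


def effective_cost(cost_at_full_utilization, utilization):
    """Internal cost per core-hour actually used (step 2: simple division)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return cost_at_full_utilization / utilization


def break_even_utilization(cost_at_full_utilization, cloud_price):
    """Utilization at which the internal cost equals the cloud price.

    A result above 1.0 means the internal system cannot match the cloud
    price even when it is fully utilized.
    """
    return cost_at_full_utilization / cloud_price


# Step 3: steady-state cost at an estimated 80% utilization vs. pre-paid.
steady = effective_cost(INTERNAL_AT_FULL_UTILIZATION, 0.80)
print(f"internal at 80%: ${steady:.2f}  pre-paid: ${PREPAID_3YR:.2f}")

# Step 4: utilization below which on-demand beats owning peak capacity.
threshold = break_even_utilization(INTERNAL_AT_FULL_UTILIZATION, ON_DEMAND)
print(f"on-demand is cheaper below {threshold:.0%} utilization")
```

With these inputs the break-even against on-demand pricing is 48% utilization, which falls inside the 40-50% range suggested in step 4. Substitute your own internal cost and utilization estimates to rerun the comparison.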

Everyone's situation is a bit different, and the above numbers were generated from past experience. That said, we at Rescale would be happy to work with your organization on a true total-cost-of-ownership (TCO) study of your current HPC and perform an "apples-to-apples" comparison to the relevant Rescale pricing plan.

This article was written by Tony Spagnuolo.