Owning vs. Renting

The cost of buying HPC equipment is only around one-third of the true cost of owning an HPC. Everyone needs to understand that before making a cost comparison to Rescale pricing.

When I speak to customers, there is inevitably a comparison of the cost of an internal HPC to Rescale. Companies commonly misunderstand their internal costs, so the comparison to cloud pricing is flawed. Several cost factors are ignored, not understood, or assumed to be fixed (i.e., costs they would incur regardless). And even when the costs are correctly tabulated, the result is typically an "apples to oranges" comparison.

So what are the estimated costs for an HPC in the 200-1,000 core range? Below 200 cores, shortcuts can be made in support labor and facilities, so the cost calculus is a bit different. Below is an estimate based on conversations with clients. The numbers may be off by +/-10%, but they illustrate the general cost profile.

On-premise cost ($/core-hour, for an HPC+-like system):

    Equipment only                                   $0.04
    Equipment + electricity                          $0.06
    Equipment + electricity + labor                  $0.09
    Equipment + electricity + labor + facilities     $0.12

The above costs assume roughly 100% utilization, approximately 10 kWh of electricity, and a 40% discount on hardware, which is probably a very liberal estimate of true costs. If your utilization is actually 80% of capacity (probably more realistic), then your actual cost is $0.15/core-hour versus $0.12/core-hour, a big difference. It quickly becomes apparent that sizing an HPC for your peak usage is cost prohibitive, because low utilization rapidly increases your per core-hour cost. Electricity also plays a non-trivial role, making up roughly 15% of total costs.
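The utilization adjustment is simple division: the fully burdened rate divided by the fraction of capacity actually used. A minimal sketch (the function name and figures are illustrative, taken from the table above):

```python
def cost_per_core_hour(fully_burdened_rate, utilization):
    """Effective $/core-hour when the cluster is not fully utilized."""
    return fully_burdened_rate / utilization

print(cost_per_core_hour(0.12, 1.0))  # 0.12 at 100% utilization
print(cost_per_core_hour(0.12, 0.8))  # 0.15 at a more realistic 80%
print(cost_per_core_hour(0.12, 0.4))  # 0.30 when sized for peak demand
```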

Labor is typically underestimated. Someone needs to support the HPC resources, and since most HPCs are Linux-based while IT departments tend to be Windows-based, supporting an HPC can be challenging. In my experience, labor is usually one of the primary expenses of procuring and supporting these HPC systems. I was previously at a major consulting company where only one person could support Linux: the Director of IT. As a result, their support costs were very high. Support includes handling failed resources, helping employees, applying software and system updates, and paying for specialty skills, and it can quickly become a significant expense.

It is actually common for companies to quote $0.04/core-hour as their internal cost. In reality, that only encompasses the cost of the equipment, which is only about one-third of the true cost. At a minimum, the cost of labor and electricity needs to be added. What to add for facilities is a bit more debatable. One question is whether you are building new facilities as part of an expansion. In many cases the answer is no; however, there is also opportunity cost: if you did not have an HPC, how would you repurpose the floor space? In general, you should add a facilities cost. Whether it is $0.03/core-hour, or an amount higher or lower, depends on your particular situation.

The next error is the direct comparison to Rescale pricing. First, if we assume your cost is $0.15/core-hour (fully burdened, at 80% utilization), we need to establish which Rescale price to compare it to. Many people compare internal cost to on-demand pricing, which is essentially spot pricing (pricing for hardware based on a specific time and availability) for high-priority runs. The on-demand cost is $0.25/core-hour for HPC+. This is a pure apples-to-oranges comparison; it is like comparing the daily rate for a rental car to the daily rate for a car you purchase.

The correct comparison is internal cost versus the three-year pre-paid cloud cost. A pre-paid plan buys a certain level of capacity (24/7) of a given hardware type for a set period. Since most purchased HPCs are used for at least three years, it's best to compare purchase costs to a three-year pre-paid cloud plan. For HPC+, that is $0.05/core-hour, a fantastic savings compared to internal costs.

The other way to make a comparison is against on-demand pricing. In that case, you should adjust your internal cost by a utilization factor that covers peak demand. Let's say that in order to support peak demand, your utilization rate falls to 40% (not a far-fetched number); your internal cost to support peak usage is then $0.12/0.40 = $0.30/core-hour. On-demand pricing for Rescale's HPC+ core type is $0.25/core-hour, again, a fantastic bargain.

OK, to summarize, here is a basic guide (a worked sketch follows the list):

1. Determine your true internal cost. You must include equipment, labor, and electricity, and you should also include facilities. A figure of $0.12/core-hour for an HPC+-like system is not a bad estimate of fully burdened costs at 100% utilization.
2. Estimate your utilization; 80% is probably not a bad estimate. Adjust your internal cost by your estimated utilization (simple division).
3. Compare your internal cost to Rescale pre-paid plans. That is a direct apples-to-apples comparison.
4. If you want to estimate what it would cost you to support your peaks and compare that to Rescale's on-demand pricing, factor down your utilization rate; something in the 40-50% range is probably more realistic.
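As a worked sketch of the four steps, using the illustrative rates from this post (your own component costs will differ):

```python
# Step 1: fully burdened internal cost, using this post's example components
# (the increments from the on-premise cost table above).
equipment, electricity, labor, facilities = 0.04, 0.02, 0.03, 0.03
fully_burdened = equipment + electricity + labor + facilities  # $0.12/core-hour

# Step 2: adjust for utilization (simple division).
internal = fully_burdened / 0.80                               # $0.15/core-hour

# Step 3: apples-to-apples against a three-year pre-paid plan.
prepaid_3yr = 0.05  # $/core-hour for HPC+, per this post
print(f"internal ${internal:.2f} vs. pre-paid ${prepaid_3yr:.2f}")

# Step 4: cost of supporting peaks vs. on-demand pricing.
peak_internal = fully_burdened / 0.40                          # $0.30/core-hour
on_demand = 0.25    # $/core-hour for HPC+ on demand
print(f"peak internal ${peak_internal:.2f} vs. on-demand ${on_demand:.2f}")
```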

Everyone’s situation is a bit different, and the above numbers were generated from past experience. That said, we at Rescale would be happy to work with your organization on a true total-cost-of-ownership (TCO) study of your current HPC and perform an “apples-to-apples” comparison against the relevant Rescale pricing plan.

This article was written by Tony Spagnuolo.

Modeling the Spread of Ebola
Lately, news from West Africa is almost always about Ebola. This is not a new problem, but the current historic Ebola epidemic continues to have a significant impact on the region. In the videos below you will see what looks like a tree with sparkling lights; in actuality, it is an ab initio model of the disease spreading through a hypothetical network. This is not a blog about how Ebola will take over the world, but it will hopefully give you some insight into why it will not.

Model

The limited locality of the outbreak and the small number of reported cases mean any model of Ebola relies on estimates that likely do not reflect reality. With limited information available, we take a direct approach and simulate person-to-person and person-to-public interactions. The model consists of five stages.

The stages of Ebola are: not exposed, infected, contagious, under care, and deceased or recovered. Everyone except patient(s) zero starts out not exposed to the virus, and the original patient(s) progress through the stages until they become contagious. We simulate a case where five randomly selected people are initially infected. The incubation period is normally distributed with a mean of 19 days, followed by a contagious period with a mean of 3 days and, finally, a period of medical care with a mean of 10 days; the latter two periods are also normally distributed about their respective means.

There are two infection paths. The first is through relationships with the people we know; the second is through the public space. A person can become infected through either of these paths, but only by interacting with a person who is contagious. A person who is contagious eventually becomes too sick and goes into medical quarantine, at which point the patient is isolated from society. We also assume that the number of medical staff who become infected is insignificant.

Beyond the simplifications described, there are no control measures against the spread of Ebola except the quarantine imposed on the sick. The parameters used are, at best, estimates. We set the probability of infection from interacting with a contagious person at 50%, the probability of infection from an interaction in public at 1%, and the probability of death at 60%. A minimal sketch of these mechanics follows.
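The post's actual implementation isn't published, so the sketch below is a scaled-down Python approximation of the mechanics just described. The graph generator, population size, stage-length spread, and the scaling of the public-infection probability are all assumptions chosen so the script runs quickly; the stage names, means, and probabilities come from the description above.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

# Hypothetical contact network; the post doesn't say how its network was
# built, so a Barabasi-Albert graph stands in for the clustered structure.
N = 5_000  # scaled down from the post's 100,000 so the script runs quickly
G = nx.barabasi_albert_graph(N, 3, seed=0)

NOT_EXPOSED, INFECTED, CONTAGIOUS, UNDER_CARE, DECEASED, RECOVERED = range(6)
state = np.full(N, NOT_EXPOSED)
timer = np.zeros(N, dtype=int)  # days remaining in the current stage

P_CONTACT, P_PUBLIC, P_DEATH = 0.50, 0.01, 0.60  # probabilities from the post

def stage_length(mean_days):
    # Stage durations are normally distributed about their means; the
    # standard deviation (mean/4) is an assumption the post doesn't specify.
    return max(1, round(rng.normal(mean_days, mean_days / 4)))

def infect(person):
    state[person], timer[person] = INFECTED, stage_length(19)  # 19-day incubation

for p in rng.choice(N, 5, replace=False):  # five random patients zero
    infect(p)

for day in range(300):
    contagious = np.flatnonzero(state == CONTAGIOUS)
    # Path 1: person-to-person spread along network edges.
    for c in contagious:
        for nbr in G.neighbors(c):
            if state[nbr] == NOT_EXPOSED and rng.random() < P_CONTACT:
                infect(nbr)
    # Path 2: public space. Scaling the 1% by the contagious fraction is an
    # assumption; the post only gives the raw probability.
    p_today = P_PUBLIC * len(contagious) / N
    for p in np.flatnonzero(state == NOT_EXPOSED):
        if rng.random() < p_today:
            infect(p)
    # Advance stages in reverse order so nobody moves twice in one day.
    timer[(state == INFECTED) | (state == CONTAGIOUS) | (state == UNDER_CARE)] -= 1
    done = timer <= 0
    for p in np.flatnonzero(done & (state == UNDER_CARE)):  # care ends
        state[p] = DECEASED if rng.random() < P_DEATH else RECOVERED
    for p in np.flatnonzero(done & (state == CONTAGIOUS)):  # quarantined
        state[p], timer[p] = UNDER_CARE, stage_length(10)   # 10-day care
    for p in np.flatnonzero(done & (state == INFECTED)):    # becomes contagious
        state[p], timer[p] = CONTAGIOUS, stage_length(3)    # 3-day contagion
    if day % 30 == 0:
        print(day, np.bincount(state, minlength=6))
```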

Below is a video of this simulation with 100,000 people in a hypothetical network over 300 days. The stages are colored: white represents a healthy person, yellow infected, red contagious, orange quarantined, black deceased, and blue recovered.

It is apparent from the video that waves of propagation originate from the major cluster in the network. We can see them manifested in the figure below; without the video, the oscillations in the figure can be difficult to understand. This is a simple model, but it should make clear that quarantine is a crucial and effective measure against the spread of the virus. Graph 1 below shows the number of simulated cases that became infected.


Graph 1: Simulated number of Ebola cases that became infected.

Influenza vs. Ebola

Arguably, what should concern most of us is not Ebola but the seasonal flu. Several million people become severely ill each year, and a quarter to a half million of them die from the influenza virus annually. Unlike Ebola, which has a high fatality rate, the seasonal flu is significantly less fatal but more infectious.

We can simulate the spread of the flu using a simple SEIR (susceptible-exposed-infectious-recovered) model, often described by a system of equations. We could solve this nonlinear system of ordinary differential equations (ODEs) and obtain a solution for the entire population at once (a minimal sketch of that route follows Figure 1). However, we would like to see the actual propagation of the wave of infection throughout the population, so we take the same modeling approach as for Ebola. Figure 1 below shows the network representing 5,000 people and their relationships.


Figure 1: Network representing 5,000 people and their connected relationships.
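As an aside, the ODE route mentioned above is easy to sketch with an off-the-shelf integrator. The SEIR equations below are the standard textbook form; the rates (beta, sigma, gamma) are placeholders, not values fitted to influenza:

```python
import numpy as np
from scipy.integrate import odeint

def seir(y, t, beta, sigma, gamma, n):
    # dS/dt = -beta*S*I/n           (susceptibles become exposed)
    # dE/dt = beta*S*I/n - sigma*E  (exposed become infectious)
    # dI/dt = sigma*E - gamma*I     (infectious recover)
    # dR/dt = gamma*I
    s, e, i, r = y
    return [-beta * s * i / n,
            beta * s * i / n - sigma * e,
            sigma * e - gamma * i,
            gamma * i]

n = 5_000                      # population size, matching Figure 1
y0 = [n - 1, 1, 0, 0]          # one exposed person to start
t = np.linspace(0, 120, 481)   # days
# Placeholder rates: beta=0.6/day, 5-day incubation, 7-day infectious period.
solution = odeint(seir, y0, t, args=(0.6, 1 / 5, 1 / 7, n))
print(solution[-1].round())    # final S, E, I, R counts
```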

If some people have immunity, what is the impact on the population? We simulate exactly the same case, except that 10% of the population has immunity. Figure 2 shows the same network, with cyan-colored nodes representing those with immunity.


Figure 2: Network representing 5,000 people and their relationships. Cyan colored dots represent people with immunity to Influenza.
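In a network simulation like the Ebola sketch earlier, seeding this immunity is a one-line initialization change (reusing that sketch's hypothetical `state`, `rng`, `N`, and stage constants):

```python
# Mark a random 10% of the population as immune before the outbreak begins;
# starting them in a terminal stage means they can never be infected.
immune = rng.choice(N, int(0.10 * N), replace=False)
state[immune] = RECOVERED
```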

Graph 2 below compares the two simulations: the solid lines are the case counts for the first simulation, and the dotted lines are the case counts for the second. The rate of infection is mitigated in the case where some people were initially immune. The financial cost to the population can be inferred from the area under the curve (a rough sketch of that calculation follows Graph 2). It goes without saying that this inference also applies to Ebola, and it corresponds to the severe impact the disease can have on developing economies.


Graph 2: A comparison of the two Influenza models. The dotted lines represent the model that simulated people with immunity to Influenza.
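The simulated curves themselves aren't reproduced here as data, but the area-under-the-curve comparison is straightforward; the Gaussian-shaped case counts below are stand-ins, not simulation output:

```python
import numpy as np

days = np.arange(300)
# Placeholder daily infected counts for the two runs (illustrative shapes only).
cases_no_immunity = 1200 * np.exp(-((days - 80) / 30) ** 2)
cases_10pct_immune = 700 * np.exp(-((days - 100) / 40) ** 2)

# Total person-days of illness, proportional to the area under each curve.
burden = np.trapz(cases_no_immunity, days)
burden_immune = np.trapz(cases_10pct_immune, days)
print(f"relative reduction in burden: {1 - burden_immune / burden:.0%}")
```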

After this analytical look at both Ebola and Influenza, it is hopefully a little clearer where you should spend your worrying energy.

This article was written by Hiraku Nakamura.