Why high utilization doesn’t work for TSA and why it doesn’t work for HPC


Executive Summary:

  • In big compute (HPC), the purchase of large capital assets can create an organizational misalignment of incentives that places the needs of the end user last
  • Achieving high utilization rates of on-premise computing is a Pyrrhic victory; it creates winners and losers and puts a governor on the pace of innovation
  • Information technology leaders with high utilization rates of on-premise compute should establish a cloud bypass for work to encourage a culture of agility, innovation, and “outside-the-stacks” thinking
  • When calculating total cost of ownership (TCO) of on-premise computing, user experience, workflow cycle times, responsiveness to new requirements, and other factors must be considered

Airport Travelers and HPC Users Have the Same Complaints
While I was standing in the airport security line at LAX recently, the travelers behind me began engaging in a familiar sport: wondering whether there were better alternatives to the US airport security screening process. As some lines proved faster than others, the complaints ranged from line choice to the efficacy of the entire system. Having recently returned from several meetings with prospective users of cloud computing, I found the complaints familiar: wait times, capacity limitations, and perceived unfairness in the system.

High utilization rates of on-premise computing assets are often cited in a cost-based defense of maintaining a pure on-premise strategy for big compute (HPC) workloads. The argument goes like this: the higher the utilization rate of an on-premise system, the more costly it is to lift and shift those workloads to the cloud. This argument is frequently the result of a total cost of ownership (TCO) study that compares an incomplete set of variables.

Such a TCO comparison is woefully incomplete, but missing pieces aside, even more apparent is the key assumption underlying the cloud cost estimate: 100% utilization. The use of that assumption is understandable. Capital investments require financial justification and, depending on their scale, often a detailed NPV analysis. Unfortunately, it is difficult to compare a fixed, capitalized expenditure to a variable, operational expenditure in these analyses. Forecasting opex requires detailed logging of compute usage and the assumption that past behavior predicts future requirements. For simplicity, it is easier to assume 100% utilization of cloud computing and move on. However, the organizational implications of 100% utilization of cloud computing and 100% utilization of on-premise assets are very different. Running a constrained on-premise compute asset at 100% utilization implies queue times, constant reevaluation of internal resource priorities, and slow reaction to new requirements. Consuming 100% of a small slice of the immense cloud has none of these disadvantages.
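
To make the distortion concrete, here is a minimal sketch of the two ways of pricing the comparison. The prices, core counts, and utilization figures are purely hypothetical, invented for illustration and not taken from any real study: pricing the cloud as if it ran 24/7 inflates the apparent cloud bill, while low real utilization quietly raises the effective cost of every core-hour actually used on the owned asset.

```python
# Illustrative only: hypothetical prices and utilization figures,
# not taken from any specific TCO study.

CLUSTER_CAPEX = 1_200_000          # on-premise cluster purchase price ($)
CLUSTER_LIFETIME_YEARS = 4         # depreciation horizon
CLUSTER_CORES = 2_000
CLOUD_PRICE_PER_CORE_HOUR = 0.05   # blended on-demand rate ($)

HOURS_PER_YEAR = 24 * 365

def on_prem_cost_per_core_hour(utilization: float) -> float:
    """Effective cost of a *used* core-hour on the fixed asset.

    Because capex is fixed, the lower the real utilization,
    the more each useful core-hour actually costs.
    """
    annual_capex = CLUSTER_CAPEX / CLUSTER_LIFETIME_YEARS
    used_core_hours = CLUSTER_CORES * HOURS_PER_YEAR * utilization
    return annual_capex / used_core_hours

def cloud_cost(core_hours_needed: float) -> float:
    """Cloud charges only for the core-hours actually consumed."""
    return core_hours_needed * CLOUD_PRICE_PER_CORE_HOUR

# The common TCO shortcut: price the cloud as if it ran 24/7 at full scale.
naive_cloud = cloud_cost(CLUSTER_CORES * HOURS_PER_YEAR)

# A demand-based comparison: price the cloud at the hours actually used.
actual_utilization = 0.40
actual_cloud = cloud_cost(CLUSTER_CORES * HOURS_PER_YEAR * actual_utilization)

print(f"Cloud priced at assumed 100% utilization: ${naive_cloud:,.0f}/yr")
print(f"Cloud priced at {actual_utilization:.0%} actual demand: ${actual_cloud:,.0f}/yr")
print(f"On-prem cost per used core-hour at 40% utilization: "
      f"${on_prem_cost_per_core_hour(actual_utilization):.3f}")
```

With these illustrative numbers, the 24/7 assumption more than doubles the apparent cloud bill, and it says nothing about the power, cooling, administration, or queue-time costs the on-premise column usually omits.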

This brings us back to our TSA story.

A TSA Nightmare
Imagine that one day the TSA agents at a particular airport receive a peculiar directive: taxpayers are extremely sensitive to the purchase of capital assets, so it is now an agency priority to achieve 95% or greater capacity utilization of the newly installed scanners. What would be the consequences?

First, 95% utilization would require processing passengers through the line at all hours of the night, even though airplanes only leave and arrive between 6 AM and midnight. Second, 19 out of every 20 passengers arriving at the security line should expect a queue, regardless of when they arrive. Third, during peak travel periods, wait times would balloon. Fourth, in the long run, to hit the target, the TSA agents would be incentivized to shut down additional security lines and laterally transfer “excess” scanners to other airports. Somewhere in the aftermath is the passenger whose needs have been subordinated to the quest for high utilization rates. The psychology of the passenger changes, too: the passenger begins planning for long queue times, devoting otherwise productive time to gaming a system with limited predictability.
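
Basic queueing theory makes the wait-time point precise. The sketch below uses the textbook M/M/1 model, an assumption introduced here purely for illustration (neither a security checkpoint nor an HPC scheduler is truly M/M/1): as utilization approaches 100%, the average wait grows without bound.

```python
# Minimal M/M/1 queueing illustration (an assumption for this sketch;
# real security lines and HPC job schedulers are more complex).
# Mean time waiting in queue: Wq = rho / (mu - lambda), where rho = lambda / mu.

SERVICE_RATE = 10.0  # passengers (or jobs) processed per hour by one scanner

def mean_wait_hours(utilization: float, service_rate: float = SERVICE_RATE) -> float:
    """Average time spent waiting in queue for an M/M/1 system."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    arrival_rate = utilization * service_rate
    return utilization / (service_rate - arrival_rate)

for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"utilization {rho:.0%}: average wait {mean_wait_hours(rho) * 60:6.1f} minutes")
```

With a scanner that processes ten passengers an hour, pushing utilization from 80% to 99% takes the average wait from roughly 24 minutes to nearly 10 hours in this model; the same dynamic governs job queues on a saturated cluster.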

In the case of the purchase of a large, fixed-capacity compute system, the misalignment of incentives begins almost immediately after the asset is purchased. Finance wants to optimize the return on the asset, putting pressure on Information Technology leaders to use the smallest possible asset at the highest level of utilization for as long as possible. Meanwhile, hardware requirements continue to diverge and evolve outside the walls of the company, artificially constraining the company to decisions made years earlier, when business conditions were unlikely to resemble those of today. The very nature of a fixed asset creates winners and losers as workloads from some parts of the company are prioritized over others. Unlike airline travelers, however, engineers, researchers, and data scientists can be given options to bypass the system.

The cloud has inherent advantages relative to its on-premise counterpart. As a result, cloud big compute has earned its seat at the table in any organization that values agility, fast innovation cycles, and new approaches to problems. On-premise resources are inherently capacity-constrained and, over time, can place psychological governors on how employees think about finding solutions to problems. For example, an engineer may simply assume she has no other option and over-design a part rather than run a design study to understand sensitivity to key parameters. The cloud is not a panacea for every problem that needs big compute, but Information Technology leaders can do their part to encourage a culture of innovation simply by having a capable cloud strategy.

The cloud is more than TSA PreCheck; it is driving up on the tarmac and getting straight on the plane.

Learn more about the advantages of moving HPC to the cloud by downloading our free white paper: Motivations and IT Roadmap for Cloud HPC

This article was written by Matt McKee.

[Screenshot of the tweet referenced below]

The above tweet in my newsfeed caught my attention because it succinctly echoed thoughts several IT leaders have recently shared with me. Recent research, like this study from Accenture, reinforces Tim’s observation:

  • 95% of respondents have a five-year cloud strategy already in place
  • Four of five executives reported that less than half of their business functions currently run in the public cloud, but noted increasing intent to move more of their operations to the cloud in the coming years
  • 89% of respondents agree that implementing cloud strategies is a competitive advantage which allows their companies to leverage innovation through agility
  • While half of respondents cite security as their biggest concern with the public cloud model, more than 80% believe public cloud security is more robust and transparent than what they are able to provide in-house

Frustrating IT leaders’ “how” decisions is the velocity of the cloud market’s expansion. Low barriers to entry are flooding the market with SaaS, IaaS, and PaaS technologies, but the overwhelming number of options is as likely to lead to paralysis as to a decision. Selecting an enterprise high-performance computing partner with the right strategy (the “how”) is critical. If you want more thoughts on the “why,” a link at the end of the article shares a perspective from one of our partners.

Here are five characteristics IT leaders should look for when selecting a cloud solution for high-performance computing:

The service has an enterprise strategy. This is seemingly obvious, but it is a question worth asking when surveying the variety of available solutions. A service with an enterprise strategy supports a diverse customer environment (a variety of software vendors, software tools, workflows, and/or hardware types) and replaces the burden of IT administration with IT management control. In our experience, enterprises commonly need:

1) Scalability
2) Flexibility
3) Compatibility with hybrid environments
4) Support for a diversity of workflows
5) Management tools
6) Integration strategies

Scalability, flexibility, and hybrid environments are addressed below. The other three elements (workflows, management tools, and integration) are important, but they are largely a feature-and-function discussion that we’ll take on in another article.

The service is scalable for the enterprise. Product design, data research, and R&D cycles (and thus their corresponding infrastructure needs) are anything but smooth or predictable. Therefore, an enterprise solution must be able to scale rapidly from sixteen cores to thousands to meet the needs of the enterprise. The engineering team that must undertake a massive redesign in the eleventh hour should not be capped by limited capacity and queues. Delivering on this is prohibitively expensive without a service that offers wide-ranging access to multiple public clouds with on-demand pricing options.

The solution supports hybrid environments. As the Accenture study showed, many organizations are taking a “cloud first” approach. However, the diversity of workloads in the enterprise environment, varying sizes of data sets, and legacy systems and operations will likely keep on-premise infrastructure around for the near term. CIOs may be “getting out of the data center business,” but as companies move along the cloud spectrum, some will still need a mixed model. Thus, an enterprise solution allows organizations to manage both cloud and on-premise resources from a single platform. Not only does this assist the IT team with management, monitoring, and control, it also simplifies the experience for the end user.
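
As a minimal sketch of what managing both from a single platform can mean in practice, the routing policy below fills on-premise capacity first and bursts overflow work to the cloud rather than queueing it. The capacity figure, job names, and the Job structure are hypothetical, invented for illustration; they do not describe any particular product.

```python
# Hypothetical sketch of a hybrid routing policy: keep the fixed on-premise
# cluster busy, and burst overflow work to the cloud instead of queueing it.
# Capacities, job sizes, and names here are illustrative assumptions.
from dataclasses import dataclass

ON_PREM_CORES = 2_000  # fixed capacity of the owned cluster

@dataclass
class Job:
    name: str
    cores: int

def route_jobs(jobs: list[Job], on_prem_capacity: int = ON_PREM_CORES) -> dict[str, str]:
    """Assign each job to 'on-prem' while capacity remains, else 'cloud'."""
    placements = {}
    free_cores = on_prem_capacity
    for job in jobs:
        if job.cores <= free_cores:
            placements[job.name] = "on-prem"
            free_cores -= job.cores
        else:
            placements[job.name] = "cloud"  # burst: no queue, no re-prioritization
    return placements

if __name__ == "__main__":
    demand = [Job("crash-sim", 1_500), Job("cfd-sweep", 800), Job("ml-training", 256)]
    for name, target in route_jobs(demand).items():
        print(f"{name:12s} -> {target}")
```

A real platform would layer data locality, licensing, and cost constraints on top of a policy like this, but the end-user experience is the same: the job runs instead of waiting.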

The solution features the best of the cloud’s flexibility and agility. Much to the chagrin of IT leaders, the hardware requirements of engineers, data scientists, and researchers are becoming increasingly diverse. As hardware is optimized for ever more specific purposes and software developers tune their codes for platforms with divergent strategies, flexibility should be a concern of any IT leader. Enterprises need the agility to support the demands of an enterprise software tool suite, both today and tomorrow. One-size-fits-all HPC hardware will become obsolete in an enterprise with a sizable engineering staff (or even a small but diverse one). Enterprise flexibility and agility are delivered by a single platform and environment that brings different clouds, hardware, software, and pricing models together in one place. Alignment of cost and demand is the promise of the cloud. With a wide variety of hardware available, enterprises should be able to make computing decisions that align with business drivers. What does this mean in practice? It means being able to take advantage of everything from low-cost public cloud spot markets to instantly available, cutting-edge processors. It also means not being locked into pre-paid models that expire. Select a partner that can navigate the cloud environment on behalf of the enterprise’s evolving requirements.

The solution is secure. I included security because, without it, a reader would invariably find this list woefully incomplete. However, I would contend that the vast majority of mature cloud HPC solutions can satisfy the requirements of 98% of companies. More often, security concerns are a red herring that disguises a cultural or organizational bias against an “XaaS” approach.

What about data transfer and large data sets? I get this question often, and for good reason. There are several mitigation strategies for this issue, but the topic deserves an entire article of its own; we’ll follow up on it in a later post.

To conclude: as the market for cloud services rapidly expands and most organizations are satisfied with the answers to “why” cloud, enterprise IT faces a multitude of decisions about “how.” For high-performance computing, this article should give managers some critical factors to consider when evaluating solutions.

And one last thing, a “why” cloud link, as promised: 6 Advantages of Cloud Computing

This article was written by Matt McKee.