
The SOLID principles of object-oriented design provide good guidelines for writing code that is easy to change, but for some of the principles, the motivation and value can be difficult to understand. Open/Closed is particularly vexing: "software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification." What does it mean to be extended but not modified? Does this principle lead inherently to deep, convoluted class hierarchies? There has been some discussion and criticism of SOLID and the Open/Closed principle recently, so we thought we'd share some of the Rescale development team's experiences coming to an understanding of Open/Closed and using it to write classes whose behavior is easy to change. We view Open/Closed as a complement to the Single Responsibility Principle, since it encourages developers to write classes with focused responsibilities whose behavior can be changed with minimal code modification.

An explanation

First off, let's explore some of our initial questions. What does it mean for code to be extended? What does it mean for it to be modified? Code is modified when we change the code itself–by adding conditionals to methods, pulling in extra arguments, switching behavior based on properties of arguments, and so on. This is often developers' first thought on how to make existing code accomplish a new task–just add another conditional. Code is extended when we change its behavior without modifying the code itself–by injecting different collaborators. For our code to even be amenable to such extension, we have to be following many of the other SOLID principles–injecting different objects for each aspect of the task we wish to accomplish. That in turn requires that each aspect of the task be accomplished by a different object, which is the heart of the Single Responsibility Principle–objects should do only one thing. Open/Closed encourages developers to write code that distributes responsibilities, and it provides the benefit of making behavior easy to change without growing long methods and large classes.

An example

Let’s illustrate all this theory with an example. At Rescale, we recently refactored a bunch of code to use a REST API client. API clients are often written with different methods for each type of request callers wish to make, and that’s how ours started. We had an interface like:
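The original code listing is not reproduced here, but a minimal sketch of that style of interface (with hypothetical resource and method names) might look like this:

    import java.util.List;

    // One bespoke method per kind of request a caller might want to make.
    // Job, JobStatus, and OutputFile are hypothetical result types.
    public interface RescaleApiClient {
        Job getJob(String jobId);
        List<Job> getJobs();
        JobStatus getJobStatus(String jobId);
        List<OutputFile> getOutputFiles(String jobId);
    }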

Each of the method implementations looked like:
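Again, the original listing isn't preserved here; a sketch of one such method (the path, the HTTP helper, and the JSON helper are all illustrative) shows how the responsibilities pile up:

    // Every method repeated the same three pieces of knowledge.
    @Override
    public Job getJob(String jobId) {
        String path = String.format("/api/jobs/%s/", jobId);  // knows how to format the path
        String json = httpGet(path);                          // knows how to call the API
        return parseJson(json, Job.class);                    // knows how to parse the response
    }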

This was a pain to work with. Every time we wanted to get a new piece of data from the API, we had to add a new, redundant method. This client wasn't open for extension, and in order to change its behavior we had to modify the class. That stemmed from a muddying of responsibilities. Even though the method above looks simple, it actually has several different responsibilities and corresponding pieces of knowledge. It knows how to format paths for each request type. It knows how to make HTTP requests to the API, and it knows how to parse the JSON for each request. We decided to refactor this to give our API client just one responsibility–making requests. Instead, we would extend it by passing in Request objects that took over the other responsibilities. Now we have an API client interface with just one method:
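Using the same hypothetical names as above, the refactored interface might look like this:

    // The client's single responsibility: execute a request and return
    // whatever typed result that request produces.
    public interface RescaleApiClient {
        <T> T execute(Request<T> request);
    }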

Since each type of request is relatively static, we have static factory methods to create request objects.
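One possible shape for these request objects (again a sketch with hypothetical names, not the actual Rescale code) pairs a path with a parser and exposes a static factory per request type:

    import java.util.List;
    import java.util.function.Function;

    // Each request owns the path formatting and response parsing that
    // previously lived inside the client.
    public final class Request<T> {
        private final String path;
        private final Function<String, T> parser;

        private Request(String path, Function<String, T> parser) {
            this.path = path;
            this.parser = parser;
        }

        public String path() { return path; }
        public T parse(String json) { return parser.apply(json); }

        // Static factories, one per kind of request.
        public static Request<Job> job(String jobId) {
            return new Request<>("/api/jobs/" + jobId + "/", json -> parseJson(json, Job.class));
        }

        public static Request<List<Job>> jobs() {
            return new Request<>("/api/jobs/", json -> parseJsonList(json, Job.class));
        }

        // Placeholders for whatever JSON library the client uses (e.g. Jackson).
        private static <T> T parseJson(String json, Class<T> type) { return null; }
        private static <T> List<T> parseJsonList(String json, Class<T> type) { return null; }
    }

With this shape, the client's execute method only needs to make the HTTP call for request.path() and hand the response body to request.parse().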

And, thanks to Java generics, callers get back the correct type of object. We now have natural looking code like:
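A sketch of that calling code, continuing with the hypothetical names above (the job ID is illustrative):

    // Extending the client means passing in a different request,
    // not adding another method to the client.
    Job job = client.execute(Request.job("12345"));
    List<Job> jobs = client.execute(Request.jobs());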

And retrieving new information from the API is easy. We don't have to modify the API client; we just extend it by passing in a different request.

Summary

There's a lot of confusion about what Open/Closed implies, but it simply requires that objects have focused responsibilities and interact with other objects that are injected–that way, developers are free to inject different collaborators to change behavior. Anyone who has worked in a large codebase has seen sections that were developed without Open/Closed in mind. This code is ostensibly simple and "just gets the job done". The problem is that as business needs inevitably change, the only way to change the job is to change the code. Methods accumulate conditionals and switch statements, classes accumulate fields, and the code becomes harder to understand at a glance. The poor design feeds on itself, because the easiest way to accomplish a task is simply to add a few more lines to a method. If developers keep Open/Closed in mind, they will be able to spot opportunities to create objects with extensible behavior and keep development costs from growing out of control.

This article was written by Alex Kudlick.


Looking forward to 2015, we have some exciting new updates in the works here at Rescale. Based primarily on customer feedback and alignment with our long-term roadmap, you can expect major updates in the following areas:

New Hardware & Lower Prices

We already have a great selection of hardware configurations at various competitive price points. That said, one of the big benefits of being a Rescale customer is that you can take advantage of new hardware technologies and reduced pricing as older configurations depreciate. Very soon, we will be launching the newest Intel Haswell processors on Rescale, which should provide up to 20% performance improvements for a typical simulation. As we have consistently done in the past, we will continue to reduce prices as older hardware depreciates and pass the cost savings along to our customers. Price-sensitive users will also benefit from increased capacity in our Low Priority pool.

Expanded Network of Data Centers

As our international customers know well, we already have data centers in Asia and Europe. As our presence in these regions grows, so does the requirement for certain companies to have their data reside locally in their own region. We are continuing to partner and build out capacity in new data centers to complement the 20+ locations we already have available today.

Enhanced Functionality for Developers

Just in the past few days, we’ve opened up our API and CLI to the public as the Rescale toolbelt for developers.  These exciting new tools make it possible for any engineer or developer to programmatically access the Rescale platform.  Now anyone can deeply integrate Rescale functionality into their internal systems.

Improved Visualization for Pre- and Post-Processing

New visualization, pre-processing, and post-processing tools are critical to our customers' productivity, allowing for quick manipulation and investigation of models without the need to transfer the files away from Rescale's platform. Soon we will be launching full remote visualization capability for all users, allowing for easy pre- and post-processing without the need to transfer files to a customer's desktop or laptop.

Software and Scheduler Integrations

Deeper integration with software packages allows our customers to more easily execute simulations directly from the GUIs of our partners and to integrate those workflow capabilities seamlessly into the Rescale interface. Hybrid cloud environments can be deployed turnkey with the new native scheduler integrations we will be releasing in 2015.

Enhanced Enterprise Administration

Administrative functionality for managing Rescale within the enterprise is a big focus for us in 2015. Customers can expect expanded administrative functionality, including budgeting tools, group and user management, administrative monitoring dashboards, and much more.

If you’re interested in becoming an early adopter for any of the updates mentioned above, please don’t hesitate to contact us directly at support@rescale.com and we will make sure you are one of the first to be able to give it a try!

This article was written by Joris Poort.


The cost of buying HPC equipment is only around one-third of the true cost of owning an HPC system. Yes, that is true, and everyone needs to understand it before making a cost comparison to Rescale pricing.

When I speak to customers, there is inevitably a comparison of the cost of an internal HPC system to Rescale. It is common for companies to misunderstand their internal costs, and the resulting comparison to cloud pricing is flawed. Several cost factors are ignored, not understood, or assumed to be fixed (i.e., they would incur the costs regardless). Also, even if the costs are correctly tabulated, an "apples to oranges" comparison typically results.

So what are the estimated costs for an HPC system in the 200-1,000 core range? Below 200 cores, shortcuts can be made in terms of support labor and facilities, so the cost calculus is a bit different. Below is an estimate based on conversations with clients. The numbers may be +/-10%, but they illustrate the general cost profile.

On-premise cost ($/core-hour, HPC+-like system)
  Equipment only                                  $0.04
  Equipment + electricity                         $0.06
  Equipment + electricity + labor                 $0.09
  Equipment + electricity + labor + facilities    $0.12

The above cost assumes roughly 100% utilization, approximately 10 kWh, and a 40% discount on hardware. That is probably a very liberal estimate of true costs. If your utilization is actually 80% of capacity (probably more realistic), then your actual cost is $0.15/core-hour versus $0.12/core-hour, a big difference. It quickly becomes apparent that sizing an HPC system for your peak usage becomes cost prohibitive, because low utilization rapidly increases your per core-hour cost. Electricity also plays a non-trivial role, since it accounts for roughly 15% of total costs.
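As a sanity check on the arithmetic, here is a small sketch (not Rescale tooling, just the division described above) that adjusts a fully burdened $/core-hour figure for utilization:

    public final class CoreHourCost {
        // Fully burdened on-premise cost at 100% utilization, from the table above.
        static final double FULLY_BURDENED = 0.12; // $/core-hour

        // The same spend spread over fewer productive core-hours.
        static double effectiveCost(double fullyBurdened, double utilization) {
            return fullyBurdened / utilization;
        }

        public static void main(String[] args) {
            // 0.12 / 0.80 = 0.15 $/core-hour at 80% utilization
            System.out.printf("80%% utilization: $%.2f/core-hour%n",
                    effectiveCost(FULLY_BURDENED, 0.80));
        }
    }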

Labor is typically underestimated. Someone needs to support the HPC resources. Since most HPC systems are Linux-based and IT departments tend to be more Windows-based, supporting an HPC system can be challenging. In my experience, labor is usually one of the primary expenses of procuring and supporting these systems. I was previously at a major consulting company that had only one person who could support Linux: the Director of IT. Their support costs were very high because of that labor expense. Support includes handling failed resources, helping employees, applying software and system updates, and paying for specialty labor, and it can quickly become a significant expense.

It is actually common for companies to quote $0.04/core-hour as their internal cost. The reality is that this figure only covers the cost of the equipment, which is really only about one-third of the true cost. At a minimum, the cost of labor and electricity needs to be added. What cost to add for facilities is a bit more debatable. One question is whether you are building new facilities as part of an expansion. In many cases, the answer is no; however, there is also an opportunity cost: if you did not have an HPC system, how would you repurpose the floor space? In general, you should add a facilities cost. Whether it is $0.03/core-hour, or an amount higher or lower, depends on your particular situation.

The next error is the direct comparison to Rescale pricing. First, if we assume your cost is $0.15/core-hour (fully burdened cost at 80% utilization), then we need to establish which Rescale cost to compare it to. Many people compare internal cost to on-demand cost, which is spot pricing (pricing for hardware based on a specific time and availability) for high-priority runs. The on-demand cost is $0.25/core-hour for HPC+. This is a pure apples-to-oranges comparison: it is like comparing the daily rate for a rental car to the daily rate for a car you purchase. It is simply an inaccurate comparison.

The correct comparison is to evaluate internal costs against a three-year pre-paid cloud cost. A pre-paid plan buys a certain level of capacity (24/7) of a given hardware type for a set period. Since most purchased HPC systems are used for at least three years, it's best to compare purchase costs to a three-year pre-paid cloud plan. For HPC+, that is $0.05/core-hour, a fantastic savings compared to internal costs.

The other way to make a comparison is against on-demand pricing. In that case, you would adjust your internal costs by a utilization factor that covers peak demand. Let's say that, in order to support peak demand, your utilization rate falls to 40% (not a far-fetched number); your internal cost to support peak usage is then $0.38/core-hour. On-demand pricing for Rescale's HPC+ core type is $0.25/core-hour, again a fantastic bargain.

OK, to summarize, here is a basic guide:

  1.     Determine your true internal cost. You must include equipment, labor, and electricity, and you should also include facilities.  Using a number of $0.15/core-hour for an HPC+-like system is not a bad estimate for fully burdened costs at 80% utilization.
  2.     Estimate your utilization. I would say 80% is probably not a bad estimate.  Adjust your internal cost by your estimated utilization (simple division).
  3.     Compare your internal costs to Rescale pre-paid plans. That is a direct apples-to-apples comparison.
  4.     If you want to estimate what it would cost you to support your peaks, and compare to Rescale’s on-demand pricing, you need to factor down your utilization rate.  Probably something in the 40-50% range is more realistic.

Everyone’s situation is a bit different, and the above numbers were generated from past experience. That said, we at Rescale would be happy to work with any organization on a true total-cost-of-ownership (TCO) study of your current HPC and perform an “apples-to-apples” comparison to the relevant Rescale pricing plan.

This article was written by Tony Spagnuolo.

Lately, news from West Africa is almost always about Ebola. This is not a new problem; however, this historic Ebola epidemic continues to have a significant impact in the region. At the very least, you will see what looks like a tree with sparkling lights (see the videos below). In actuality, what follows is an ab initio model of the disease spreading through a hypothetical network. This is not a blog about how Ebola will take over the world, but it will hopefully give you some insight into why it will not.

Model

The limited locality and the small number of reported cases mean that modeling Ebola requires parameter estimates that likely do not reflect reality. With limited information available, we take a direct approach and simulate person-to-person and person-to-public interactions. The model consists of five stages.

The stages of Ebola are: not exposed, infected, contagious, under care, and deceased or recovered. Everyone except patient(s) zero is initially not exposed to the virus. The original patient(s) progress through the stages and eventually become contagious. We simulate a case in which 5 randomly selected people are initially infected. The incubation period is normally distributed with a mean of 19 days, followed by a contagious period of 3 days and, finally, 10 days of medical care; the latter two periods are also normally distributed about their respective means.

There are two infection paths: the first is through relationships with the people we know, and the second is through the public space. A person can become infected only through these paths, and only by interacting with someone who is contagious. A person who is contagious eventually becomes too sick and is placed under medical quarantine. At this point, the patient is isolated from society. We also assume that the number of medical staff who become infected is insignificant.

Beyond the simplifications described, there are no control measures to counter the spread of Ebola except the quarantine imposed on the sick. The parameters used are, at best, estimates, and likely do not reflect reality. We set the probability of infection from interacting with a contagious person at 50% and the probability of infection from interacting in public at 1%. The probability of death is 60%. Below is a video of this simulation with 100,000 people in a hypothetical network over 300 days. The stages are colored, with white representing a healthy person, yellow infected, red contagious, orange quarantined, black deceased, and blue recovered.
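To make the model concrete, here is a minimal sketch of the per-person state machine with the parameters above (this is illustrative Java, not the code used to produce the videos; the standard deviations of the stage durations are an assumption, since only the means are given):

    import java.util.Random;

    // Five stages: not exposed, infected, contagious, under care, and deceased or recovered.
    public class EbolaAgent {
        enum Stage { NOT_EXPOSED, INFECTED, CONTAGIOUS, UNDER_CARE, DECEASED, RECOVERED }

        static final double P_INFECT_CONTACT = 0.50; // contagious person we know
        static final double P_INFECT_PUBLIC  = 0.01; // contagious person in public
        static final double P_DEATH          = 0.60;

        final Random rng = new Random();
        Stage stage = Stage.NOT_EXPOSED;
        int daysLeftInStage;

        // Called when this person interacts with someone who is contagious.
        void expose(boolean publicContact) {
            double p = publicContact ? P_INFECT_PUBLIC : P_INFECT_CONTACT;
            if (stage == Stage.NOT_EXPOSED && rng.nextDouble() < p) {
                stage = Stage.INFECTED;
                daysLeftInStage = duration(19); // mean incubation of 19 days
            }
        }

        // Advance this person by one simulated day.
        void tick() {
            if (stage == Stage.NOT_EXPOSED || stage == Stage.DECEASED || stage == Stage.RECOVERED) {
                return;
            }
            if (--daysLeftInStage > 0) {
                return;
            }
            switch (stage) {
                case INFECTED:
                    stage = Stage.CONTAGIOUS;
                    daysLeftInStage = duration(3);  // mean of 3 contagious days
                    break;
                case CONTAGIOUS:
                    stage = Stage.UNDER_CARE;       // quarantined; no longer spreading
                    daysLeftInStage = duration(10); // mean of 10 days of medical care
                    break;
                case UNDER_CARE:
                    stage = rng.nextDouble() < P_DEATH ? Stage.DECEASED : Stage.RECOVERED;
                    break;
                default:
                    break;
            }
        }

        // Normally distributed duration in days; the spread (mean / 4) is an assumption.
        int duration(double meanDays) {
            return (int) Math.max(1, Math.round(meanDays + rng.nextGaussian() * meanDays / 4));
        }
    }

A driver loop would call expose() whenever a person interacts with a contagious acquaintance or a contagious member of the public, then call tick() on everyone once per simulated day.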

It is apparent from the video that waves of infection propagate outward from the major cluster in the network. We can see these waves manifested in the figure below. Without the video above, the oscillation in the figure can be difficult to understand. This was a simple model, but it should be clear that quarantine is a crucial and effective measure against the spread of the virus. Graph 1 below shows the number of cases in the simulated model that became infected.


Graph 1: Simulated model representing the Ebola cases that became infected.

Influenza vs. Ebola

Arguably, what should concern most of us is not Ebola but the seasonal flu. Several million people become severely ill each year, and a quarter to a half million of them die from the Influenza virus. Unlike Ebola, with its high fatality rate, the seasonal flu is significantly less fatal but more infectious.

We can simulate the spread of the flu using a simple SEIR model, which is often described by a series of equations. We could solve this nonlinear system of ordinary differential equations (ODEs) and obtain a solution for the entire population sample; however, we would like to see the actual propagation of the wave of infection throughout the population, so we will take a similar modeling approach to the one we used for Ebola. Figure 1, shown further below, is a network representing 5,000 people and their relationships.
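For reference, a common form of the SEIR equations is shown below, where S, E, I, and R are the susceptible, exposed, infectious, and recovered populations, N = S + E + I + R, β is the transmission rate, σ the rate at which exposed individuals become infectious, and γ the recovery rate (the specific parameter values used in our simulation are not listed here):

    \begin{aligned}
    \frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
    \frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E, \\
    \frac{dI}{dt} &= \sigma E - \gamma I, \\
    \frac{dR}{dt} &= \gamma I.
    \end{aligned}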


Figure 1: Network representing 5,000 people and their connected relationships.

If some people have immunity, what is the impact on the population? We simulate the exact same case, except that 10% of the population has immunity. Figure 2 shows the same network, with cyan-colored nodes representing those with immunity.


Figure 2: Network representing 5,000 people and their relationships. Cyan colored dots represent people with immunity to Influenza.

Graph 2 below compares the two simulations. The solid lines are the number of cases for the first simulation, and the dotted lines are the number of cases for the second. The rate of infection is mitigated in the case where some people were initially immune. The financial cost to the population can be inferred from the area under the curve. It goes without saying that this inference also applies to Ebola and corresponds to the severe impact the disease can have on developing economies.


Graph 2: A comparison of the two Influenza models. The dotted lines represent the model that simulated people with immunity to Influenza.

After this analytical look at both Ebola and Influenza, it is hopefully a little clearer where you should spend your worrying energy.

This article was written by Hiraku Nakamura.