
At Rescale, we are excited to work with government agencies in the area of high-performance computing applications. In recent years, the government has shifted its approach to IT infrastructure management in favor of cloud computing, indicating that a number of concerns about the cloud are no longer seen as issues.

According to the U.S. Federal Cloud Computing Strategy, the U.S. government instituted the Cloud First policy to accelerate the pace of cloud adoption, with projected spending of $118M on public cloud offerings in 2014. The Cloud First policy mandates that agencies take full advantage of cloud computing benefits to maximize capacity utilization, improve IT flexibility and responsiveness, and minimize cost.

Cloud deployments reduce costs by sharing services and infrastructure. Because those resources are shared, the government must also address stringent compliance and security requirements (FISMA, FIPS, and FedRAMP mandates) to avoid compromising national security. The Federal Risk and Authorization Management Program (FedRAMP) provides a standardized approach to security assessment, authorization, and continuous monitoring for government cloud products and services.

Cloud adoption is also helping government agencies improve operational flexibility. The U.S. Army, Air Force, Navy, Department of Justice, Department of Agriculture (USDA), and Department of Education, among others, have been early cloud adopters, setting the trend and direction for others to follow. The USDA, for example, launched a broad initiative to upgrade and streamline the 21 separate email systems it contracted from various providers. The USDA decided to aggregate demand with a single cloud provider and retire its related internal assets. Doing so effectively turned the USDA's email from a set of internally managed assets into a service, which vastly improved agility and scalability.

One of the latest examples of security-conscious cloud transformation is the CIA contracting with Amazon to provide a new cloud computing solution. Clearly, the government is satisfied that security concerns can be effectively managed in cloud installations, and that the efficiency and agility of cloud-based resources serve the government’s purposes well.

With respect to security, at Rescale we maintain the highest levels of security practices and hold certifications from multiple security programs. We understand that proprietary IP and mission-critical data must remain within the full control of their owners. For this reason, we employ the same encryption technology used in banking for all of our data transmission, and encrypt data once more for storage in the cloud. You can learn more about Rescale’s security here. We look forward to a future where cloud computing is widely adopted by enterprises and governments alike.

To learn more about Rescale, please visit www.rescale.com. To begin using Rescale for engineering and science simulations, please contact info@rescale.com.

This article was written by Rescale.

Google has officially thrown its hat into the IaaS cloud computing ring by opening up access to the Google Compute Engine (GCE) service to the general public. One of the differentiating features touted by Google is the performance of its networking infrastructure.

We decided to take the service for a quick spin to see what the interconnect performance was like within the context of the HPC application domain. In particular, we were interested in measuring the latency between two machines in an MPI cluster.

For our test, we spun up two instances, set up an Open MPI cluster, and then ran the osu_latency benchmark from the OSU Micro-Benchmarks test suite to measure the amount of time it takes to send a 0-byte message between nodes in a ping-pong fashion. The numbers reported below are the one-way latency numbers averaged over 3 trials. A new pair of machines was launched for each trial.
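The ping-pong idea behind this kind of measurement can be sketched in a few lines of Python. This is only an illustration and not the osu_latency code: it assumes a local socket pair and a thread in place of two MPI ranks on separate machines, sends a 1-byte payload (a 0-byte send on a stream socket transmits nothing), and the names `echo_server` and `measure_one_way_latency` are hypothetical. One-way latency is taken as half of the average round-trip time.

```python
import socket
import threading
import time


def echo_server(conn: socket.socket, n_messages: int) -> None:
    """Answer each 1-byte ping with a 1-byte pong."""
    for _ in range(n_messages):
        data = conn.recv(1)
        if not data:  # peer closed the connection early
            break
        conn.sendall(data)


def measure_one_way_latency(iterations: int = 1000, warmup: int = 100) -> float:
    """Return the average one-way latency in microseconds.

    Assumption: a local socket pair stands in for the network link
    between two cluster nodes.
    """
    ping, pong = socket.socketpair()
    total = iterations + warmup
    server = threading.Thread(target=echo_server, args=(pong, total))
    server.start()
    try:
        for i in range(total):
            if i == warmup:
                # Start timing only after the warm-up round trips.
                start = time.perf_counter()
            ping.sendall(b"x")
            ping.recv(1)
        elapsed = time.perf_counter() - start
    finally:
        ping.close()
        server.join()
        pong.close()
    # Each iteration is a full round trip, so halve it for one-way latency.
    return elapsed / iterations / 2 * 1e6


if __name__ == "__main__":
    print(f"one-way latency: {measure_one_way_latency():.2f} us")
```

Numbers from this sketch reflect in-process communication on a single machine and are not comparable to the inter-node results reported here.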

One-way latency for a 0-byte message, in microseconds*

Instance Type    Trial #1    Trial #2    Trial #3    Average
n1-standard-1    183.12      172.57      169.90      175.20
n1-standard-2    192.27      202.51      196.20      196.99
n1-standard-4    169.97      170.96      177.03      172.65
n1-highcpu-2     176.34      210.81      192.04      193.06
n1-highcpu-4     205.00      176.11      159.95      180.35
n1-highmem-2     176.80      177.73      189.72      181.42
n1-highmem-4     173.78      175.94      185.85      178.52

*All latency numbers are measured in microseconds.

The reported latency numbers are roughly the same for all of the instance types we tested. The variance between tests is likely due to contention from other tenants on the machine. Benchmarking cloud compute instances is a notoriously tricky problem. In the future, we’ll look at running a more exhaustive test across more instances and over different time periods.
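As a quick check on the table, the per-instance averages and the spread between instance types can be recomputed from the trial values. The sketch below uses only the numbers reported above; the names `TRIALS` and `summarize` are chosen here for illustration.

```python
from statistics import mean

# One-way latencies in microseconds, copied from the results table.
TRIALS = {
    "n1-standard-1": [183.12, 172.57, 169.90],
    "n1-standard-2": [192.27, 202.51, 196.20],
    "n1-standard-4": [169.97, 170.96, 177.03],
    "n1-highcpu-2": [176.34, 210.81, 192.04],
    "n1-highcpu-4": [205.00, 176.11, 159.95],
    "n1-highmem-2": [176.80, 177.73, 189.72],
    "n1-highmem-4": [173.78, 175.94, 185.85],
}


def summarize(trials):
    """Return the mean latency per instance type, rounded to 2 decimals."""
    return {name: round(mean(values), 2) for name, values in trials.items()}


averages = summarize(TRIALS)
lowest = min(averages.values())
highest = max(averages.values())

if __name__ == "__main__":
    for name, avg in averages.items():
        print(f"{name:<15} {avg:7.2f}")
    print(f"spread between instance-type averages: {highest - lowest:.2f} us")
```

The gap between the lowest and highest averages is of the same order as the trial-to-trial differences within a single instance type, which is consistent with the observation that instance type has little visible effect on latency in this test.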

As a point of comparison, we see latencies of 70 to 90 microseconds when running the same test with Amazon EC2 instances. It is important to point out that this is not a true apples-to-apples comparison: Amazon offers special cluster compute instance types as well as placement groups. The latter allow for better bandwidth and reduced latencies between machines in the same group. The GCE latency numbers appear to be closer to what Edward Walker reported for non-cluster compute instances on EC2. It appears likely that Google is focusing on the more typical workload of hosting web services for now and will eventually turn its focus towards tuning its infrastructure for other domains such as HPC. At the moment, it seems like GCE is better suited for workloads that are more “embarrassingly parallel” in nature.

It should be noted that these types of microbenchmarks do not necessarily represent the performance that will be seen when running real-world applications. We encourage users to perform macro-level, application-specific testing to get a true sense of the expected performance. There are several ways to mitigate latency penalties:

  • Decompose the model where possible. For certain classes of simulation problems, a model can be split into separate pieces that are then evaluated in parallel. This requires a shift in thinking with the advent of the public cloud: rather than relying on a single on-premises cluster, it is possible to launch many smaller clusters that operate on the decomposed pieces at the same time.
  • Use hybrid OpenMP/MPI applications when possible. Reducing the amount of chattiness between cluster nodes means fewer messages pay the latency cost in the first place.
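The decomposition approach from the first bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular workflow: worker threads stand in for independently launched clusters, and `split_into_pieces`, `evaluate_piece`, and `run_decomposed` are hypothetical names with a placeholder computation (a sum of squares) in place of a real simulation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_pieces(cases, n_pieces):
    """Split a list of independent cases into n_pieces roughly equal parts."""
    return [cases[i::n_pieces] for i in range(n_pieces)]


def evaluate_piece(piece):
    """Placeholder for a simulation run on one piece (hypothetical workload)."""
    return sum(x * x for x in piece)


def run_decomposed(cases, n_clusters):
    """Evaluate each piece on its own worker and combine the partial results.

    Assumption: each worker thread stands in for a separate small cluster,
    so no communication happens between pieces until the final reduction.
    """
    pieces = split_into_pieces(cases, n_clusters)
    with ThreadPoolExecutor(max_workers=n_clusters) as pool:
        partials = list(pool.map(evaluate_piece, pieces))
    return sum(partials)


if __name__ == "__main__":
    print(run_decomposed(list(range(10)), n_clusters=3))
```

Because the pieces exchange no messages while they run, inter-node latency only matters inside each small cluster and in the final combination step.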

We look forward to seeing the continued arms race amongst the various cloud providers, and expect that HPC performance will continue to improve. As an example, Microsoft has recently announced a new HPC offering for Azure that promises InfiniBand connectivity between instances. As in most cases, competition between large cloud computing providers is very good for the end customer. At Rescale, we are excited about the opportunities to continue providing our customers with the best possible performance.

This article was written by Ryan Kaneshiro.