With the proliferation of data-driven applications over the last few years, many applications have come to rely on advanced analyses of huge data sets. While MapReduce-based frameworks, such as Hadoop, make it easier to scale your algorithms to big data sets, their restrictive programming framework makes it challenging to fit your existing computations to MapReduce’s model.

Array-based languages such as R provide a much more natural and expressive framework in which to write your statistical analyses. Since it was introduced in 1996, R has gone on to become one of the languages of choice for statisticians and data scientists worldwide. While the language boasts a thriving community and an interactive ecosystem, it also has significant limitations. R is single-threaded and does not scale well with larger data sets. This disadvantage renders effective statistical packages, written by statisticians, useless for non-trivial data sets.

If your algorithm is very data-intensive, Hadoop proves to be a better solution. However, with some clever refactoring of your existing R code, it is possible to eke out better performance from R. With that in mind, I decided to test out the MPI wrapper for R, Rmpi.


PageRank is the popular link analysis algorithm that kick-started the Google powerhouse. A page’s PageRank score represents the likelihood that a person randomly surfing the Internet arrives at that page. The computation requires several passes before it converges. When one node links into another node, it contributes a part of its own PageRank score to that node’s score. If a node with a higher PageRank links into a second node, the second node receives a correspondingly larger contribution. The update function of the algorithm is:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

PR(A) – PageRank of node A

d – damping factor

PR(Tn)/C(Tn) – portion of Tn’s score contributed to PR(A) where T1… Tn are the nodes that link into A

The most naive version of PageRank assumes all nodes have the same rank initially, and that each node distributes its score equally among the nodes it links to, repeating until convergence. The algorithm has converged when the difference between successive updates to the PageRank vector is less than some very small value.
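The update rule and stopping test described above can be sketched in a few lines. The sketch is in Python rather than R so that it stays self-contained, and it is not the code used for the experiments. The damping factor d = 0.85, the tolerance, the iteration cap, and the three-node example graph are all assumptions chosen for illustration.

```python
def pagerank(in_links, out_degree, d=0.85, tol=1e-8, max_iter=200):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until the ranks settle.

    in_links[A] lists the nodes T that link into A; out_degree[T] is C(T).
    d, tol and max_iter are illustrative defaults, not values from the article.
    """
    nodes = list(out_degree)
    ranks = {node: 1.0 for node in nodes}  # naive start: every node has the same rank
    for _ in range(max_iter):
        updated = {
            node: (1 - d) + d * sum(ranks[t] / out_degree[t]
                                    for t in in_links.get(node, ()))
            for node in nodes
        }
        # Largest change in any single score between successive passes.
        delta = max(abs(updated[n] - ranks[n]) for n in nodes)
        ranks = updated
        if delta < tol:
            break
    return ranks


# Hypothetical graph: A -> B, A -> C, B -> C, C -> A
example_in_links = {"A": ["C"], "B": ["A"], "C": ["A", "B"]}
example_out_degree = {"A": 2, "B": 1, "C": 1}
example_ranks = pagerank(example_in_links, example_out_degree)
```

In this toy graph, C collects links from both A and B and ends up with the highest score, while B, which receives only half of A’s score, ends up with the lowest.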

Strategies for parallelization

There are two main data structures in this calculation: the adjacency matrix, which represents the graph, and the PageRank vector, which contains the scores for all of the nodes. While each thread operates only on its own portion of the rows of the adjacency matrix, the entire PageRank vector needs to be made available to the slave threads after each iteration, so that the updated PageRanks are used in the next iteration. I used the MPI operation scatterv to distribute the adjacency matrix to the slave threads, and the operation allgatherv to gather the partial PageRank vectors after each iteration.
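To make the data flow concrete, here is a single-process Python sketch that only simulates the pattern: rows are split into contiguous blocks (the role of scatterv), each block is updated against the full PageRank vector, and the partial results are concatenated in worker order (the role of allgatherv). It makes no MPI or Rmpi calls and is not the code used in the experiments. The function names, the row layout (row a marks the nodes that link into a), the precomputed out-degree vector, and d = 0.85 are assumptions for illustration.

```python
def row_counts(n_rows, n_workers):
    """Rows per worker, as even as possible; earlier workers absorb the remainder."""
    base, extra = divmod(n_rows, n_workers)
    return [base + (1 if w < extra else 0) for w in range(n_workers)]


def partial_update(row_block, ranks, out_degree, d=0.85):
    """One worker's step: new scores for only the nodes whose rows it holds."""
    return [
        (1 - d) + d * sum(ranks[t] / out_degree[t]
                          for t, linked in enumerate(row) if linked)
        for row in row_block
    ]


def simulated_iteration(adjacency, ranks, out_degree, n_workers, d=0.85):
    """One PageRank pass, organised the way the scatter/gather scheme would run it."""
    # Stand-in for scatterv: each worker receives a contiguous block of rows.
    blocks, start = [], 0
    for count in row_counts(len(adjacency), n_workers):
        blocks.append(adjacency[start:start + count])
        start += count
    # Stand-in for allgatherv: partial vectors are joined in worker order,
    # so every worker would see the full updated vector for the next pass.
    gathered = []
    for block in blocks:
        gathered.extend(partial_update(block, ranks, out_degree, d))
    return gathered


# Hypothetical graph: node 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
adjacency = [[0, 0, 1],   # node 0 is linked from node 2
             [1, 0, 0],   # node 1 is linked from node 0
             [1, 1, 0]]   # node 2 is linked from nodes 0 and 1
out_degree = [2, 1, 1]
one_worker = simulated_iteration(adjacency, [1.0, 1.0, 1.0], out_degree, n_workers=1)
two_workers = simulated_iteration(adjacency, [1.0, 1.0, 1.0], out_degree, n_workers=2)
```

Because each row is updated independently of the others within a pass, the result does not depend on how many blocks the rows are split into; only the gather step needs every worker to take part.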

Experiments on the Rescale platform

I used the Stanford Network Analysis Project data sets for my experiments, in particular the High Energy Physics Citation Network data set. The graph contains 34,546 nodes and 421,578 edges. The runtime for the vanilla sequential version was 653.89 seconds.

The runtime decays roughly exponentially as threads are added, with the gains petering out between 12 threads (86.733 seconds) and 16 threads (68.38 seconds).
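One way to read these numbers is as speedup and parallel efficiency. The short Python calculation below assumes the 653.89-second sequential run is the right baseline for both parallel timings; the helper name is made up for illustration.

```python
SEQUENTIAL_SECONDS = 653.89                  # vanilla sequential run
PARALLEL_SECONDS = {12: 86.733, 16: 68.38}   # threads -> measured runtime


def speedup_and_efficiency(sequential, parallel, workers):
    """Speedup is the runtime ratio; efficiency is speedup per worker."""
    speedup = sequential / parallel
    return speedup, speedup / workers


results = {
    workers: speedup_and_efficiency(SEQUENTIAL_SECONDS, seconds, workers)
    for workers, seconds in PARALLEL_SECONDS.items()
}
for workers, (speedup, efficiency) in results.items():
    print(f"{workers} threads: speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
# 12 threads: speedup 7.54x, efficiency 0.63
# 16 threads: speedup 9.56x, efficiency 0.60
```

Going from 12 to 16 threads adds a third more workers but lifts the speedup only from about 7.5x to about 9.6x, which is the flattening the runtime figures show.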

While it is still challenging to scale to very large data sets, R can be scaled for moderately sized data sets using frameworks such as Rmpi, as demonstrated on the Rescale platform.

You can import this simulation to your Rescale account here. If you do not yet have a Rescale account, please sign up for a free account today.

To learn more about Rescale, please visit, To begin using Rescale for engineering and science simulations, please contact

This article was written by Rescale.


San Francisco, CA – Rescale is pleased to announce that it has recently joined MSC Software’s Technology Partner Program. Rescale has certified the compatibility of three key MSC Software tools with Rescale’s cloud engineering platform: the multi-body dynamics simulation tool, Adams; the advanced nonlinear and multi-physics package, Marc; and the structural and multidiscipline solver, MSC Nastran.

“The addition of Rescale to our Technology Partner Program provides our users, who need more computing resources, an option to leverage their MSC investment with Rescale’s cloud engineering platform and unique web-based interface,” said John Janevic, Vice President of Strategic Operations.

Rescale provides a comprehensive suite of features – on-demand access to high performance computing, full integration with simulation tools such as MSC Software packages, and an intuitive user interface – all delivered through any web browser. Rescale also adheres to the highest industry standards for security at every level of the Rescale experience; engineers can transfer, manage, and store data with the utmost confidence. Interested customers should approach Rescale directly for more information on running projects with MSC Software tools in the cloud.

Rescale CEO, Joris Poort, explained, “MSC Software offers proven, industry-leading solvers, and we are pleased to offer their tools on our platform.  As an MSC Technology Partner, we look forward to working closely with MSC to provide a flexible environment for their multidisciplinary software.”

About Rescale:
Rescale provides a secure, pay-per-use, web-based platform that helps engineers and scientists build, compute, and analyze large simulations on demand. Incorporated in 2011, Rescale is located in San Francisco, CA and works with customers in the aerospace, automotive, oil & gas, and life sciences industries.

About MSC Software:
MSC Software is one of the ten original software companies and the worldwide leader in multidiscipline simulation. As a trusted partner, MSC Software helps companies improve quality, save time, and reduce costs associated with the design and test of manufactured products. Its products accurately and reliably predict how products will behave in the real world, helping engineers design more innovative products quickly and cost-effectively.

MSC Software’s technology is used by leading manufacturers for linear and nonlinear finite element analysis (FEA), acoustics, fluid-structure interaction (FSI), multi-physics, optimization, fatigue and durability, multi-body dynamics, and control systems simulation.

This article was written by Ilea Graedel.


Here at Rescale, we’ve had an excellent year in 2013 and are looking forward to another great year ahead. Below are some of the highlights of what we accomplished this past year, and a quick preview of what’s to come in 2014!

We are always focused on making sure Rescale adheres to the highest security standards in the industry. We have completed our SOC Type 2 compliance audits and have a platform that follows U.S. International Traffic in Arms Regulations (ITAR) requirements. Your company can continue to trust Rescale with your data and sensitive information, knowing that we are devoted to making sure you’re protected. For more details on security, see our Security page.

In 2013, we announced both reduced pricing and new pricing tiers to give users more flexibility. Utilizing low priority and pre-paid pricing options, Rescale can help reduce your overall costs by over 50%. For further detail on our current pricing tiers, visit our Pricing page.

Based on user requests and feedback, we expanded our software library to over 30 simulation tools. As we work with software partners, we continue to see an increase in short-term licensing models. This gives you more flexibility to run your simulations on demand while easily scaling up your licenses when you need them most. For an overview of our active tools, check out our Tools page.

At the end of 2013, we refreshed our resources with a new look and framework that allows us to continue to provide you with a broad selection of tutorials and examples. We also expanded the depth of the documentation and self-help resources designed to get you up and running on the platform quickly. The self-help videos released in 2013 cover job setup, sharing, and cloning. Take a look at our new Resources page and continue to check back for frequent updates.

2014 Preview
As we look forward to 2014, there are many exciting projects in the works, including:

  • Platform Interface Update – We are refreshing our user interface to make it easier to begin running simulations, while expanding the possibilities for authoring and executing complex workflows.
  • Rescale API – The Rescale API will allow you to submit Rescale jobs programmatically and integrate our platform directly into your favorite simulation tools.
  • Updated Hardware – We continue to expand our hardware capabilities and performance. As we upgrade our hardware, you can expect significant improvements in simulation speeds. Based on your feedback, the specific areas we are focusing on are interconnect and latency improvements.
  • Rescale On-Premise – A current project is an on-premise solution that will provide a Rescale interface for your on-premise hardware. This will allow you to run Rescale jobs utilizing your own hardware or a hybrid of your hardware and Rescale’s infrastructure.

To continue to serve you best, we are also actively hiring to expand our team.

As we embark upon an exciting new year, Rescale would like to thank you, our users, for your continued support and feedback to help make Rescale the leading cloud simulation platform. We look forward to continuing to serve you in 2014!

This article was written by Joris Poort.