The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines developed by Argonne National Laboratory for the scalable (parallel) solution of scientific applications modeled by partial differential equations. PETSc is one of the world’s most widely used parallel numerical software libraries for partial differential equations and sparse matrix computations.
Traditionally, when a scientist or algorithm developer finishes a new parallel algorithm in PETSc, they need to run it on a multi-core computer cluster to test its scalability and speedup. A cluster is typically a shared compute resource at a university, government agency, or business, and maintaining it takes significant administrative work. In addition, to set up a run, the scientist or developer has to prepare the environment, which can be difficult and time consuming, and anything unexpected during the run may cause it to fail without generating any output data.
With Rescale, testing the scalability of a PETSc algorithm becomes much simpler. A scientist or developer can specify the hardware type and number of cores and then run the job with nothing more than an internet connection and a web browser.
Algorithm to Test
The algorithm I’m going to test comes from the official PETSc tutorials. The code solves a linear system in parallel with KSP, the PETSc component for Krylov subspace solvers. I chose KSP because it is one of the most commonly used components in the PETSc package. I made a slight change so that every process writes a timestamp for each PETSc function call to a log file for further analysis.
KSP solves the linear system AX=B for the vector X, which has n elements, where A is an m x n matrix and B is a vector with m elements.
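To make the idea of an iterative solve concrete, here is a minimal serial sketch of the conjugate gradient method, one example of a Krylov subspace method. It is not the tutorial code and does not use PETSc: the function names are hypothetical, the matrix is a small dense list of rows, and the method assumes A is square, symmetric, and positive definite. It solves AX=B and reports how many iterations it needed, which is the quantity examined in the results later in this article.

```python
# Illustrative only: a serial conjugate gradient solver in pure Python.
# Assumes A is symmetric positive definite. Not the PETSc tutorial code.

def mat_vec(A, x):
    """Multiply a dense matrix (a list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b iteratively. Returns (x, iterations_used)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    if rs_old ** 0.5 < tol:
        return x, 0
    for k in range(1, max_iter + 1):
        Ap = mat_vec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:                           # residual norm small enough
            return x, k
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]      # next search direction
        rs_old = rs_new
    return x, max_iter


# Small example system; the values are chosen purely for illustration.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = conjugate_gradient(A, b)
print(x, iterations)
```

The iteration count depends on the matrix and the right-hand side, so two different systems of the same size can need different amounts of work to reach the same tolerance.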
Run Your Algorithm on Rescale
After you sign up on Rescale, you can create a new job that allows you to compile and run your PETSc algorithm.
On the Setup page, select PETSc from the Analysis Code section. In the Hardware Settings, select Core Type as HPC+ with 8 cores. The image below shows what your screen should look like.
On the Workflow page, upload the source code and makefile. Alternatively, you can upload a compiled executable instead. For the Analysis command, enter the command you want to execute:
make; mpirun -n 8 ./ksp_test -n 1024 -m 1024 -logfile ksp_test_1024_1024_8.log
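If you plan to repeat the run with several core counts, it can help to generate the command strings rather than retype them. The sketch below is only a convenience helper: the function name `build_command` is hypothetical, and it assumes the log file name follows the pattern used in the command above (the two size arguments, then the core count).

```python
# Hypothetical helper that rebuilds the analysis command for a given core count.
# The flags (-n, -m, -logfile) are the ones used in the command above; the log
# file naming pattern is an assumption inferred from that single example.

def build_command(cores, n=1024, m=1024, executable="./ksp_test"):
    log_name = f"ksp_test_{n}_{m}_{cores}.log"
    return (
        f"make; mpirun -n {cores} {executable} "
        f"-n {n} -m {m} -logfile {log_name}"
    )


# One command per core count used in the scalability test.
for cores in (1, 2, 4, 8, 16, 32):
    print(build_command(cores))
```

Remember to set the number of cores in the Hardware Settings to match the value passed to `mpirun -n` for each job.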
The Workflow page should look like the following:
Click Submit in the lower right corner to submit your job to Rescale. After you submit the job, the Status page lets you monitor it in real time.
When the job is done, you can view and download the output files and log files from the Results page.
KSP Test Results
In my scalability test, I chose 1, 2, 4, 8, 16, and 32 cores, with Rescale’s HPC+ core type. The size of matrix A was 1024 x 1024. Here are the results for the average process execution time, the number of iterations, and the number of iterations per second.
The average process execution time decreased as the number of cores increased, up to 16 cores, and then unexpectedly increased with 32 cores. Execution time alone is misleading here: the number of iterations in each run also has to be taken into consideration.
Each time the algorithm started, the matrix and right-hand-side vector were randomly generated. As a result, the number of iterations needed to converge to the target “norm of error” was different each time. The following chart shows the number of iterations for each run in my test.
The last chart shows the number of iterations per second, which is calculated as the number of iterations divided by the average process execution time. Measured this way, parallel KSP scales well as the number of cores increases.
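The normalization described above can be sketched in a few lines. The function names are hypothetical, and the numbers in `runs` are placeholders invented for this example, not the measurements from my test. They are chosen only to show how a run can take longer overall and still process iterations faster.

```python
from statistics import mean

# Hypothetical helpers for the metrics described above.

def average_process_time(per_process_seconds):
    """Average wall-clock time across all processes of one run."""
    return mean(per_process_seconds)


def iterations_per_second(iterations, avg_seconds):
    """Number of iterations divided by the average process execution time."""
    return iterations / avg_seconds


# PLACEHOLDER data, invented for illustration (not results from the test):
# cores -> iteration count and per-process execution times in seconds.
runs = {
    1: {"iterations": 100, "times": [10.0]},
    2: {"iterations": 300, "times": [14.0, 16.0]},
}

for cores, run in sorted(runs.items()):
    avg = average_process_time(run["times"])
    rate = iterations_per_second(run["iterations"], avg)
    print(f"{cores} core(s): avg time {avg:.1f} s, {rate:.1f} iterations/s")
```

In this made-up example, the 2-core run takes longer in total because it needs three times as many iterations, yet it completes twice as many iterations per second. This is why comparing raw execution times across runs with different iteration counts can be misleading.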
If you have a Rescale account, you can click HERE to clone the KSP test job I mentioned and try running the simulation with different hardware settings, number of cores, and parameters. You can also click HERE to clone a “PETSc helloworld” sample job. If you do not have an account, please click HERE to sign up.
This article was written by Irwen Song.