Performance Tuning Research

In 2023, I joined a research study on the performance analysis and tuning of algorithms.

I have been fortunate enough to work with a professor at Santa Clara University on research into randomized dense and sparse sketching, a branch of randomized numerical linear algebra. Sketching reduces the dimensions of matrices and vectors, which in theory allows matrix multiplication to be computed faster. My research focused specifically on the dense and sparse sketching subroutines in RandBLAS (BLAS stands for Basic Linear Algebra Subprograms).

My approach to analyzing these subroutines was to use a performance counter library called PAPI (Performance Application Programming Interface). When instrumented around a sketch, PAPI records hardware information from the duration of that sketch. This includes, but is not limited to, cycles, instructions, and runtime. Of these three, the instruction count in particular gave me essential information about whether the subroutine being measured was parallelized, a topic I will address later.
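To illustrate how raw counter readings get interpreted (the numbers below are invented for the example, not measurements from my runs): from instructions, cycles, and runtime you can derive instructions-per-cycle, speedup over a single-threaded baseline, and parallel efficiency.

```python
# Illustrative only: derived metrics from performance counter readings.
# All input numbers here are made up for the example.

def derived_metrics(instructions, cycles, runtime_s, baseline_runtime_s, threads):
    """Compute instructions-per-cycle, speedup, and parallel efficiency."""
    ipc = instructions / cycles
    speedup = baseline_runtime_s / runtime_s
    efficiency = speedup / threads
    return ipc, speedup, efficiency

# Hypothetical readings for the same sketch on 1 thread vs. 4 threads.
ipc1, s1, e1 = derived_metrics(8.0e9, 4.0e9, 2.0, 2.0, 1)
ipc4, s4, e4 = derived_metrics(8.2e9, 4.4e9, 0.6, 2.0, 4)

print(f"1 thread:  IPC={ipc1:.2f} speedup={s1:.2f} efficiency={e1:.2f}")
print(f"4 threads: IPC={ipc4:.2f} speedup={s4:.2f} efficiency={e4:.2f}")
```

A run that cuts runtime while its total instruction count stays roughly flat is dividing work across threads rather than duplicating it, which is the kind of signal the instruction counter provides.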

I began by installing the many libraries needed to run RandBLAS and its dependencies. I also wrote reusable PAPI wrapper functions that could be easily integrated into any file I wished to measure. This setup took quite some time (sometimes for reasons out of my control), but once it was done I was able to begin my analysis. The analysis ran on SCU's WaveHPC, a multi-core cluster of high-performance computing servers. On this platform I ran my simulations on a CPU node, an adaptable node on which I can allocate the number of threads and cores I wish to use.
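My actual wrappers are built around PAPI's C interface, but the pattern is easy to sketch in Python: a small context manager that starts a measurement on entry and records the result on exit, so any region of code can be measured just by wrapping it. Here Python's `perf_counter_ns` stands in for the PAPI start/stop calls.

```python
import time
from contextlib import contextmanager

@contextmanager
def measure(results, label):
    """Record the wall-clock runtime of the enclosed block under `label`.
    In the real C++ wrapper, this is where the PAPI counters are started
    and stopped instead of reading a timer."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        results[label] = time.perf_counter_ns() - start

results = {}
with measure(results, "sketch"):
    total = sum(i * i for i in range(100_000))  # stand-in workload

print(f"sketch took {results['sketch']} ns")
```

The appeal of this shape is exactly what I needed from my wrappers: the measurement logic lives in one place and can be dropped around any subroutine without modifying it.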

An automated shell script would then vary the number of threads (the parameter I changed most often) as well as the matrix sizes. The simulations were then run and their results stored in a .csv file. I chose this format over a database such as MySQL because I would later feed the .csv files into a machine-learning analysis tool that visualized my data.
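The sweep logic can be sketched as follows (in Python rather than shell, and with a placeholder in place of my benchmark binary, whose name and output format are hypothetical here): loop over thread counts and matrix sizes, run the benchmark for each configuration, and append one row per run to a .csv file.

```python
import csv

THREADS = [1, 2, 4, 8]
SIZES = [1024, 2048, 4096]

def run_benchmark(threads, size):
    """Placeholder for the real run. In my setup, the script launched the
    benchmark with the thread count set via the OMP_NUM_THREADS environment
    variable and parsed the PAPI counters the program printed. The dummy
    values below just keep this sketch self-contained."""
    return {"cycles": 0, "instructions": 0, "runtime_s": 0.0}

with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["threads", "size", "cycles", "instructions", "runtime_s"])
    for t in THREADS:
        for n in SIZES:
            r = run_benchmark(t, n)
            writer.writerow([t, n, r["cycles"], r["instructions"], r["runtime_s"]])
```

One row per (threads, size) pair keeps the file in exactly the flat shape a downstream analysis tool expects.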

For the analysis of the data, I chose to use Dashing, which can read the data and identify the most influential events (the name given to the individual measurements obtained from a simulation) affecting runtime. It then displays these results in easy-to-analyze formats, such as heatmaps or bar charts.
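Dashing's importance analysis is more sophisticated, but the core idea can be sketched with a simple Pearson correlation between each event and runtime. The rows and event names below are invented for the example:

```python
from math import sqrt

# Invented example data: one row per run, columns are
# (cycles, cache_misses, branch_mispredictions, runtime_s).
rows = [
    (1.0e9, 2.0e6, 5.0e5, 0.50),
    (2.1e9, 4.2e6, 5.1e5, 1.02),
    (3.9e9, 8.5e6, 4.9e5, 1.97),
    (8.0e9, 1.7e7, 5.2e5, 4.10),
]
events = {"cycles": 0, "cache_misses": 1, "branch_mispredictions": 2}
runtime = [r[3] for r in rows]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

# Rank events by |correlation| with runtime: a crude stand-in for the
# importance scores a tool like Dashing computes.
ranked = sorted(events, key=lambda e: -abs(pearson([r[events[e]] for r in rows], runtime)))
print(ranked)
```

Events that track runtime closely (here, cycles and cache misses) rise to the top of the ranking, while flat ones fall to the bottom; heatmaps and bar charts are then just visualizations of scores like these.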

The data obtained from these simulations was generally inconclusive. However, some events did have a recurring effect on runtime and overall performance, and I will keep them in mind for future optimization efforts. If you are interested in learning more about those events specifically, or other details of my work, you may view my formal report for the last Quarter during which I worked on this research.

At the time of writing, I still have one more Quarter remaining, so this work is not yet complete.


If you'd like to view my research yourself, I have uploaded my scripts to a GitHub repository linked here.

To view my Spring Quarter 2024 Report, click here.