REDSPY
Complex code bases with several layers of abstraction have abundant inefficiencies that affect execution time. Value redundancy is a kind of inefficiency in which the same values are repeatedly computed, stored, or retrieved over the course of execution. Not all redundancies can be easily detected or eliminated by compiler optimization passes, due to the inherent limitations of static analysis.
DR-BW: Identifying Bandwidth Contention in NUMA Architectures with Supervised Learning
Non-Uniform Memory Access (NUMA) architectures are widely used in mainstream multi-socket computer systems to scale memory bandwidth. Without a NUMA-aware design, programs can suffer significant performance degradation due to inter-socket bandwidth contention. Identifying bandwidth contention, however, is challenging. Existing methods measure bandwidth consumption, but consumption alone is insufficient to quantify contention. Furthermore, existing methods diagnose bandwidth for the entire program execution and cannot associate bandwidth performance with the source code and data structures involved. To address these challenges, we propose DR-BW, a new tool based on machine learning that identifies bandwidth contention in NUMA architectures and provides optimization guidance. First, DR-BW trains on a set of micro benchmarks and extracts useful features to identify bandwidth contention via a supervised machine learning model. Our experiments show that DR-BW achieves more than 96% accuracy. Second, DR-BW associates memory accesses that incur bandwidth contention with data objects, which provides intuitive guidance for optimization. Third, we apply DR-BW to a number of real benchmarks. Our optimizations based on the insights obtained from DR-BW yield up to a 6.5× speedup on modern NUMA architectures.
Redundant Loads: A Software Inefficiency Indicator
Modern software packages have become increasingly complex, with millions of lines of code and references to many external libraries. Redundant operations are a common performance limiter in these code bases. Missed compiler optimization opportunities, inappropriate data structure and algorithm choices, and developers' inattention to performance are common reasons for the existence of redundant operations. Developers mainly depend on compilers to eliminate redundant operations. However, compilers' static analysis often misses optimization opportunities due to ambiguities and limited analysis scope, and automatic optimization of algorithmic and data-structural problems is out of scope. We develop LoadSpy, a whole-program profiler that pinpoints redundant memory load operations, which are often a symptom of many redundant operations. The strength of LoadSpy lies in identifying and quantifying redundant load operations in programs and associating the redundancies with program execution contexts and scopes to focus developers' attention on problematic code. LoadSpy works on fully optimized binaries, adopts various optimization techniques to reduce its overhead, and provides a rich graphical user interface, which together make it a complete developer tool. Applying LoadSpy showed that a large fraction of redundant loads is common in modern software packages despite the highest levels of automatic compiler optimization. Guided by LoadSpy, we optimize several well-known benchmarks and real-world applications, yielding significant speedups.
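The idea of quantifying redundant loads per execution context can be illustrated with a minimal sketch. This is not LoadSpy's implementation: it assumes a hypothetical, already-collected trace of (context, address, value) load events, and it assumes that a load counts as redundant when it reads the same value from the same address as the previous load of that address. The function name and the trace format are illustrative only.

```python
from collections import defaultdict


def redundant_load_fraction(trace):
    """Return, per context, the fraction of loads that were redundant.

    `trace` is a hypothetical iterable of (context, address, value)
    load events in execution order. A load is treated as redundant
    (an assumption of this sketch) when the previous load of the same
    address returned the same value.
    """
    last_value = {}                 # address -> value seen by the previous load
    total = defaultdict(int)        # context -> number of loads
    redundant = defaultdict(int)    # context -> number of redundant loads

    for context, address, value in trace:
        total[context] += 1
        if address in last_value and last_value[address] == value:
            redundant[context] += 1
        last_value[address] = value

    return {ctx: redundant[ctx] / total[ctx] for ctx in total}


# Illustrative trace: context strings stand in for calling contexts.
trace = [
    ("main>init", 0x10, 7),
    ("main>loop", 0x10, 7),   # same address, same value: redundant
    ("main>loop", 0x18, 3),
    ("main>loop", 0x10, 7),   # redundant again
    ("main>loop", 0x18, 4),   # value changed: not redundant
]
report = redundant_load_fraction(trace)
```

In this sketch, contexts with a high fraction are the ones a developer would inspect first; a real profiler works on binaries and must also bound its time and memory overhead, which this example does not attempt.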
Watching for Software Inefficiencies with Witch
Inefficiencies abound in complex, layered software. A variety of inefficiencies show up as wasteful memory operations. Many existing tools instrument every load and store instruction to monitor memory, which significantly slows execution and consumes an enormous amount of extra memory. Our lightweight framework, Witch, samples consecutive accesses to the same memory location by exploiting two ubiquitous hardware features: performance monitoring units (PMUs) and debug registers. Witch performs no instrumentation. Hence, witchcraft (tools built atop Witch) can detect a variety of software inefficiencies while introducing negligible slowdown and insignificant memory consumption, yet maintaining accuracy comparable to exhaustive instrumentation tools. Witch allowed us to scale our analysis to a large number of code bases. Guided by witchcraft, we detected several performance problems in important code bases; eliminating these inefficiencies resulted in significant speedups.
ImpactMiner: a tool for change impact analysis
Developers are often faced with a natural language change request (such as a bug report) and tasked with identifying all code elements that must be modified to fulfill the request (e.g., fix a bug or implement a new feature). To accomplish this task, developers routinely perform change impact analysis. This formal demonstration paper presents ImpactMiner, a tool that implements an integrated approach to software change impact analysis. The proposed approach estimates an impact set using an adaptive combination of static textual analysis, dynamic execution tracing, and mining software repositories techniques. ImpactMiner is available from our online appendix: http://www.cs.wm.edu/semeru/ImpactMiner/