The PIPER (Performance Insight for Programmers and Exascale Runtimes) project is developing new techniques for measuring, analyzing, attributing, and presenting performance data on exascale systems.
- Lawrence Livermore National Laboratory
- Pacific Northwest National Laboratory
- Rice University
- University of Maryland
- University of Utah
- University of Wisconsin
Exascale architectures and applications will be much more complex than today's systems, and to achieve high performance, radical changes will be required in high performance computing (HPC) applications and in the system software stack. In such a variable environment, performance tools are essential to enable users to optimize application and system code. Tools must provide online performance feedback to runtime systems to guide online adaptation, and they must output intuitive summaries and visualizations to help developers identify performance problems.
To provide this essential functionality for the extreme-scale software stack, we are developing new abstractions, techniques, and novel tools for data measurement, analysis, attribution, diagnostic feedback, and visualization, which together provide Performance Insight for Programmers and Exascale Runtimes (PIPER). The project cuts across the entire software stack: it collects data in all system components through novel abstractions and integrated introspection, attributes and analyzes that data in a system-wide context and across programming models, correlates performance data from independent sources in different domains, and delivers dynamic feedback to runtime systems and applications through auto-tuning as well as interactive visualizations.
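To make the idea of correlating performance data from independent sources concrete, the following Python sketch (not PIPER code; all names are hypothetical) matches time-stamped application events against the nearest sample from an independently collected hardware-counter stream:

```python
import bisect

def correlate(hw_samples, app_events, window=0.5):
    """Attach to each application event the hardware-counter sample
    closest in time, provided it falls within `window` seconds."""
    times = [t for t, _ in hw_samples]  # assumed sorted by timestamp
    correlated = []
    for t_evt, event in app_events:
        i = bisect.bisect_left(times, t_evt)
        best = None
        # Candidates: the sample just before and just after the event.
        for j in (i - 1, i):
            if 0 <= j < len(times) and abs(times[j] - t_evt) <= window:
                if best is None or abs(times[j] - t_evt) < abs(times[best] - t_evt):
                    best = j
        correlated.append((event, hw_samples[best][1] if best is not None else None))
    return correlated

# Two independent streams: per-node counters and application events.
hw = [(0.0, {"flops": 10}), (1.0, {"flops": 90}), (2.0, {"flops": 15})]
ev = [(0.9, "solver_start"), (2.1, "solver_end")]
print(correlate(hw, ev))
# [('solver_start', {'flops': 90}), ('solver_end', {'flops': 15})]
```

A production system would, of course, correlate far richer records (call paths, node IDs, energy counters), but the alignment-by-timestamp step illustrated here is the core operation.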
PIPER consists of four thrust areas, organized into three phases. The following figure gives an overall view of our approach:
- Thrust 1: We design and implement a series of new scalable measurement techniques to pinpoint and quantify the main roadblocks on the way to exascale, including lack of parallelism, energy consumption, and load imbalance.
- Thrust 2: We combine a broad range of stack-wide metrics and measurements to gain a global picture of the application's execution on top of a highly complex and possibly adaptive exascale system architecture.
- Thrust 3: We exploit the stack-wide correlated data to develop a suite of new feature-based analysis and visualization techniques that give true insight into a code's behavior and relay this information back to the user in an intuitive fashion.
- Thrust 4: We feed the analysis results back into the system stack, enabling autonomic optimization loops for both high- and low-level adaptations.
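As a small illustration of one roadblock named in Thrust 1, the sketch below computes a commonly used percent-imbalance metric (this is a standard textbook definition, not necessarily the metric PIPER implements): how much longer the slowest rank runs than the average rank.

```python
def load_imbalance(per_rank_times):
    """Percent imbalance across ranks: (max / mean - 1) * 100.
    0% means perfectly balanced; larger values mean more idle time
    on the fast ranks while they wait for the straggler."""
    mean = sum(per_rank_times) / len(per_rank_times)
    return (max(per_rank_times) / mean - 1.0) * 100.0

times = [4.0, 4.0, 4.0, 8.0]  # rank 3 is the straggler
print(f"{load_imbalance(times):.1f}%")  # mean = 5.0, max = 8.0 -> 60.0%
```

At scale, a measurement tool would gather such per-rank timings with a scalable reduction rather than on a single node, but the metric itself stays this simple.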
We are implementing our research in a set of modular components that can be deployed across various execution and programming models. Wherever possible, we leverage the extensive tool infrastructures available through prior work in our project team and integrate the results of our research back into these existing production-level tool sets.
Architecture and Interaction with the Software Stack
Our modular components cover both legacy (MPI+X) models and new models developed in other X-Stack2 projects. Furthermore, we make the results of our research available to a broad audience and work with the larger tools community to achieve wider adoption. The figure below provides an initial high-level sketch of our envisioned architecture that will provide the PIPER functionality:
We target measurements from the entire hardware/software stack: measurements from the underlying system hardware as well as custom measurements derived from the application. The measurements themselves leverage a series of adaptive instrumentation techniques. As part of each measurement operation, we associate the measurement with the local call stack. The correlated call-stack/performance data feeds an analysis pipeline consisting of both node-local analysis methods and distributed, wider-context analysis methods. The resulting data store supports a high-level query interface used by visualization and reporting tools that inform the user. Such a system also enables dynamic tuning and feedback-directed optimization.
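The association of a measurement with the local call stack can be sketched in a few lines of Python (a toy illustration of the concept, not PIPER's implementation; real tools use sampled or binary-instrumented native stacks, e.g. via Dyninst):

```python
import traceback
from collections import defaultdict

# Call path (tuple of function names) -> accumulated metric value.
profile = defaultdict(float)

def record(metric_value):
    """Attribute a measurement to the caller's current call path."""
    # Capture the stack and drop the record() frame itself.
    path = tuple(frame.name for frame in traceback.extract_stack()[:-1])
    profile[path] += metric_value

def inner():
    record(2.5)   # attributed to ... > outer > inner

def outer():
    inner()
    record(1.0)   # attributed to ... > outer

outer()
for path, total in profile.items():
    print(" > ".join(path), total)
```

Keying measurements by full call path rather than by function alone is what lets the later analysis stages distinguish, say, time spent in a solver called from initialization versus from the main time-stepping loop.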
- Dyninst 9.0 - Dynamic Instrumentation Library
- MRNet 5.0 - Tree-based Overlay Network
- OMPT/OMPD - Tool Interfaces for OpenMP
Bottleneck Detection / Analysis
- Boxfish - Visual performance analysis through data centric mappings
- Ravel - MPI trace visualization using logical timelines
- MemAxes/Mitos - Visualization of on-node memory traffic