"Compiler and run-time approaches to enable large scale irregular programs," Milind Kulkarni, Purdue University
While regular algorithms, characterized by operations on dense matrices and arrays, have long been the mainstay of scientific high-performance computing, irregular algorithms, which feature unpredictable accesses to pointer-based data structures, are becoming increasingly common, arising in graph analysis, data mining, and visualization, among other domains.
Unfortunately, the defining characteristics of irregular applications, their dynamic, unpredictable, data-dependent access patterns and data layouts, make achieving high performance on large-scale systems difficult. Scaling applications to peta- and exa-scale requires carefully controlling communication, data movement, and data placement, an inherently difficult task when access patterns and data layouts are unpredictable. Most irregular applications that attain high performance must be painstakingly hand-written and hand-tuned, with few common principles or paradigms uniting the various implementations and easing future development.
Despite the increasing importance of irregular applications, there is little programmer expertise, and even less compiler support, devoted to optimizing them. This project aims to address these problems.
By writing irregular applications in high-level forms, with at most a few annotations highlighting key structural properties, programmers can focus on developing their algorithms and methods. The compiler and run-time system can take on the tedious task of optimizing the application for execution at large scale, automatically providing efficient implementations. This will provide portability and ease maintenance for existing irregular applications and, more importantly, open up whole new domains of computational science to large-scale, high-performance computing.
Goals and overview
The goal of this project is to develop techniques that can address the communication, locality and scalability challenges that emerge as irregular applications are deployed on increasingly larger-scale systems. We are developing a system that can (a) transform irregular applications to improve their locality and computational intensity and reduce communication needs; and (b) perform run-time scheduling and tuning to optimize the application for a specific input and system.
The project has three interlocking components:
- We are developing abstraction techniques to expose the underlying abstract structure of irregular computations. We use a combination of (i) inference techniques that automatically glean the underlying structure of the computation and (ii) programming-model abstractions (e.g., pragmas or domain-specific constructs) that allow this abstract structure to be exposed directly.
- We are designing transformation techniques that transform both (a) the computational structure of an application, to improve locality of reference and computational intensity, and (b) the layout of the application’s data structures, to improve spatial locality and minimize communication. These transformations will target large-scale distributed-memory systems and heterogeneous architectures that use both CPUs and GPUs.
- We are building run-time tuning and scheduling systems to dynamically tune an application for a given system, once the input makes the computational structure and data distribution clear. This backend will build on existing distributed graph libraries (e.g., the BGL) and runtime systems, promoting performance portability.
Publications
- Michael Goldfarb, Youngjoon Jo, and Milind Kulkarni. General transformations for GPU execution of tree traversals. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), November 2013.