"Containment Domains: Programming and Execution Model Support for Resiliency," Mattan Erez, The University of Texas at Austin
Containment domains are a programming construct with weak transactional semantics, specifically designed to let applications express resiliency concerns. The goal of containment domains is to enable resilient-by-default applications that can be progressively and systematically optimized and tuned to improve their performance and efficiency. The key to containment domains is abandoning the prevailing one-size-fits-all approach to resiliency and embracing the diversity of application needs and resiliency mechanisms. The new capabilities include tuning and specializing error detection, state preservation and restoration, and recovery schemes. Containment domains are nested to take advantage of the machine hierarchy and to enable effective, low-complexity, uncoordinated, localized recovery. They also provide the means of expressing algorithm-specific detection and recovery. These characteristics are critical to achieving the power and resource efficiency needed for extreme-scale computing while guaranteeing correct results. The programming model is accompanied by an execution model that provides interfaces to runtime services. The containment domains runtime consolidates various resiliency techniques and optimizations and provides the interfaces for co-tuning application-level and system-level resiliency. Together, the programming and execution models are used to implement a default resiliency scheme, which is then amenable to progressive and aggressive optimization.
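To make the preserve, detect, and recover cycle described above more concrete, here is a minimal illustrative sketch in Python. It is not the containment domains runtime or its API: the names `run_domain` and `DetectedError`, the `max_retries` parameter, and the use of an in-memory deep copy for preservation are all assumptions made for this example. A domain preserves state on entry, runs its body, applies a user-supplied detection check, and on failure restores the preserved state and re-executes locally. Only when local retries are exhausted does the error escalate to the enclosing (parent) domain.

```python
import copy


class DetectedError(Exception):
    """Raised when a domain's detection check fails (hypothetical name)."""


def run_domain(state, body, detect, max_retries=2):
    """Run `body` on the dict `state` inside one illustrative domain.

    Returns the number of local re-executions that were needed.
    Raises DetectedError to the enclosing domain if retries run out.
    """
    preserved = copy.deepcopy(state)  # preservation on domain entry
    for attempt in range(max_retries + 1):
        try:
            body(state)  # may itself contain nested child domains
            if not detect(state):  # algorithm-specific detection
                raise DetectedError("detection check failed")
            return attempt
        except DetectedError:
            # Restoration: roll the state back in place, then re-execute.
            state.clear()
            state.update(copy.deepcopy(preserved))
    raise DetectedError("local recovery failed; escalating to parent")


# Usage: a child domain suffers one injected transient fault.
faults = [True]


def child_body(s):
    s["x"] += 1
    if faults:
        faults.pop()
        s["x"] += 1000  # simulated silent data corruption


def child_detect(s):
    return s["x"] < 1000  # a simple range check as the detector


def parent_body(s):
    run_domain(s, child_body, child_detect)  # nested child domain
    s["y"] = s["x"] * 2


state = {"x": 1, "y": 0}
parent_retries = run_domain(state, parent_body, lambda s: s["y"] == s["x"] * 2)
```

In this sketch the fault is detected and repaired entirely inside the child domain, so the parent never re-executes. That is the localized, uncoordinated recovery that nesting is meant to enable; a real system would also choose where and how to preserve state per domain rather than always copying everything.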
Quad chart: File:ContainmentDomains QUAD 2013.pdf
For publications, posters, and other documents, please see http://lph.ece.utexas.edu/public/CDs.