Resilience Research Questions
The main question is where resilience fits into the X-stack runtime abstract architecture. The guiding questions are:
1) What features of other levels of the stack (algorithm, programming model, compiler, runtime, and hardware) should resilience depend on?
- Carbin: All of them. Uncertainty will be a first-class concern in future systems.
2) How can resilience schemes best exploit application, runtime, or programming model semantics?
- Carbin: Developers expose application-specific flexibility via the programming model. The runtime has different capabilities. Resilience schemes coordinate.
3) What are the biggest missing pieces needed from the various layers to make resilience schemes succeed?
- Carbin: Coordinated understanding of uncertainty across the stack (UQ?). We’ve explored uncertainty/approximation only in limited scopes.
4) What is the impact on resilience of the wide range of expected operating scenarios with respect to dynamically changing resources, application characteristics, and the wide range of possible error and failure rates?
- Not obvious. State of the art: explore a variety of reasoning approaches and mechanisms. Open problem: balancing complexity and benefit.
Please add your comments (with your name) after each question.