Keywords: Chaos Engineering, System Resilience, Failure Injection, Performance Metrics, Fault Tolerance
Abstract
Chaos Engineering is an essential practice for testing system resilience by intentionally injecting failures and analyzing the system’s response. This paper examines the key parameters to measure in Chaos Engineering experiments, including system performance, availability, fault tolerance, and user experience metrics. By systematically monitoring these parameters, organizations can proactively identify weaknesses, enhance failover mechanisms, and optimize recovery strategies. The paper also provides a structured experiment template to help teams document and analyze chaos experiments effectively. The ultimate goal is to build confidence in a system’s ability to withstand turbulent operational conditions and deliver service reliably.