Building resilient systems inside the mesh: abstraction and automation of Virtual Service generation

Feb-26 21:20 UTC

Language: English


Istio’s Virtual Service API provides a language agnostic way of implementing graceful retries on failures until a timeout budget is exhausted. Precise timeouts and retries per endpoint result in better performance. Having hundreds of gRPC services means there will be as many YAML files to be configured, tested and managed, however. I will explain how we built a scalable way of managing retries and timeouts across the service mesh per service per RPC. Achieved by developing an API with annotations interface for specifying the retry and timeouts parameters in the proto files where the gRPC services are defined. Our build system, Please, uses custom rules to automatically generate and validate Virtual Services during build time. Thus, solving the problem of dealing with a large number of YAML configurations. Furthermore, this approach empowers Service Owners across the organisation to easily define SLOs without the need to understand the Virtual Service API and functionality in detail.