Healing

2021 Jun 19

Concept of the healing process in Network Service Mesh (NSM)

Description

Healing is a core NSM feature designed to keep workloads that use NSM highly available and resilient. It relies on automated self-healing mechanisms to detect and address issues that affect the reliability and performance of network services deployed in the environment.

Benefits

Enhanced Reliability: Healing recovers from failures automatically, which reduces manual intervention and minimizes downtime.
Improved User Experience: Healing keeps services available, and the user experience consistent, even when underlying infrastructure or application components fail.
Efficient Resource Utilization: Through proactive scaling and resource management, healing optimizes resource utilization and keeps the NSM domain operating efficiently.
Simplified Operations: Healing simplifies operational tasks, reducing the burden on administrators and DevOps teams.

Concept

NSM healing is pretty simple:

  1. Healing always starts on the Network Service Client.
  2. The Network Service Client uses the monitor connections API to watch its NSMgr and keep the connection up to date.
  3. If the Network Service Client receives an event from the NSMgr that reports a state change, a deleted connection, or a closed stream, the client forces a new request. The existing connection is closed only if connectivity has been lost.

And that’s it!
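
The decision made in step 3 can be sketched in Go as follows. This is a minimal illustration only, not the NSM SDK implementation: the names `EventKind`, `Action`, `Heal`, and `datapathOK` are hypothetical and exist only for this example. It assumes the client already knows which kind of event arrived and whether a connectivity check succeeded.

```go
package main

import "fmt"

// EventKind is a hypothetical, simplified stand-in for the notifications
// a client may observe while monitoring its NSMgr.
type EventKind int

const (
	StateChanged      EventKind = iota // NSMgr reports a changed connection state
	ConnectionDeleted                  // NSMgr reports the connection as deleted
	StreamClosed                       // the monitor stream to the NSMgr was closed
)

// Action describes what the client does in response to an event.
type Action struct {
	CloseExisting bool // close the current connection before re-requesting
	Request       bool // force a new request
}

// Heal decides how to react to a monitor event. datapathOK is the result of
// an assumed connectivity check on the existing connection.
func Heal(kind EventKind, datapathOK bool) Action {
	switch kind {
	case StateChanged, ConnectionDeleted, StreamClosed:
		// Always force a new request; close the existing connection
		// only if connectivity is gone.
		return Action{CloseExisting: !datapathOK, Request: true}
	default:
		// Unknown events do not trigger healing in this sketch.
		return Action{}
	}
}

func main() {
	// Stream closed but traffic still flows: re-request without closing.
	fmt.Printf("%+v\n", Heal(StreamClosed, true))
	// Connection deleted and connectivity lost: close, then re-request.
	fmt.Printf("%+v\n", Heal(ConnectionDeleted, false))
}
```

Keeping the decision in a pure function like this makes the rule "close only if connectivity is gone" easy to test separately from any gRPC or monitoring code.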

What triggers healing?

The healing process starts only in the following cases:

  1. Control plane is down. This occurs when an NSM control plane application has restarted or crashed.
  2. Data plane is down. This occurs when the forwarder has restarted, or when the backend fails, for example a VPP crash, or a network hardware failure when SR-IOV is in use.

Healing is an indispensable tool for organizations leveraging NSM, enabling them to build and maintain highly available and resilient applications in dynamic, cloud-native environments.

References