Tips For Stateful Kubernetes Data Backup And Recovery

All systems need a plan for data backup and recovery. It doesn’t matter if your application’s running on the cloud, on-premises or in a refrigerator at the edge of a network—it will likely need to store and access data somewhere. But in our highly connected, distributed world, there’s always the chance for a ransomware attack or misconfiguration to put this persistent storage at risk. Therefore, every instance requires a plan to protect and restore data when something goes haywire.

Lately, we’ve noticed a rising interest in using stateful deployments with Kubernetes. The issue is that although microservices and containers are inherently distributed and ephemeral, the underlying data storage must remain intact. What’s more, organizations need a solid disaster recovery plan that fits the new cloud-native strata and has a decoupled life cycle. Also, to fully restore, you really need to recover not only the data but the metadata surrounding it, such as secrets, permissions, maps, certificates and networking information.


I recently met with Gaurav Rishi, the VP of product at Kasten by Veeam, to explore the state of data protection for Kubernetes stateful deployments. According to Rishi, DevOps teams working on cloud-native platforms need a new backup and recovery process that goes beyond legacy virtual machine (VM)-based modes. Below, we’ll consider the history of data management and outline some tips to bring data backup and recovery to your cloud-native platforms and applications.

Kubernetes Data Maturity

There are a few key reasons why data protection in stateful Kubernetes applications might be necessary. The first is the rise of stateful, as opposed to stateless, applications, says Rishi. From a technology standpoint, many users began their container and Kubernetes journey using a stateless approach. But although the original intent was to create building blocks that don’t have a state, the cloud-native community soon realized it was necessary for many business applications to be stateful.

He also notes increasingly dynamic application architectures. Today, “polyglot persistence” is everywhere: companies no longer support just one relational database, and a single application often uses multiple databases under the covers. You might be running databases within Kubernetes clusters or working with managed databases or databases-as-a-service (DBaaS). Somewhat ironically, “We’re at a point where databases are the most popular workloads on containers,” says Rishi.

Now that PersistentVolumes (PVs) are commonplace, the second reason is the need to back up the storage volumes behind them. Doing so requires bridging the gap between development and operations teams to fit the constantly changing roles and scopes in IT. “We need to ensure we are keeping the business application intact from a business continuity perspective,” says Rishi.

Tips for Protecting Stateful K8s Data

To build more stable modern systems with high fault tolerance, DevOps must incorporate solid data backup and recovery tactics. So, what are some best practices DevOps operators should consider when backing up and recovering data? Rishi shares a few helpful tips:

Use a Kubernetes-native backup architecture. First, you really need a backup solution that’s purpose-built for cloud-native, says Rishi. Due to the nuances inherent in cloud-native architecture, VM-based data management doesn’t work well in this world. A Kubernetes-native backup solution should be aware of what’s running in the K8s environment and understand the dependencies within a microservices ecosystem.

Use automation within your data recovery plan. The number of applications is exponentially expanding, and the majority of applications will soon be cloud-native. Yet most analysts identify a widening talent gap in meeting the needs of this new paradigm. Thus, more automation will be necessary to detect new apps, back them up and automatically rehydrate data during the recovery process. “Your backup is only as good as your recovery plan,” Rishi explains.
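As a rough illustration of the "detect new apps, back them up" idea, the sketch below compares a list of discovered applications against existing backup policies and adds a default policy for anything uncovered. The function names, the policy dictionary shape and the "daily" schedule are all hypothetical; a real tool would discover applications from the cluster itself.

```python
def find_unprotected(discovered_apps, backup_policies):
    """Return, in sorted order, the apps that no backup policy covers."""
    protected = {policy["app"] for policy in backup_policies}
    return sorted(set(discovered_apps) - protected)


def ensure_policies(discovered_apps, backup_policies, default_schedule="daily"):
    """Return a new policy list that also covers every unprotected app.

    The default schedule is an assumption made for this example.
    """
    new_policies = [
        {"app": app, "schedule": default_schedule}
        for app in find_unprotected(discovered_apps, backup_policies)
    ]
    return list(backup_policies) + new_policies


# Hypothetical inventory: two apps are running, only one has a policy.
apps = ["orders", "payments"]
policies = [{"app": "orders", "schedule": "hourly"}]
updated = ensure_policies(apps, policies)
```

Running a check like this on a schedule is one simple way to keep newly deployed applications from slipping through without a backup policy.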

Recover in the right order. When recovering different modules, the order of operations matters. For example, if you are recovering a suite of microservices after a disaster, the logical components supporting the database and security should be restored first. Recreate the clusters, restore data in persistent volumes and rehydrate databases before reinstating microservices that render parts of the application. For some cloud-native backup solutions, the order of operations for rehydrating services can be independently defined in a YAML file.
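The ordering described above can be treated as a dependency graph and resolved with a topological sort. The following is a minimal sketch using Python's standard-library graphlib module (Python 3.9 or later); the component names and their dependencies are hypothetical and do not reflect any particular backup product's format.

```python
from graphlib import TopologicalSorter

# Hypothetical components. Each key maps to the set of components that
# must be restored before it.
RESTORE_DEPENDENCIES = {
    "cluster": set(),
    "secrets-and-config": {"cluster"},
    "persistent-volumes": {"cluster"},
    "database": {"persistent-volumes", "secrets-and-config"},
    "api-service": {"database"},
    "web-frontend": {"api-service"},
}


def restore_order(dependencies):
    """Return component names so every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


order = restore_order(RESTORE_DEPENDENCIES)
for step, component in enumerate(order, start=1):
    print(f"{step}. restore {component}")
```

Components with no dependency on each other (here, the secrets and the persistent volumes) may come out in either order. If the graph contains a cycle, static_order() raises graphlib.CycleError, which is a useful signal that the recovery plan itself is inconsistent.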

Build a process that’s agnostic to different database flavors. There are many varying database types in use today, spanning SQL and NoSQL systems such as PostgreSQL, MongoDB and CockroachDB, among others. And a large organization might be using a combination of different databases, hosting them in hybrid, multi-cloud environments. Furthermore, the application might be backed up using one solution but stored using a completely different tool. Therefore, Rishi recommends backup and recovery processes that aren’t tied to a particular database.
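One way to picture a database-agnostic process is a small common interface that each database adapter implements, so the backup workflow never needs to know which engine it is talking to. The sketch below is illustrative only: BackupTarget, InMemoryStore, backup_all and restore_all are hypothetical names, and the in-memory store stands in for a real database adapter.

```python
import json
from typing import Protocol


class BackupTarget(Protocol):
    """Anything that can be dumped to bytes and reloaded from them."""

    name: str

    def dump(self) -> bytes: ...

    def load(self, snapshot: bytes) -> None: ...


class InMemoryStore:
    """Toy stand-in for a real database adapter (hypothetical)."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = dict(rows)

    def dump(self) -> bytes:
        return json.dumps(self.rows, sort_keys=True).encode()

    def load(self, snapshot: bytes) -> None:
        self.rows = json.loads(snapshot.decode())


def backup_all(targets) -> dict:
    """Snapshot every target through the shared interface."""
    return {target.name: target.dump() for target in targets}


def restore_all(targets, snapshots) -> None:
    """Reload every target from its saved snapshot."""
    for target in targets:
        target.load(snapshots[target.name])


users = InMemoryStore("users-db", {"1": "ada"})
carts = InMemoryStore("carts-db", {"1": "book"})
snapshots = backup_all([users, carts])
users.rows.clear()  # simulate data loss
restore_all([users, carts], snapshots)
```

Because backup_all and restore_all depend only on the interface, supporting another database would mean writing one more adapter rather than changing the workflow.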

Consider DevSecOps tools. Since engineers have come to expect self-service DevSecOps tools, they will likely anticipate similar shift-left tooling to handle persistent storage for container environments.

Stateful Data Protection: The Next Phase of Kubernetes Maturity

Although Kubernetes has been open source since 2014, Rishi jokes that in some ways, it’s still “eight years young.” Kubernetes as a culture is still maturing and, along with it, companies are still in the process of establishing standards around many areas such as platform governance, multi-cluster security, authentication and scalability.

First, the community had to solve the networking aspect of this new computing style. The next phase is the storage aspect, predicts Rishi. This is becoming more important to lock down as ransomware and cloud-native exploits continue to make the headlines. There is also a general lack of skills around hardening cloud-native infrastructure, which could expose misconfigurations and access control issues. Secure-by-default practices should be enacted, but data backup procedures should always be in place as a last line of defense.

“Every organization understands the need for backup and recovery,” says Rishi. And although the importance is widely known, organizations shouldn’t lose sight of the unique context of data recovery within cloud-native models.