Kubernetes StatefulSets

Prerequisites

I recommend having a basic understanding of Kubernetes Pods before reading this blog. You can check this blog for details about Kubernetes Pods.

What Is A StatefulSet

A StatefulSet is a Kubernetes object designed to manage stateful applications. Like a Deployment, a StatefulSet scales a set of Pods up to the desired number that you define in a config file. Pods in a StatefulSet run the same containers that are defined in the Pod spec inside the StatefulSet spec. Unlike a Deployment, every Pod of a StatefulSet owns a sticky, stable identity. A StatefulSet also provides guarantees about the ordering of deployment, deletion, scaling, and rolling updates for its Pods.

A StatefulSet Example

A complete StatefulSet consists of two components:

  • A Headless Service used to control the network domain of its Pods.
  • A StatefulSet object used to create and manage its Pods.

ZooKeeper Service

ZooKeeper is a distributed coordination service for distributed applications. It allows you to read data, write data, and observe data updates. Data is stored and replicated on each ZooKeeper server, and these servers work together as a ZooKeeper ensemble.


Headless Service

A Headless Service is responsible for controlling the network domain of a StatefulSet. You create a headless service by setting clusterIP to None in the Service spec.

apiVersion: v1
kind: Service
metadata:
  namespace: default
  name: zk-hs
  labels:
    app: zk-hs
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk

StatefulSet Spec

The following spec demonstrates how to use a StatefulSet to run a ZooKeeper service:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: default
  name: zk

# StatefulSet spec
spec:
  serviceName: zk-hs
  selector:
    matchLabels:
      app: zk # It has to match .spec.template.metadata.labels
  replicas: 5
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: RollingUpdate

  # volumeClaimTemplates provisions a PersistentVolumeClaim (and its backing
  # PersistentVolume) for each StatefulSet Pod.
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 10Gi

  # Pod spec
  template:
    metadata:
      labels:
        app: zk
    spec:
      affinity:
        nodeAffinity:
          ...
        podAntiAffinity:
          ...
      # Containers running in each Pod
      containers:
      - name: k8szk
        image: gcr.io/google_samples/k8szk:v3
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        env:
          ...
        readinessProbe:
          exec:
            command:
            - "zkOk.sh"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          ...
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper

Metadata

The field metadata contains the metadata of this StatefulSet: its name and the Namespace it belongs to. You can also put labels and annotations in this field.
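
For instance, a minimal sketch of the metadata block with one label and one annotation added (the label and annotation keys below are illustrative, not part of the original spec):

metadata:
  namespace: default
  name: zk
  labels:
    app: zk                  # illustrative label
  annotations:
    owner: platform-team     # illustrative annotation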

StatefulSet Spec and Pod Template

The field spec defines the specification of this StatefulSet and the field spec.template defines a template for creating the Pods this StatefulSet manages.

Pod Selector

Like a Deployment, a StatefulSet uses the field spec.selector to identify which Pods it manages. You can check this doc for details about the usage of Pod Selector.
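
As a rough sketch, a selector can use matchLabels, matchExpressions, or both; the matchExpressions clause below is illustrative and is not part of the ZooKeeper spec above:

selector:
  matchLabels:
    app: zk
  matchExpressions:
  - key: environment               # illustrative key and values
    operator: In
    values: ["prod", "staging"]
  # Note: the Pod template labels must still satisfy every clause of the selector.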

Replicas

The field spec.replicas specifies the desired number of Pods for the StatefulSet. It is recommended to run an odd number of Pods for some stateful applications like ZooKeeper because of how their quorum operations work. For example, a ZooKeeper service marks a write complete only when more than half of its servers send an acknowledgment back to the leader. Take a six-Pod ZooKeeper service as an example: the service remains available as long as at least four servers (floor(6/2) + 1) are up, which means it can tolerate the failure of two servers. However, an ensemble of five servers can also tolerate two failures, and it completes writes more efficiently because only three acknowledgments are needed. Therefore, the sixth server does not give you any additional advantage in terms of write efficiency or availability.

Pod Identity

Each StatefulSet Pod is assigned a unique, sticky ID (its Pod name) when it is created, and this ID stays with the Pod for the life cycle of the StatefulSet. Pod names follow the pattern ${statefulSetName}-${ordinal}. For example, Kubernetes creates five Pods with the unique IDs zk-0, zk-1, zk-2, zk-3, and zk-4 for the ZooKeeper service above.
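
Because the headless service zk-hs governs these Pods, each one also gets a stable DNS entry of the form <pod-name>.zk-hs.default.svc.cluster.local (assuming the default cluster.local cluster domain). As a sketch, a hypothetical client Pod could reach the ensemble through those names:

apiVersion: v1
kind: Pod
metadata:
  name: zk-client                  # hypothetical client Pod, not part of the ZooKeeper spec
spec:
  containers:
  - name: client
    image: busybox                 # placeholder image for illustration
    command: ["sh", "-c", "sleep 3600"]
    env:
    - name: ZK_SERVERS             # hypothetical variable a client application might read
      value: "zk-0.zk-hs.default.svc.cluster.local:2181,zk-1.zk-hs.default.svc.cluster.local:2181,zk-2.zk-hs.default.svc.cluster.local:2181"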

Pod Management Policy

You can choose whether a StatefulSet's Pods are created and deleted in order or in parallel by setting spec.podManagementPolicy to OrderedReady or Parallel (this policy affects scaling operations, not rolling updates). OrderedReady is the default: Pods are created in the order 0, 1, ..., N-1 and deleted in the reverse order N-1, ..., 1, 0, and the controller waits for the current Pod to become Ready or be fully terminated before launching or terminating the next one. Parallel launches or terminates all Pods simultaneously; it does not wait on the state of one Pod before acting on the next.
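
If strict ordering is not needed for your application, a minimal sketch of the relevant field with the parallel policy looks like this:

spec:
  podManagementPolicy: Parallel    # launch and terminate Pods without waiting on one another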

Update Strategy

There are two update strategies available for StatefulSets: OnDelete and RollingUpdate. RollingUpdate is the default; when the Pod template changes, it deletes and recreates each Pod of the StatefulSet, one at a time, in reverse ordinal order.
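
RollingUpdate also supports a partition field for staged, canary-style rollouts. A minimal sketch with an assumed partition of 3 would roll the new Pod template out only to Pods whose ordinal is greater than or equal to 3:

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 3    # zk-3 and zk-4 get the new template; zk-0, zk-1, and zk-2 keep the old one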

Pod Affinity

Like a Deployment, the ideal scenario for running a StatefulSet is to distribute its Pods across different nodes in different zones and avoid running multiple Pods on the same node. The spec.template.spec.affinity field allows you to specify node affinity and inter-pod affinity (or anti-affinity) for the StatefulSet Pods. You can check this doc for details about using node/pod affinity in Kubernetes.
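
As a rough sketch of what could go into the podAntiAffinity block elided in the spec above, the following rule asks the scheduler never to place two zk Pods on the same node; the hard "required" rule and the hostname topology key are one common choice, not the only one:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values: ["zk"]
      topologyKey: kubernetes.io/hostname    # do not co-locate two zk Pods on one node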

volumeClaimTemplates

The field spec.volumeClaimTemplates is used to provide stable storage for a StatefulSet. For the spec above, it creates a Persistent Volume Claim (datadir-zk-0), a Persistent Volume (e.g., pv-0000), and a 10 GiB standard persistent disk for Pod zk-0, and does the same for every other Pod. These claims and volumes are not tied to the life cycle of any individual Pod, so the storage for a StatefulSet Pod is stable and persistent: a Pod does not lose its data when it is terminated and recreated.
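
For reference, the claim Kubernetes creates for zk-0 from the template above would look roughly like the following (the exact annotations and status set by the provisioner will vary):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: default
  name: datadir-zk-0               # <volume claim template name>-<pod name>
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi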


What Is Next

I recommend reading this blog if you are curious about how to use Kubernetes Deployments to run stateless applications.
