Kubernetes cluster
Testing Performance and Scalability


Evgenii Frikin

Huawei

1.0.0

2021-05-30

About me

Evgenii Frikin

My experience:
In IT for more than 10 years
Started my career as a SysAdmin
SRE On-Call for about 3 years
Currently working in R&D

My contacts:

Telegram | LinkedIn | GitHub: efrikin

Agenda


Introduction
Problem
      Scalability testing
      Workload testing
Solution
      Overview
      Comparison
Deployment
Testing
      Measurements
      Comparison of results
Experience







Introduction


One of the most important aspects of Kubernetes is its scalability and performance characteristics. As a Kubernetes user, administrator, or operator of a cluster, we would expect to have some guarantees in those areas.
What Kubernetes guarantees?
— Kubernetes Team

Introduction



Developers

  • New features in Kubernetes (e.g. the Scheduler or API)

  • Components outside of Kubernetes (e.g. CNI, CSI, or a k8s Operator)




Introduction



Administrator/Operator

  • Performance

  • Scalability

  • Stability

Introduction



Architect

  • Capacity planning

  • Cost calculation

  • Scalability architecture

Introduction

  • Service Level Indicators define what and how we measure

  • Service Level Objectives set specific requirements, subject to:

    • cluster configuration

    • use of Kubernetes extensibility features

    • cluster load

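As an illustration of what an SLI looks like in practice, the API server already exposes the request-latency metric that the official SLOs are defined on. A quick read-only way to peek at it is sketched below; the metric name is standard, the rest of the pipeline is just an example.

# Sample the API request duration SLI directly from the API server
# (requires a working kubeconfig; output trimmed with head for readability)
kubectl get --raw /metrics | grep apiserver_request_duration_seconds_bucket | head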

Introduction

[Charts: real-world SLI/SLO measurements; 300 pods]





General problems


General problems

  • Extensibility testing


General problems

  • Extensibility testing

  • Small memory leaks


General problems

  • Extensibility testing

  • Small memory leaks

  • Scalability and performance testing


General problems

  • Extensibility testing

  • Small memory leaks

  • Scalability and performance testing

  • Problems with the master components






Scalability testing problems


Scalability testing problems

Spawning large clusters is expensive and time-consuming

  • Deployment with kubespray takes from 20-30 minutes upwards

  • Requires costly hardware resources




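For context, a bare-bones kubespray run looks roughly like the sketch below (the inventory path is an assumption); even with a prepared inventory, a full playbook run takes tens of minutes.

# Illustrative kubespray invocation; inventory layout follows the kubespray docs
ansible-playbook -i inventory/mycluster/hosts.yaml --become cluster.yml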

Scalability testing problems

Needs to be done at least once for each release of Kubernetes or its components

  • collect real-world performance and scalability data

  • compare performance data between releases





Scalability testing problems

Need a lightweight mechanism for fast deployment of a k8s cluster

  • quickly evaluate new ideas

  • implement and check different performance improvements









Workload testing problems


Workload testing problems

Unfriendly for users

  • All e2e tests are written in Go

  • Complicated process of running and debugging tests

    • Understanding how tests really work

    • Testing new features requires code changes


Workload testing problems

Most components are developed outside of Kubernetes

  • CRI (containerd, CRI-O, Docker, etc)

  • CNI (Calico, Cilium, Flannel, etc)

  • CSI (Ceph, LINSTOR, AWS, etc)

  • …​


Workload testing problems

Golang test definition example
...
testsuites.DriverInfo{
    Name:        "csi-nfsplugin",
    MaxFileSize: testpatterns.FileSizeLarge,
    SupportedFsType: sets.NewString(
        "",
    ),
    Capabilities: map[testsuites.Capability]bool{
        testsuites.CapPersistence: true,
        testsuites.CapExec:        true,
    },
}
...
YAML test definition example
StorageClass:
  FromName: true
SnapshotClass:
  FromName: true
DriverInfo:
  Name: hostpath.csi.k8s.io
  Capabilities:
    block: true
    controllerExpansion: true
    exec: true
    multipods: true
    persistence: true
    pvcDataSource: true
InlineVolumes:
- Attributes: {}
Test startup example
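# -p runs specs in parallel; -focus selects the external-storage csi-hostpath specs,
# -skip drops [Feature:*] and [Disruptive] specs; the testdriver YAML is passed to e2e.test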
ginkgo -p -focus='External.Storage.*csi-hostpath' -skip='\[Feature:\|\[Disruptive\]' \
       e2e.test -- -storage.testdriver=/tmp/hostpath-testdriver.yaml





Scalability testing solutions


Scalability testing solutions

  • Bare metal/VMs


Scalability testing solutions

  • Bare metal/VMs

  • kind/minikube/microk8s


Scalability testing solutions

  • Bare metal/VMs

  • kind/minikube/microk8s

  • real cluster + kubemark





Why kubemark?







Kubemark is a performance testing tool which allows users to do experiments on simulated clusters.
What is Kubemark?
— Kubernetes blog

Why kubemark?




  • Real cluster


Why kubemark?

  • Real cluster

  • Master node for hollow cluster


Why kubemark?

  • Real cluster

  • Master node for hollow cluster

  • Hollow nodes

    • Hollow kubelet

    • Hollow kube-proxy

  • Node problem detector


Why kubemark?


Hollow cluster = real cluster



Why kubemark?


Capability to run many instances on a single host

  • HollowNode doesn’t modify the environment in which it is run



Why kubemark?

Cheap scale tests

  • ~100 HollowNodes per core (~10 millicores and 10MB RAM per pod)

  • Simulated cluster deploy time = deploy time of the real (host) cluster + deploy time of HollowNodes

  • The kubectl tool is used for all scaling operations (see the sketch below)


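A minimal sketch of such a scaling operation, assuming the hollow nodes are managed by a ReplicationController named hollow-node (the name is an assumption) in the kubemark namespace, as in the deploy section later:

# Resize the simulated cluster, then check how many hollow nodes registered
kubectl -n kubemark scale rc hollow-node --replicas=100
kubectl get nodes -l hollow --no-headers | wc -l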

Workload testing solutions


  • helm or kubectl apply -f + some YAML files


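A rough sketch of that approach (names, image, and count are arbitrary): generate load with nothing but kubectl.

# Create 100 throwaway deployments as a naive load test
for i in $(seq 1 100); do
  kubectl create deployment "load-${i}" --image=k8s.gcr.io/pause:3.3
done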

Workload testing solutions


  • helm or kubectl apply -f + some YAML files

  • clusterloader2







Clusterloader2

Why clusterloader2?







ClusterLoader2 is a Kubernetes test framework which can deploy large numbers of various user-defined objects to a cluster
Using Clusterloader
— OpenShift documentation

Why clusterloader2?


Simple

  • Kubeconfig

  • Definition of tests (YAML)

  • Providers (gke, kubemark, aws, local, etc)


Why clusterloader2?


User-oriented

  • No Golang

  • Easy to understand

Why clusterloader2?


Testable

  • Measurable SLI/SLO

  • Declarative paradigm

Why clusterloader2?


Extra metrics

  • PodStartupLatency

  • MemoryProfile

  • MetricsForE2E

  • …
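For illustration, a measurement is switched on from the test definition itself. The sketch below writes a PodStartupLatency step to a standalone file; parameter names follow upstream clusterloader2 examples and may differ between versions.

# Sketch of a test step that starts the PodStartupLatency measurement
cat > podstartup-measurement.yaml <<'EOF'
- measurements:
  - Identifier: PodStartupLatency
    Method: PodStartupLatency
    Params:
      action: start
      labelSelector: group = latency
      threshold: 10s
EOF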





Deploying a hollow cluster

Deploying a hollow cluster

Table 1. Hardware

RAM:      ~392GB DDR4
CPU:      Kunpeng 920-4826, 48 cores / 48 threads, ARM64
Network:  100Gb/s

Table 2. Resources for real/hollow clusters

Role         CPU (Real / Hollow)   RAM (Real / Hollow)   Disk (Real / Hollow)   Nodes (Real / Hollow)
Master       12                    23GB                  15GB                   3
Monitoring   4                     15GB                  10GB                   1
W-Node       8 / 40m               8GB / 20MB            8GB / -                50 / 1

Deploying a hollow cluster




Create NS
kubectl create ns kubemark


Create configmap
kubectl create configmap node-configmap \
        -n kubemark --from-literal=content.type="test-cluster"


Create secret
kubectl create secret generic kubeconfig \
        --type=Opaque --namespace=kubemark \
        --from-file=kubelet.kubeconfig=${HOME}/.kube/config \
        --from-file=kubeproxy.kubeconfig=${HOME}/.kube/config
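
An optional sanity check before deploying the hollow nodes (plain kubectl, nothing kubemark-specific):
# Confirm the configmap and secret created above exist
kubectl -n kubemark get configmap node-configmap
kubectl -n kubemark get secret kubeconfig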

Deploying a hollow cluster

apiVersion: v1
kind: ReplicationController (1)
...
spec:
  replicas: 5 (2)
  selector:
    name: hollow-node
  template:
    metadata:
      labels:
        name: hollow-node
    spec:
...
      containers:
      - name: hollow-kubelet (3)
...
      - name: hollow-proxy (4)
...
        securityContext:
          privileged: true
(1) ReplicationController is a standard resource
(2) Replicas = number of HollowNodes
(3) hollow-kubelet is the hollow kubelet container
(4) hollow-proxy is the hollow kube-proxy container
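
Assuming the manifest above is saved as hollow-node.yaml (the file name is illustrative), it can be applied and waited on like this:

# Deploy the hollow nodes and wait until their pods are Ready
kubectl -n kubemark apply -f hollow-node.yaml
kubectl -n kubemark wait --for=condition=Ready pod -l name=hollow-node --timeout=10m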

Deploying a hollow cluster

kubectl get nodes -l hollow
NAME               VERSION  OS-IMAGE                     KERNEL-VERSION        CONTAINER-RUNTIME
hollow-node-hq92c  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0
hollow-node-hqz6t  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0
hollow-node-hsg6x  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0
hollow-node-ht6rq  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0
hollow-node-hwpdq  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0
hollow-node-hxqsr  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0





clusterloader2 run


clusterloader2 run

config.yaml
{{$POD_COUNT := DefaultParam .POD_COUNT 100}} (1)
{{$POD_THROUGHPUT := DefaultParam .POD_THROUGHPUT 5}}
{{$CONTAINER_IMAGE := DefaultParam .CONTAINER_IMAGE "k8s.gcr.io/pause:3.1"}}
{{$POD_STARTUP_LATENCY_THRESHOLD := DefaultParam .POD_STARTUP_LATENCY_THRESHOLD "5s"}}
{{$OPERATION_TIMEOUT := DefaultParam .OPERATION_TIMEOUT "15m"}}
name: node-throughput (2)
...
steps:
- measurements: (3)
  - Identifier: APIResponsivenessPrometheusSimple
    Method: APIResponsivenessPrometheus
...
- phases:
  - namespaceRange:
      min: 1
      max: {{$POD_COUNT}}
    replicasPerNamespace: 1
    objectBundle:
    - basename: latency-pod-rc
      objectTemplatePath: rc.yaml (4)
...
Table 3. Constants for the node-throughput test

POD COUNT   THROUGHPUT   IMAGE       LATENCY   OPS TIMEOUT
1000        300          pause:3.3   10s       15m
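
These constants override the DefaultParam values at the top of config.yaml. One way to pass them, assuming clusterloader2's --testoverrides flag, is a small overrides file added to the run command shown on the next slide:

# Overrides matching Table 3 (file name is arbitrary)
cat > testing/overrides.yaml <<'EOF'
POD_COUNT: 1000
POD_THROUGHPUT: 300
CONTAINER_IMAGE: k8s.gcr.io/pause:3.3
POD_STARTUP_LATENCY_THRESHOLD: 10s
OPERATION_TIMEOUT: 15m
EOF
# then append: --testoverrides=${HOME}/testing/overrides.yaml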

clusterloader2 run

rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: {{.Name}}
spec:
  replicas: {{.Replicas}}
  selector:
    name: {{.Name}}
  template:
    metadata:
      labels:
        name: {{.Name}}
        group: {{.Group}}
    spec:
      automountServiceAccountToken: false
      containers:
      - image: {{.Image}}
        imagePullPolicy: IfNotPresent
        name: {{.Name}}
...
clusterloader2 run
docker run --rm --network host \
    -v ${HOME}/.kube/:${HOME}/.kube:ro \
    -v $(pwd)/testing:${HOME}/testing:ro \
    -ti clusterloader2 \
    --kubeconfig=${HOME}/.kube/config \ (1)
    --testconfig=${HOME}/testing/config.yaml \ (2)
    --provider=kubemark (3)
(1) kubeconfig is the cluster access config
(2) testconfig is the test definition
(3) Provider type
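
Measurement summaries are written as JSON files; assuming clusterloader2's --report-dir flag, they can be collected in one place by extending the command above:

# Append to the clusterloader2 arguments (directory is arbitrary and must be mounted)
--report-dir=${HOME}/testing/results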







Results of clusterloader2 work

[Charts: object creation/cleanup timings; real vs hollow cluster comparison]







Results of measurements for etcd

[Charts: etcd fsync duration, RPC rate, and memory usage during the 1000-pod test]







Results of measurements for API-server

[Charts: API server read/write SLI, memory, and CPU usage during the 1000-pod test]


Deploying a large cluster

Deploying a large cluster



Table 4. Resources for large cluster

Role         CPU   RAM    Disk   Number of nodes
Master       36    96GB   30GB   3
Monitoring   12    80GB   40GB   1
W-Nodes      8     4GB    15GB   200
H-Nodes      40m   20MB   -      5000

Results of clusterloader2 work

[Chart: clusterloader2 results, 5000-node hollow cluster vs real cluster]

Results of measurements for etcd


Stage         Nodes   RPC rate (req/s)   Memory (MB)   Traffic in (kB/s)   Traffic out (kB/s)   DB fsync (ms)   WAL fsync (ms)   Number of resources
Idle          204     60                 80            40                  300                  60              30               -
Idle hollow   5204    600                9000          185                 2500                 63              30               +5000
Test          5000    3800               9000          240                 4000                 100             40               +30000

Results of measurements for API server



Stage         Nodes   CPU   Memory (MB)   SLI READ (req/s)   SLI WRITE (req/s)   Number of resources
Idle          204     1.2   1500          30                 43                  -
Idle hollow   5204    3     40000         520                560                 +5000
Test          5000    8     40000         3500               590                 +30000






Our experience


Our experience





Scaling a k8s cluster up to 15k nodes (microk8s v1.18.7)
  • Real ~200 worker nodes and 1 master


Workload:
  • 70k out of the box

  • 130k after tuning etcd

Our experience






Scaling a k8s cluster up to 18k nodes (k8s v1.19.7)
  • Real ~200 worker nodes and 3 masters


Workload:
  • 80k out of the box

  • 142k after tuning API-server
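
The slides do not list the exact settings that were tuned; for orientation only, typical large-scale knobs in this area look like the flags below (values are illustrative assumptions, not the configuration used in these tests).

# Example etcd flag: raise the backend size quota from the 2GB default
--quota-backend-bytes=8589934592
# Example kube-apiserver flags: allow more concurrent requests
--max-requests-inflight=3000
--max-mutating-requests-inflight=1000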

Our experience




More than 20 real k8s clusters were rendered inoperable during tests
  • default configuration is not suitable for large-scale clusters

    • etcd

    • CNI

    • API-server

  • memory leaks caused by a large number of resources

    • etcd

    • API-server

Our experience

Other
  • CrashLoopBackOff

  • OOM


ANY QUESTIONS?


FEEL FREE TO ASK ME


efrikin.github.io/devopsconf2021