Kubernetes cluster
Testing Performance and Scalability


Evgenii Frikin

Huawei

1.0.0

2021-05-30

About me

Evgenii Frikin

My experience:
In IT for more than 10 years
Started my career as a SysAdmin
SRE On-Call for about 3 years
Currently working in R&D

My contacts:

Telegram | LinkedIn | GitHub: efrikin

Agenda


Introduction
Problem
      Scalability testing
      Workload testing
Solution
      Overview
      Comparison
Deployment
Testing
      Measurements
      Comparison of results
Experience







Introduction


One of the most important aspects of Kubernetes is its scalability and performance characteristics. As a Kubernetes user, administrator, or operator of a cluster, we would expect to have some guarantees in those areas.
What Kubernetes guarantees?
— Kubernetes Team

Introduction



Developers

  • New features in Kubernetes (e.g. the Scheduler or API)

  • Components outside of Kubernetes (e.g. CNI, CSI, or a k8s Operator)




Introduction



Administrator/Operator

  • Performance

  • Scalability

  • Stability

Introduction



Architect

  • Capacity planning

  • Cost calculation

  • Scalability architecture

Introduction

  • Service Level Indicators define what and how we measure

  • Service Level Objectives set specific requirements, subject to:

    • cluster configuration

    • use of Kubernetes extensibility features

    • cluster load

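As an illustration of what an SLI looks like in practice, the API server already exposes the request-latency metric that the official SLOs are defined on. A quick read-only way to peek at it is sketched below; the metric name is standard, the rest of the pipeline is just an example.

# Sample the API request duration SLI directly from the API server
# (requires a working kubeconfig; output trimmed with head for readability)
kubectl get --raw /metrics | grep apiserver_request_duration_seconds_bucket | head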

Introduction

[Charts: real-world SLI/SLO measurements; 300 pods]





General problems


General problems

  • Extensibility testing


General problems

  • Extensibility testing

  • Small memory leaks


General problems

  • Extensibility testing

  • Small memory leaks

  • Scalability and performance testing


General problems

  • Extensibility testing

  • Small memory leaks

  • Scalability and performance testing

  • Problems with the master components






Scalability testing problems


Scalability testing problems

Spawning large clusters is expensive and time-consuming

  • Deployment with kubespray takes from 20-30 minutes upwards

  • Requires costly hardware resources




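For context, a bare-bones kubespray run looks roughly like the sketch below (the inventory path is an assumption); even with a prepared inventory, a full playbook run takes tens of minutes.

# Illustrative kubespray invocation; inventory layout follows the kubespray docs
ansible-playbook -i inventory/mycluster/hosts.yaml --become cluster.yml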

Scalability testing problems

Needs to be done at least once for each release of Kubernetes or its components

  • collect real-world performance and scalability data

  • compare performance data between releases





Scalability testing problems

Need a lightweight mechanism for fast deployment of a k8s cluster

  • quickly evaluate new ideas

  • implement and check different performance improvements









Workload testing problems


Workload testing problems

Unfriendly for users

  • All e2e tests are written in Go

  • Complicated process of running and debugging tests

    • Understanding how tests really work

    • Testing new features requires code changes


Workload testing problems

Most components are developed outside of Kubernetes

  • CRI (containerd, CRI-O, Docker, etc)

  • CNI (Calico, Cilium, Flannel, etc)

  • CSI (Ceph, LINSTOR, AWS, etc)

  • …​


Workload testing problems

Golang test definition example
...
testsuites.DriverInfo{
    Name:        "csi-nfsplugin",
    MaxFileSize: testpatterns.FileSizeLarge,
    SupportedFsType: sets.NewString(
        "",
    ),
    Capabilities: map[testsuites.Capability]bool{
        testsuites.CapPersistence: true,
        testsuites.CapExec:        true,
    },
}
...
YAML test definition example
StorageClass:
  FromName: true
SnapshotClass:
  FromName: true
DriverInfo:
  Name: hostpath.csi.k8s.io
  Capabilities:
    block: true
    controllerExpansion: true
    exec: true
    multipods: true
    persistence: true
    pvcDataSource: true
InlineVolumes:
- Attributes: {}
Test startup example
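# -p runs specs in parallel; -focus selects the external-storage csi-hostpath specs,
# -skip drops [Feature:*] and [Disruptive] specs; the testdriver YAML is passed to e2e.test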
ginkgo -p -focus='External.Storage.*csi-hostpath' -skip='\[Feature:\|\[Disruptive\]' \
       e2e.test -- -storage.testdriver=/tmp/hostpath-testdriver.yaml





Scalability testing solutions


Scalability testing solutions

  • Bare metal/VMs


Scalability testing solutions

  • Bare metal/VMs

  • kind/minikube/microk8s


Scalability testing solutions

  • Bare metal/VMs

  • kind/minikube/microk8s

  • real cluster + kubemark





Why kubemark?







Kubemark is a performance testing tool which allows users to do experiments on simulated clusters.
What is Kubemark?
— Kubernetes blog

Why kubemark?




  • Real cluster


Why kubemark?

  • Real cluster

  • Master node for hollow cluster


Why kubemark?

  • Real cluster

  • Master node for hollow cluster

  • Hollow nodes

    • Hollow kubelet

    • Hollow kube-proxy

  • Node problem detector


Why kubemark?


Hollow cluster = real cluster



Why kubemark?


Capability to run many instances on a single host

  • HollowNode doesn’t modify the environment in which it is run



Why kubemark?

Cheap scale tests

  • ~100 HollowNodes per core (~10 millicores and 10MB RAM per pod)

  • Simulated cluster deploy time = deploy time of the real (host) cluster + deploy time of HollowNodes

  • The kubectl tool is used for all scaling operations (see the sketch below)


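A minimal sketch of such a scaling operation, assuming the hollow nodes are managed by a ReplicationController named hollow-node (the name is an assumption) in the kubemark namespace, as in the deploy section later:

# Resize the simulated cluster, then check how many hollow nodes registered
kubectl -n kubemark scale rc hollow-node --replicas=100
kubectl get nodes -l hollow --no-headers | wc -l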

Workload testing solutions


  • helm or kubectl apply -f + some YAML files


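A rough sketch of that approach (names, image, and count are arbitrary): generate load with nothing but kubectl.

# Create 100 throwaway deployments as a naive load test
for i in $(seq 1 100); do
  kubectl create deployment "load-${i}" --image=k8s.gcr.io/pause:3.3
done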

Workload testing solutions


  • helm or kubectl apply -f + some YAML files

  • clusterloader2







Clusterloader2

Why clusterloader2?







ClusterLoader2 is a Kubernetes test framework which can deploy large numbers of various user-defined objects to a cluster
Using Clusterloader
— OpenShift documentation

Why clusterloader2?


Simple

  • Kubeconfig

  • Definition of tests (YAML)

  • Providers (gke, kubemark, aws, local, etc)


Why clusterloader2?


User-oriented

  • No Golang

  • Easy to understand

Why clusterloader2?


Testable

  • Measurable SLI/SLO

  • Declarative paradigm

Why clusterloader2?


Extra metrics

  • PodStartupLatency

  • MemoryProfile

  • MetricsForE2E

  • …
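For illustration, a measurement is switched on from the test definition itself. The sketch below writes a PodStartupLatency step to a standalone file; parameter names follow upstream clusterloader2 examples and may differ between versions.

# Sketch of a test step that starts the PodStartupLatency measurement
cat > podstartup-measurement.yaml <<'EOF'
- measurements:
  - Identifier: PodStartupLatency
    Method: PodStartupLatency
    Params:
      action: start
      labelSelector: group = latency
      threshold: 10s
EOF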





Deploying a hollow cluster

Deploying a hollow cluster

Table 1. Hardware

RAM:      ~392GB DDR4
CPU:      Kunpeng 920-4826, 48 cores / 48 threads, ARM64
Network:  100Gb/s

Table 2. Resources for real/hollow clusters

Role         CPU (Real / Hollow)   RAM (Real / Hollow)   Disk (Real / Hollow)   Nodes (Real / Hollow)
Master       12                    23GB                  15GB                   3
Monitoring   4                     15GB                  10GB                   1
W-Node       8 / 40m               8GB / 20MB            8GB / -                50 / 1

Deploying a hollow cluster




Create NS
kubectl create ns kubemark


Create configmap
kubectl create configmap node-configmap \
        -n kubemark --from-literal=content.type="test-cluster"


Create secret
kubectl create secret generic kubeconfig \
        --type=Opaque --namespace=kubemark \
        --from-file=kubelet.kubeconfig=${HOME}/.kube/config \
        --from-file=kubeproxy.kubeconfig=${HOME}/.kube/config
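
An optional sanity check before deploying the hollow nodes (plain kubectl, nothing kubemark-specific):
# Confirm the configmap and secret created above exist
kubectl -n kubemark get configmap node-configmap
kubectl -n kubemark get secret kubeconfig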

Deploying a hollow cluster

apiVersion: v1
kind: ReplicationController (1)
...
spec:
  replicas: 5 (2)
  selector:
    name: hollow-node
  template:
    metadata:
      labels:
        name: hollow-node
    spec:
...
      containers:
      - name: hollow-kubelet (3)
...
      - name: hollow-proxy (4)
...
        securityContext:
          privileged: true
(1) ReplicationController is a standard resource
(2) Replicas = number of HollowNodes
(3) hollow-kubelet is the hollow kubelet container
(4) hollow-proxy is the hollow kube-proxy container
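
Assuming the manifest above is saved as hollow-node.yaml (the file name is illustrative), it can be applied and waited on like this:

# Deploy the hollow nodes and wait until their pods are Ready
kubectl -n kubemark apply -f hollow-node.yaml
kubectl -n kubemark wait --for=condition=Ready pod -l name=hollow-node --timeout=10m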

Deploying a hollow cluster

kubectl get nodes -l hollow
NAME               VERSION  OS-IMAGE                     KERNEL-VERSION        CONTAINER-RUNTIME
hollow-node-hq92c  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0
hollow-node-hqz6t  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0
hollow-node-hsg6x  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0
hollow-node-ht6rq  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0
hollow-node-hwpdq  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0
hollow-node-hxqsr  v1.19.7  Debian GNU/Linux 7 (wheezy)  3.16.0-0.bpo.4-amd64  fakeRuntime://0.1.0





clusterloader2 run


clusterloader2 run

config.yaml
{{$POD_COUNT := DefaultParam .POD_COUNT 100}} (1)
{{$POD_THROUGHPUT := DefaultParam .POD_THROUGHPUT 5}}
{{$CONTAINER_IMAGE := DefaultParam .CONTAINER_IMAGE "k8s.gcr.io/pause:3.1"}}
{{$POD_STARTUP_LATENCY_THRESHOLD := DefaultParam .POD_STARTUP_LATENCY_THRESHOLD "5s"}}
{{$OPERATION_TIMEOUT := DefaultParam .OPERATION_TIMEOUT "15m"}}
name: node-throughput (2)
...
steps:
- measurements: (3)
  - Identifier: APIResponsivenessPrometheusSimple
    Method: APIResponsivenessPrometheus
...
- phases:
  - namespaceRange:
      min: 1
      max: {{$POD_COUNT}}
    replicasPerNamespace: 1
    objectBundle:
    - basename: latency-pod-rc
      objectTemplatePath: rc.yaml (4)
...
Table 3. Constants for the node-throughput test

POD COUNT   THROUGHPUT   IMAGE       LATENCY   OPS TIMEOUT
1000        300          pause:3.3   10s       15m
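
These constants override the DefaultParam values at the top of config.yaml. One way to pass them, assuming clusterloader2's --testoverrides flag, is a small overrides file added to the run command shown on the next slide:

# Overrides matching Table 3 (file name is arbitrary)
cat > testing/overrides.yaml <<'EOF'
POD_COUNT: 1000
POD_THROUGHPUT: 300
CONTAINER_IMAGE: k8s.gcr.io/pause:3.3
POD_STARTUP_LATENCY_THRESHOLD: 10s
OPERATION_TIMEOUT: 15m
EOF
# then append: --testoverrides=${HOME}/testing/overrides.yaml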

clusterloader2 run

rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: {{.Name}}
spec:
  replicas: {{.Replicas}}
  selector:
    name: {{.Name}}
  template:
    metadata:
      labels:
        name: {{.Name}}
        group: {{.Group}}
    spec:
      automountServiceAccountToken: false
      containers:
      - image: {{.Image}}
        imagePullPolicy: IfNotPresent
        name: {{.Name}}
...
clusterloader2 run
docker run --rm --network host \
    -v ${HOME}/.kube/:${HOME}/.kube:ro \
    -v $(pwd)/testing:${HOME}/testing:ro \
    -ti clusterloader2 \
    --kubeconfig=${HOME}/.kube/config \ (1)
    --testconfig=${HOME}/testing/config.yaml \ (2)
    --provider=kubemark (3)
(1) kubeconfig is the cluster access config
(2) testconfig is the test definition
(3) Provider type
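
Measurement summaries are written as JSON files; assuming clusterloader2's --report-dir flag, they can be collected in one place by extending the command above:

# Append to the clusterloader2 arguments (directory is arbitrary and must be mounted)
--report-dir=${HOME}/testing/results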







Results of clusterloader2 work

[Charts: object creation/cleanup timings; real vs hollow cluster comparison]







Results of measurements for etcd

[Charts: etcd fsync duration, RPC rate, and memory usage during the 1000-pod test]







Results of measurements for API-server

[Charts: API server read/write SLI, memory, and CPU usage during the 1000-pod test]


Deploying a large cluster

Deploying a large cluster



Table 4. Resources for large cluster

Role         CPU   RAM    Disk   Number of nodes
Master       36    96GB   30GB   3
Monitoring   12    80GB   40GB   1
W-Nodes      8     4GB    15GB   200
H-Nodes      40m   20MB   -      5000

Results of clusterloader2 work

[Chart: clusterloader2 results, 5000-node hollow cluster vs real cluster]

Results of measurements for etcd


Stage         Nodes   RPC rate (req/s)   Memory (MB)   Traffic in (kB/s)   Traffic out (kB/s)   DB fsync (ms)   WAL fsync (ms)   Number of resources
Idle          204     60                 80            40                  300                  60              30               -
Idle hollow   5204    600                9000          185                 2500                 63              30               +5000
Test          5000    3800               9000          240                 4000                 100             40               +30000

Results of measurements for API server



Stage         Nodes   CPU   Memory (MB)   SLI READ (req/s)   SLI WRITE (req/s)   Number of resources
Idle          204     1.2   1500          30                 43                  -
Idle hollow   5204    3     40000         520                560                 +5000
Test          5000    8     40000         3500               590                 +30000






Our experience


Our experience





Scaling a k8s cluster up to 15k nodes (microk8s v1.18.7)
  • Real ~200 worker nodes and 1 master


Workload:
  • 70k out of the box

  • 130k after tuning etcd

Our experience






Scaling a k8s cluster up to 18k nodes (k8s v1.19.7)
  • Real ~200 worker nodes and 3 masters


Workload:
  • 80k out of the box

  • 142k after tuning API-server
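
The slides do not list the exact settings that were tuned; for orientation only, typical large-scale knobs in this area look like the flags below (values are illustrative assumptions, not the configuration used in these tests).

# Example etcd flag: raise the backend size quota from the 2GB default
--quota-backend-bytes=8589934592
# Example kube-apiserver flags: allow more concurrent requests
--max-requests-inflight=3000
--max-mutating-requests-inflight=1000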

Our experience




More than 20 real k8s clusters were rendered inoperable during tests
  • default configuration is not suitable for large-scale clusters

    • etcd

    • CNI

    • API-server

  • memory leaks caused by a large number of resources

    • etcd

    • API-server

Our experience

Other
  • CrashLoopBackOff

  • OOM


ANY QUESTIONS?


FEEL FREE TO ASK ME


efrikin.github.io/devopsconf2021