Upgrading a Kubernetes cluster (1.19.x) -> 1.29.x

    Upgrading a Kubernetes cluster (1.19.x) -> 1.20.x

    1. Upgrade kubeadm

    apt install kubeadm=1.20.15-00
    
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    The following packages will be upgraded:
      kubeadm
    1 upgraded, 0 newly installed, 0 to remove and 329 not upgraded.
    Need to get 7,705 kB of archives.
    After this operation, 28.7 kB disk space will be freed.
    Get:1 http://mirrors.tencentyun.com/kubernetes/apt kubernetes-xenial/main amd64 kubeadm amd64 1.20.15-00 [7,705 kB]
    Fetched 7,705 kB in 1s (9,420 kB/s)  
    (Reading database ... 118717 files and directories currently installed.)
    Preparing to unpack .../kubeadm_1.20.15-00_amd64.deb ...
    Unpacking kubeadm (1.20.15-00) over (1.20.5-00) ...
    Setting up kubeadm (1.20.15-00) ...
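
    The packages in this repository carry a revision suffix, as the apt output above shows (1.20.15-00), so the version pin has to include it. A minimal sketch, assuming the suffix is always -00 for this repository; pkg_pin is a hypothetical helper name, not part of apt:

```shell
#!/bin/sh
# Hypothetical helper: build an apt pin such as "kubeadm=1.20.15-00".
# Assumption: packages in this repository use the "-00" revision suffix.
pkg_pin() {
    # $1 = package name, $2 = Kubernetes version without the revision suffix
    printf '%s=%s-00\n' "$1" "$2"
}

# Print the install command instead of running it, so it can be reviewed first.
echo "apt install $(pkg_pin kubeadm 1.20.15)"
```

    To see which versions the repository actually offers, apt-cache madison kubeadm lists them.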
    

    2. Pre-upgrade check

    
    kubeadm upgrade plan
    
    
    root@l2:~# kubeadm upgrade plan
    [upgrade/config] Making sure the configuration is correct:
    [upgrade/config] Reading configuration from the cluster...
    [upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
    [preflight] Running pre-flight checks.
    [upgrade] Running cluster health checks
    [upgrade] Fetching available versions to upgrade to
    [upgrade/versions] Cluster version: v1.19.16
    [upgrade/versions] kubeadm version: v1.20.15
    I0810 21:20:17.404258 1175025 version.go:254] remote version is much newer: v1.27.4; falling back to: stable-1.20
    [upgrade/versions] Latest stable version: v1.20.15
    [upgrade/versions] Latest stable version: v1.20.15
    [upgrade/versions] Latest version in the v1.19 series: v1.19.16
    [upgrade/versions] Latest version in the v1.19 series: v1.19.16
    
    Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
    COMPONENT   CURRENT        AVAILABLE
    kubelet     2 x v1.19.16   v1.20.15
                1 x v1.20.5    v1.20.15
    
    Upgrade to the latest stable version:
    
    COMPONENT                 CURRENT    AVAILABLE
    kube-apiserver            v1.19.16   v1.20.15
    kube-controller-manager   v1.19.16   v1.20.15
    kube-scheduler            v1.19.16   v1.20.15
    kube-proxy                v1.19.16   v1.20.15
    CoreDNS                   1.7.0      1.7.0
    etcd                      3.4.13-0   3.4.13-0
    
    You can now apply the upgrade by executing the following command:
    
            kubeadm upgrade apply v1.20.15
    
    _____________________________________________________________________
    
    
    The table below shows the current state of component configs as understood by this version of kubeadm.
    Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
    resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
    upgrade to is denoted in the "PREFERRED VERSION" column.
    
    API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
    kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
    kubelet.config.k8s.io     v1beta1           v1beta1             no
    _____________________________________________________________________
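
    The plan output above ends by suggesting the apply command, which is run on the control-plane node before any worker is touched. A minimal sketch of that step; the run wrapper and the DRY_RUN variable are hypothetical conveniences that print each command instead of executing it unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Hypothetical wrapper: print the command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Control-plane upgrade, using the version suggested by 'kubeadm upgrade plan'.
upgrade_control_plane() {
    # $1 = target version, e.g. v1.20.15
    run kubeadm upgrade plan
    run kubeadm upgrade apply "$1"
}

upgrade_control_plane v1.20.15
```

    Keeping the default dry-run mode makes it easy to review the sequence before running it for real with DRY_RUN=0.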
    
    

    3. Upgrade the worker nodes

    3.1 Upgrade kubeadm

    apt install kubeadm=1.20.15-00
    
    kubeadm upgrade node
    
    [upgrade] Reading configuration from the cluster...
    [upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
    [preflight] Running pre-flight checks
    [preflight] Skipping prepull. Not a control plane node.
    [upgrade] Skipping phase. Not a control plane node.
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [upgrade] The configuration for this node was successfully updated!
    [upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.
    

    3.2 Upgrade kubelet and kubectl

    apt install kubelet=1.20.15-00 kubectl=1.20.15-00
    
    
    sudo systemctl daemon-reload && sudo systemctl restart kubelet
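
    The worker steps above can be wrapped in a drain and an uncordon, which the official kubeadm upgrade guide recommends but this walkthrough did not do. A sketch for a single node, assuming the node name l3 from this cluster; run and DRY_RUN are hypothetical helpers that only print the commands by default:

```shell
#!/bin/sh
# Hypothetical wrapper: print the command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

upgrade_worker() {
    # $1 = node name, $2 = package version including the revision suffix
    run kubectl drain "$1" --ignore-daemonsets   # from a machine with cluster access
    run apt install "kubeadm=$2"                 # the next five steps run on the worker
    run kubeadm upgrade node
    run apt install "kubelet=$2" "kubectl=$2"
    run systemctl daemon-reload
    run systemctl restart kubelet
    run kubectl uncordon "$1"                    # again from a machine with cluster access
}

upgrade_worker l3 1.20.15-00
```

    Note that the kubectl commands and the package commands usually run on different machines, so in practice this is a checklist more than a single script.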
    

    Upgrade kubelet and kubectl on the master node

    apt install kubelet=1.20.15-00 kubectl=1.20.15-00
    

    Upgrading a Kubernetes cluster (1.20.x) -> 1.21.x

    The steps are the same as above, so only the reference commands are listed.

    
    apt install kubeadm=1.21.13-00
    
    apt install kubelet=1.21.13-00 kubectl=1.21.13-00 kubeadm=1.21.13-00
    
    kubeadm upgrade node
    
    sudo systemctl daemon-reload && sudo systemctl restart kubelet
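
    kubeadm does not support skipping minor versions, which is why this guide repeats the same commands for every release in turn. A small sketch that lists the intermediate minor versions between two releases; upgrade_path is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: list every minor release that has to be visited.
# Runs in a subshell so the loop variable does not leak into the caller.
upgrade_path() (
    # $1 = current minor (e.g. 19), $2 = target minor (e.g. 24)
    minor=$(( $1 + 1 ))
    while [ "$minor" -le "$2" ]; do
        echo "1.$minor"
        minor=$(( minor + 1 ))
    done
)

upgrade_path 19 24
```

    For 19 and 24 this prints 1.20 through 1.24, one per line, matching the sections of this guide.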
    

    Upgrading a Kubernetes cluster (1.21.x) -> 1.22.x

     apt install kubeadm=1.22.10-00
    
     apt install kubelet=1.22.10-00 kubectl=1.22.10-00 kubeadm=1.22.10-00  && kubeadm upgrade node  && sudo systemctl daemon-reload && sudo systemctl restart kubelet
    
    

    Upgrading a Kubernetes cluster (1.22.x) -> 1.23.x

     apt install kubeadm=1.23.17-00
    
    
     apt install kubelet=1.23.17-00 kubectl=1.23.17-00 kubeadm=1.23.17-00  && kubeadm upgrade node  && sudo systemctl daemon-reload && sudo systemctl restart kubelet
    

    Upgrading a Kubernetes cluster (1.23.x) -> 1.24.x

    Starting with 1.24, Kubernetes no longer supports the Docker runtime (dockershim was removed), so upgrade with caution.

    Reference blog post: https://www.lisenet.com/2022/upgrading-homelab-kubernetes-cluster-from-1-23-to-1-24/
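
    Before attempting 1.24 it helps to know which runtime each node reports. A minimal sketch that classifies the runtime string the kubelet publishes in .status.nodeInfo.containerRuntimeVersion; is_docker_runtime is a hypothetical helper, and the sample values are illustrative rather than read from a live cluster:

```shell
#!/bin/sh
# Succeeds when the runtime string reported by the kubelet names Docker.
is_docker_runtime() {
    case "$1" in
        docker://*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample values in the format of .status.nodeInfo.containerRuntimeVersion.
for runtime in "docker://19.3.15" "containerd://1.7.2"; do
    if is_docker_runtime "$runtime"; then
        echo "$runtime: migrate to a CRI runtime before upgrading to 1.24"
    else
        echo "$runtime: ok"
    fi
done
```

    On a real cluster the values can be read with kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.containerRuntimeVersion}', or from the CONTAINER-RUNTIME column of kubectl get nodes -o wide.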

    TODO...

    apt install kubeadm=1.24.16-00

    apt install kubelet=1.24.16-00 kubectl=1.24.16-00 kubeadm=1.24.16-00 && kubeadm upgrade node && sudo systemctl daemon-reload && sudo systemctl restart kubelet

    Troubleshooting

    Error: master not ready

    After upgrading to 1.22.x, the master node reported NotReady:

    root@l2:~# kg nodes
    NAME   STATUS     ROLES                  AGE   VERSION
    l2     NotReady   control-plane,master   25h   v1.22.10
    l3     Ready      <none>                 24h   v1.22.10
    l4     Ready      <none>                 24h   v1.22.10
    

    Check the system log with tail -f /var/log/syslog:

    Aug 10 22:12:09 l2 kubelet[1218528]: E0810 22:12:09.823103 1218528 kubelet.go:2376] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
    Aug 10 22:12:12 l2 kubelet[1218528]: I0810 22:12:12.309685 1218528 cni.go:204] "Error validating CNI config list" configList="{\n  \"name\": \"cbr0\",\n  \"cniVersion\": \"0.3.1\",\n  \"plugins\": [\n    {\n      \"type\": \"flannel\",\n      \"delegate\": {\n        \"hairpinMode\": true,\n        \"isDefaultGateway\": true\n      }\n    },\n    {\n      \"type\": \"portmap\",\n      \"capabilities\": {\n        \"portMappings\": true\n      }\n    }\n  ]\n}\n" err="[failed to find plugin \"flannel\" in path [/opt/cni/bin]]"
    
    

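    The log shows that the flannel CNI plugin is not ready: the flannel binary cannot be found in /opt/cni/bin.

    Delete the flannel pods so that they are recreated: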

    root@l2:~# kg pods -n kube-flannel
    NAME                    READY   STATUS    RESTARTS   AGE
    kube-flannel-ds-4kcc6   1/1     Running   0          25h
    kube-flannel-ds-6hq7z   1/1     Running   0          25h
    kube-flannel-ds-s4jx8   1/1     Running   0          25h
    
    
     kubectl delete pod kube-flannel-ds-4kcc6  kube-flannel-ds-6hq7z  kube-flannel-ds-s4jx8   -n kube-flannel
    
    

    Wait...
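
    The kubelet error above says the flannel binary is missing from /opt/cni/bin, so that is worth checking directly before and after recreating the pods. A minimal sketch; cni_plugin_present is a hypothetical helper, and the DaemonSet name kube-flannel-ds is an assumption inferred from the pod names listed above:

```shell
#!/bin/sh
# Succeeds when the named CNI plugin binary exists and is executable.
cni_plugin_present() {
    # $1 = plugin name, $2 = plugin directory (defaults to /opt/cni/bin)
    [ -x "${2:-/opt/cni/bin}/$1" ]
}

if cni_plugin_present flannel; then
    echo "flannel plugin found"
else
    echo "flannel plugin missing; restart the DaemonSet, for example:"
    echo "  kubectl -n kube-flannel rollout restart daemonset kube-flannel-ds"
fi
```

    Restarting the DaemonSet has the same effect as deleting the three pods by name. It presumably helps because the flannel pod reinstalls its plugin binary on startup, although that was not verified here.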

    Error when upgrading to v1.24.16 (not yet resolved; recorded for reference)

    The error log is as follows:

        [ERROR ImagePull]: failed to pull image registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.24.16: output: time="2023-08-10T22:23:52+08:00" level=fatal msg="validate service connection: CRI v1 image API is not implemented for endpoint \"unix:///var/run/dockershim.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService"
    

    Cause: starting with 1.24, Kubernetes no longer supports the Docker runtime.

    apt install containerd
    
    root@l2:~# apt install containerd
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      aufs-tools cgroupfs-mount docker-ce-cli pigz
    Use 'apt autoremove' to remove them.
    The following additional packages will be installed:
      runc
    The following packages will be REMOVED:
      containerd.io docker-ce
    The following NEW packages will be installed:
      containerd runc
    0 upgraded, 2 newly installed, 2 to remove and 330 not upgraded.
    Need to get 36.3 MB of archives.
    After this operation, 86.6 MB disk space will be freed.
    Do you want to continue? [Y/n] y
    Get:1 http://mirrors.tencentyun.com/ubuntu focal-updates/main amd64 runc amd64 1.1.7-0ubuntu1~20.04.1 [3,819 kB]
    Get:2 http://mirrors.tencentyun.com/ubuntu focal-updates/main amd64 containerd amd64 1.7.2-0ubuntu1~20.04.1 [32.5 MB]
    Fetched 36.3 MB in 1s (27.8 MB/s)     
    (Reading database ... 118718 files and directories currently installed.)
    Removing docker-ce (5:19.03.15~3-0~debian-stretch) ...
    Removing containerd.io (1.4.3-1) ...
    Selecting previously unselected package runc.
    (Reading database ... 118696 files and directories currently installed.)
    Preparing to unpack .../runc_1.1.7-0ubuntu1~20.04.1_amd64.deb ...
    Unpacking runc (1.1.7-0ubuntu1~20.04.1) ...
    Selecting previously unselected package containerd.
    Preparing to unpack .../containerd_1.7.2-0ubuntu1~20.04.1_amd64.deb ...
    Unpacking containerd (1.7.2-0ubuntu1~20.04.1) ...
    Setting up runc (1.1.7-0ubuntu1~20.04.1) ...
    Setting up containerd (1.7.2-0ubuntu1~20.04.1) ...
    Processing triggers for man-db (2.9.1-1) ...
    root@l2:~# docker ps
    Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
    
    mkdir -p /etc/containerd
    
    containerd config default | sudo tee /etc/containerd/config.toml
    
    cat /var/lib/kubelet/kubeadm-flags.env
    sudo sed -i 's/--network-plugin=cni/--container-runtime=remote\ --container-runtime-endpoint=unix\:\/\/\/run\/containerd\/containerd.sock/g' /var/lib/kubelet/kubeadm-flags.env
    cat /var/lib/kubelet/kubeadm-flags.env
    
    root@l2:~# cat /var/lib/kubelet/kubeadm-flags.env
    KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.2"
    
    root@l2:~# sudo sed -i 's/--network-plugin=cni/--container-runtime=remote\ --container-runtime-endpoint=unix\:\/\/\/run\/containerd\/containerd.sock/g' /var/lib/kubelet/kubeadm-flags.env
    root@l2:~#  cat /var/lib/kubelet/kubeadm-flags.env
    KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.2"
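
    The in-place sed above is easier to review when it is first previewed on a copy of the line. A minimal sketch of the same substitution written as a filter, using | as the sed delimiter so the socket path needs no escaping; migrate_kubelet_flags is a hypothetical helper name:

```shell
#!/bin/sh
# Rewrites kubelet flags read from stdin: swaps the dockershim-era
# --network-plugin=cni flag for the containerd CRI endpoint flags.
migrate_kubelet_flags() {
    sed 's|--network-plugin=cni|--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock|'
}

# Sample line copied from the kubeadm-flags.env shown above.
sample='KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.2"'
printf '%s\n' "$sample" | migrate_kubelet_flags
```

    The ImagePull error above comes from kubeadm itself still pointing at the dockershim socket. The official migration guide also updates the kubeadm.alpha.kubernetes.io/cri-socket annotation on the Node object; whether that would have resolved this case is untested here. The --container-runtime=remote flag was deprecated in 1.24 and may need to be dropped again on later releases.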
    

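    This problem was never resolved. The cluster machines ran into other issues, so I simply reinstalled the cluster.
    For the upgrade procedure, see the Kubernetes installation guide.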

    Upgrading a Kubernetes cluster (1.28.x) -> 1.29.x
