[CKA] Command Summary - 2

Nowon9159 2024. 1. 2. 14:12

CKA (https://www.credly.com/org/the-linux-foundation/badge/cka-certified-kubernetes-administrator)

Cluster Maintenance

OS Upgrades

Draining a node gracefully

controlplane ~ ✖ k drain node01 --ignore-daemonsets
node/node01 already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-86sk5, kube-system/kube-proxy-h5rcc
evicting pod default/blue-6b478c8dbf-qvshb
evicting pod default/blue-6b478c8dbf-74gxg
pod/blue-6b478c8dbf-qvshb evicted
pod/blue-6b478c8dbf-74gxg evicted
node/node01 drained

controlplane ~ ➜  k drain node01 --ignore-daemonsets --force
node/node01 already cordoned
Warning: deleting Pods that declare no controller: default/hr-app; ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-86sk5, kube-system/kube-proxy-h5rcc
evicting pod default/hr-app
pod/hr-app evicted
node/node01 drained

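Pods created directly, without a ReplicaSet or Deployment, are not recreated anywhere once they are evicted, so drain refuses to proceed while such a pod is on the node. Adding --force makes drain delete these pods anyway. They are lost permanently and are not rescheduled onto another node.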
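Before draining, it can help to see which pods on the node have no controller, since those are the ones --force removes for good. The following is a minimal sketch, assuming kubectl access to the cluster. The function names pods_on_node and bare_pods are made up for illustration and are not part of kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (illustrative names, not kubectl subcommands).

# Print namespace, name and owner kind of every pod scheduled on a node.
# Pods without an ownerReference show "<none>" in the OWNER column.
pods_on_node() {
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=$1" \
    -o 'custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind'
}

# Read that table on stdin and keep only rows whose OWNER is "<none>".
bare_pods() {
  awk 'NR > 1 && $3 == "<none>" { print $1 "/" $2 }'
}

# Usage (needs a cluster):
#   pods_on_node node01 | bare_pods
```

Any pod printed by bare_pods should be backed up (for example with k get po <name> -o yaml) before running drain with --force.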

Cluster Upgrade Process

Checking node versions

controlplane ~ ✖ k get nodes
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   41m   v1.26.0
node01         Ready    <none>          41m   v1.26.0

Checking available upgrade versions

controlplane ~ ➜  kubeadm upgrade plan
~~~
I1221 02:34:51.058481   16023 version.go:256] remote version is much newer: v1.29.0; falling back to: stable-1.26
~~~
Upgrade to the latest version in the v1.26 series:

COMPONENT                 CURRENT   TARGET
kube-apiserver            v1.26.0   v1.26.12
kube-controller-manager   v1.26.0   v1.26.12
kube-scheduler            v1.26.0   v1.26.12
kube-proxy                v1.26.0   v1.26.12
CoreDNS                   v1.9.3    v1.9.3
etcd                      3.5.6-0   3.5.6-0
~~~

remote version is the latest Kubernetes release.
The table below it lists the recommended TARGET versions within the v1.26 series.

Control plane upgrade steps

# Check the upgrade versions with plan
controlplane ~ ✖ kubeadm upgrade plan

# Drain the controlplane node
controlplane ~ ➜  k drain controlplane --ignore-daemonsets 
node/controlplane cordoned
Warning: ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-7v6dn, kube-system/kube-proxy-sctzx

# Update the apt package index
controlplane ~ ➜  apt update
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial InRelease [8,993 B]      
Hit:2 https://download.docker.com/linux/ubuntu focal InRelease                         
Hit:3 http://security.ubuntu.com/ubuntu focal-security InRelease                 
Hit:4 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:5 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:6 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Fetched 8,993 B in 1s (9,327 B/s)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
19 packages can be upgraded. Run 'apt list --upgradable' to see them.

# Upgrade kubeadm
controlplane ~ ➜  apt-get install kubeadm=1.27.0-00
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be upgraded:
  kubeadm
1 upgraded, 0 newly installed, 0 to remove and 18 not upgraded.
Need to get 9,931 kB of archives.
After this operation, 1,393 kB of additional disk space will be used.
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubeadm amd64 1.27.0-00 [9,931 kB]
Fetched 9,931 kB in 0s (31.4 MB/s)
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 20439 files and directories currently installed.)
Preparing to unpack .../kubeadm_1.27.0-00_amd64.deb ...
Unpacking kubeadm (1.27.0-00) over (1.26.0-00) ...
Setting up kubeadm (1.27.0-00) ...

# Upgrade kubelet
controlplane ~ ➜  apt-get install kubelet=1.27.0-00
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be upgraded:
  kubelet
1 upgraded, 0 newly installed, 0 to remove and 18 not upgraded.
Need to get 18.8 MB of archives.
After this operation, 15.1 MB disk space will be freed.
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubelet amd64 1.27.0-00 [18.8 MB]
Fetched 18.8 MB in 0s (41.6 MB/s)
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 20439 files and directories currently installed.)
Preparing to unpack .../kubelet_1.27.0-00_amd64.deb ...
/usr/sbin/policy-rc.d returned 101, not running 'stop kubelet.service'
Unpacking kubelet (1.27.0-00) over (1.26.0-00) ...
Setting up kubelet (1.27.0-00) ...
/usr/sbin/policy-rc.d returned 101, not running 'start kubelet.service'

# Upgrade kubectl
controlplane ~ ➜  apt-get install kubectl=1.27.0-00
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be upgraded:
  kubectl
1 upgraded, 0 newly installed, 0 to remove and 18 not upgraded.
Need to get 10.2 MB of archives.
After this operation, 1,225 kB of additional disk space will be used.
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubectl amd64 1.27.0-00 [10.2 MB]
Fetched 10.2 MB in 0s (44.5 MB/s)
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 20439 files and directories currently installed.)
Preparing to unpack .../kubectl_1.27.0-00_amd64.deb ...
Unpacking kubectl (1.27.0-00) over (1.26.0-00) ...
Setting up kubectl (1.27.0-00) ...

# Upgrade the control plane components with kubeadm upgrade apply
controlplane ~ ➜  kubeadm upgrade apply v1.27.0
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
~~~

# Reload systemd and restart kubelet (afterwards, k uncordon controlplane makes the node schedulable again)
controlplane ~ ➜  systemctl daemon-reload
controlplane ~ ➜  systemctl restart kubelet

Upgrading a worker node
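The worker node follows the same pattern as the control plane, except that kubeadm upgrade node replaces kubeadm upgrade apply. The sketch below only prints the steps in order and runs nothing. It assumes the node name node01 and the legacy apt package naming (version followed by -00) seen in the transcript above. The function names apt_pkg_version and worker_upgrade_plan are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they print a plan and do not change anything.

# v1.27.0 -> 1.27.0-00 (assumes the "-00" suffix used by the apt repo above).
apt_pkg_version() {
  printf '%s-00\n' "${1#v}"
}

# Print the ordered commands for upgrading one worker node.
worker_upgrade_plan() {
  node="$1"
  pkg="$(apt_pkg_version "$2")"
  cat <<EOF
# on the control plane
kubectl drain ${node} --ignore-daemonsets
# on ${node}
apt-get update
apt-get install -y kubeadm=${pkg}
kubeadm upgrade node
apt-get install -y kubelet=${pkg}
systemctl daemon-reload
systemctl restart kubelet
# back on the control plane
kubectl uncordon ${node}
EOF
}

worker_upgrade_plan node01 v1.27.0
```

kubectl drain and kubectl uncordon run where kubectl is configured (the control plane here), while the apt-get, kubeadm and systemctl steps run on the worker itself after ssh node01.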

Backup and Restore Methods

Preparation

# Set the etcdctl API version through an environment variable
export ETCDCTL_API=3

# Save the full cluster state so the backup and the restore can be compared
kubectl get all --all-namespaces -o yaml > example.yaml

Before running etcdctl commands, the etcdctl API version must be set to 3 (ETCDCTL_API=3).

Saving a snapshot

controlplane ~ ✖ etcdctl snapshot save /opt/snapshot-pre-boot.db --endpoints=127.0.0.1:2379 --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt --cacert=/etc/kubernetes/pki/etcd/ca.crt
Snapshot saved at /opt/snapshot-pre-boot.db

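When saving a snapshot, key, cert, and cacert must be specified because the etcd server requires TLS client authentication. --endpoints defaults to 127.0.0.1:2379, but it is safer to pass it explicitly.
The endpoint address is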
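On a kubeadm cluster, the values for these flags can be read from the etcd static pod manifest: --listen-client-urls gives the endpoint, and etcd's --cert-file, --key-file and --trusted-ca-file correspond to etcdctl's --cert, --key and --cacert. A small sketch, assuming the manifest lists one "- --flag=value" entry per line. The helper name etcd_flag is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one "--flag=value" entry
# from an etcd static pod manifest (first match only).
etcd_flag() {
  manifest="$1"
  flag="$2"
  sed -n "s/^[[:space:]]*-[[:space:]]*--${flag}=//p" "$manifest" | head -n 1
}

# Usage on a control plane node:
#   etcd_flag /etc/kubernetes/manifests/etcd.yaml listen-client-urls
#   etcd_flag /etc/kubernetes/manifests/etcd.yaml cert-file
#   etcd_flag /etc/kubernetes/manifests/etcd.yaml key-file
#   etcd_flag /etc/kubernetes/manifests/etcd.yaml trusted-ca-file
```

Because the pattern requires the flag to follow the list dash directly, asking for cert-file does not pick up peer-cert-file.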

Restoring a snapshot with etcdctl

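# Restore the snapshot into a data directory with snapshot restore
# etcdctl snapshot restore <snapshot file> --data-dir=<directory to restore into>
# The target directory must not exist beforehand.
etcdctl snapshot restore /opt/snapshot-pre-boot.db --data-dir=/var/lib/etcd-backup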

# Edit the etcd static pod manifest
vi /etc/kubernetes/manifests/etcd.yaml

- hostPath:
    # previously: path: /var/lib/etcd
    path: /var/lib/etcd-backup
    type: DirectoryOrCreate
  name: etcd-data

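Nothing else needs to change for the restore: create the directory with the restore command, then change the hostPath in etcd.yaml to the absolute path of that directory. The kubelet recreates the etcd static pod once the manifest changes.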

Backup and Restore Methods2

Checking the number of clusters

student-node ~ ✖ k config get-clusters
NAME
cluster1
cluster2

Switching to another cluster with use-context

student-node ~ ➜  k config use-context cluster1
Switched to context "cluster1".

Connecting to the external etcd server

# Switch to the cluster that uses the external etcd server, then run ssh etcd-server
student-node ~ ➜  ssh etcd-server
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1106-gcp x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.
Last login: Fri Dec 22 06:15:54 2023 from 192.8.70.12

# Or look up the external etcd IP and ssh to that IP
student-node ~ ➜  ssh 192.8.70.6
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1106-gcp x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

Finding the data-dir of the external etcd

# etcd runs as a regular process here, so ps with grep shows the --data-dir flag.
etcd-server /var/lib/etcd-data ➜  ps -ef | grep etcd
etcd         819       1  0 05:11 ?        00:00:58 /usr/local/bin/etcd --name etcd-server --data-dir=/var/lib/etcd-data --cert-file=/etc/etcd/pki/etcd.pem --key-file=/etc/etcd/pki/etcd-key.pem --peer-cert-file=/etc/etcd/pki/etcd.pem --peer-key-file=/etc/etcd/pki/etcd-key.pem --trusted-ca-file=/etc/etcd/pki/ca.pem --peer-trusted-ca-file=/etc/etcd/pki/ca.pem --peer-client-cert-auth --client-cert-auth --initial-advertise-peer-urls https://192.8.70.6:2380 --listen-peer-urls https://192.8.70.6:2380 --advertise-client-urls https://192.8.70.6:2379 --listen-client-urls https://192.8.70.6:2379,https://127.0.0.1:2379 --initial-cluster-token etcd-cluster-1 --initial-cluster etcd-server=https://192.8.70.6:2380 --initial-cluster-state new
root        1973    1229  0 06:18 pts/0    00:00:00 grep etcd

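etcd can be deployed in a stacked or an external topology.
- Stacked: etcd runs on the control plane node, next to the other control plane components.
- External: etcd runs on separate nodes outside the control plane. To tell whether etcd is external, check the etcd setting of kube-apiserver (--etcd-servers) or check whether an etcd pod runs on the control plane. If the address is not a loopback address starting with 127, etcd is external.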
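The loopback check described above can be written as a small heuristic. This is only a sketch: the function name etcd_topology is hypothetical, and an address other than loopback does not strictly prove an external etcd (a stacked etcd in a multi-node control plane may be addressed by node IP), so the result should be confirmed by looking for an etcd pod.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: loopback endpoint => stacked, anything else => external.
etcd_topology() {
  case "$1" in
    *://127.*|*://localhost:*|*://localhost) echo stacked ;;
    *) echo external ;;
  esac
}

# Usage on a control plane node:
#   servers="$(sed -n 's/.*--etcd-servers=//p' /etc/kubernetes/manifests/kube-apiserver.yaml)"
#   etcd_topology "$servers"
```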

Querying etcd directly (member list)

etcd-server /var/lib/etcd-data ➜  ETCDCTL_API=3 etcdctl \
>  --endpoints=https://127.0.0.1:2379 \
>  --cacert=/etc/etcd/pki/ca.pem \
>  --cert=/etc/etcd/pki/etcd.pem \
>  --key=/etc/etcd/pki/etcd-key.pem \
>   member list
840c04836d906824, started, etcd-server, https://192.8.70.6:2380, https://192.8.70.6:2379, false

etcd-server /var/lib/etcd-data ➜  ETCDCTL_API=3 etcdctl  --endpoints=https://127.0.0.1:2379  --cacert=/etc/etcd/pki/ca.pem  --cert=/etc/etcd/pki/etcd.pem  --key=/etc/etcd/pki/etcd-key.pem member
NAME:
        member - Membership related commands

USAGE:
        etcdctl member <subcommand> [flags]

API VERSION:
        3.4

COMMANDS:
        add     Adds a member into the cluster
        list    Lists all members in the cluster
        promote Promotes a non-voting member in the cluster
        remove  Removes a member from the cluster
        update  Updates a member in the cluster

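Specify endpoints, cacert, cert, and key, then run member list.
member list queries the etcd server directly and prints the members of the etcd cluster (ID, status, name, peer URLs, client URLs, learner flag). It does not show the stored keys.
etcdctl has a member command with several subcommands (add, list, promote, remove, update).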

Backing up etcd on another cluster's node and copying the snapshot back

# Connect to the remote server
ssh cluster1-controlplane

# Take the etcd snapshot on the remote server
cluster1-controlplane ~ ✖ ETCDCTL_API=3 etcdctl snapshot save --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt --cacert=/etc/kubernetes/pki/etcd/ca.crt --endpoints=https://127.0.0.1:2379 ~/cluster1.db
Snapshot saved at /root/cluster1.db

cluster1-controlplane ~ ➜  exit
logout

# Back on the original student-node, copy the backup file with scp
# This direction is used because student-node has SSH access to the cluster nodes, while SSH from the cluster nodes to student-node does not work
student-node ~ ➜  scp cluster1-controlplane:~/cluster1.db /opt/cluster1.db
cluster1.db                                           100% 2012KB  91.8MB/s   00:00

Restoring an external etcd

# Prepare the backup file and send it with scp to the node where it will be restored
student-node ~ scp /opt/cluster2.db etcd-server:~/

# Move to the external etcd server
student-node ~ ssh etcd-server

# Restore on the server
# key, cert, cacert, and endpoints are passed here, although restore works on the local snapshot file and does not need them
etcd-server ~ ➜  ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem snapshot restore /root/cluster2.db --data-dir /var/lib/etcd-data-new
{"level":"info","ts":1662004927.2399247,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}
{"level":"info","ts":1662004927.2584803,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"cdf818194e3a8c32","local-member-id":"0","added-peer-id":"8e9e05c52164694d","added-peer-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":1662004927.264258,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}

# Change the directory ownership and verify it
etcd-server ~ ✖ chown -R etcd:etcd /var/lib/etcd-data-new

etcd-server ~ ➜  ls -ld /var/lib/etcd-data-new
drwx------ 3 etcd etcd 4096 Jan  2 03:57 /var/lib/etcd-data-new

# Check the etcd service to find its unit file
etcd-server ~ ➜  systemctl status etcd

etcd.service - etcd key-value store
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2023-12-22 07:36:56 UTC; 5min ago
     Docs: https://github.com/etcd-io/etcd
 Main PID: 3360 (etcd)
    Tasks: 43 (limit: 251379)

# Edit the service file
etcd-server ~ ✖ vi /etc/systemd/system/etcd.service

# Point data-dir at the restored directory
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \
  --name etcd-server \
  --data-dir=/var/lib/etcd-data-new \
  --cert-file=/etc/etcd/pki/etcd.pem \

# Reload systemd and restart the service
etcd-server ~ ➜  systemctl daemon-reload

etcd-server ~ ➜  systemctl restart etcd

When changing ownership with chown, do not leave out the -R option.

Security

View Certificate Details

How to check the Common Name (CN)

openssl x509 -in <cert file path> -text -noout

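Issuer: the CA that issued the certificate
Subject: the entity the certificate was issued to (the owner)
The X509v3 Subject Alternative Name field lists the alternative names (SANs)
The main thing here is to practice analyzing and inspecting certificates.
In the api-server manifest, the etcd CA setting must point to etcd's CA file, not the api-server's CA file
Questions will most likely ask for the CN, which means the CN inside Subject.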

    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt

In the kube-apiserver configuration, etcd-cafile is the CA file under /pki/etcd/, not the CA file directly under /pki.

Certificates API

How to create a CSR

jane.csr

--- BEGIN ~~~ ---
MIROSLFKWENFKLASNVOWN ~~~
--- END ~~ ---

# Base64-encode the contents of the CSR file on a single line
cat jane.csr | base64 | tr -d "\n"

LS0tLS1CRUd~~~~

# Put the encoded string into the request field of the CSR manifest


jane-csr.yaml
---
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: jane
spec:
  signerName: kubernetes.io/kube-apiserver-client
  groups:
  - system:authenticated
  usages:
  - digital signature
  - key encipherment
  - client auth
  request: LS0tLS1CRUd~~~~
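
The jane.csr file and its single-line base64 form can be produced with openssl and base64. The sketch below assumes a 2048-bit RSA key and a subject of /CN=jane. The helper names make_csr and b64_oneline are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for preparing a CertificateSigningRequest.

# Create <name>.key and <name>.csr with CN=<name>.
make_csr() {
  openssl genrsa -out "$1.key" 2048
  openssl req -new -key "$1.key" -subj "/CN=$1" -out "$1.csr"
}

# Base64-encode a file on one line, which is what spec.request expects.
b64_oneline() {
  base64 < "$1" | tr -d '\n'
}

# Usage:
#   make_csr jane
#   b64_oneline jane.csr
```

After the request is approved, the issued certificate can be read back with k get csr jane -o jsonpath='{.status.certificate}' and decoded with base64 -d.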

How to approve a CSR

kubectl certificate approve <CSR Name>

Listing, denying, and deleting CSR requests

# List
controlplane ~ ✖ k get csr
NAME          AGE     SIGNERNAME                                    REQUESTOR                  REQUESTEDDURATION   CONDITION
agent-smith   114s    kubernetes.io/kube-apiserver-client           agent-x                    <none>              Denied
akshay        4m53s   kubernetes.io/kube-apiserver-client           kubernetes-admin           <none>              Approved,Issued
csr-stg9s     15m     kubernetes.io/kube-apiserver-client-kubelet   system:node:controlplane   <none>              Approved,Issued

# Deny
controlplane ~ ➜  kubectl certificate deny agent-smith
certificatesigningrequest.certificates.k8s.io/agent-smith denied

# Delete
controlplane ~ ➜  k delete csr agent-smith 
certificatesigningrequest.certificates.k8s.io "agent-smith" deleted

KubeConfig

Default kubeconfig location

/root/.kube/config

Switching context in a specific kubeconfig file

controlplane ~/.kube ➜  kubectl config --kubeconfig=/root/my-kube-config use-context research
Switched to context "research".

kubeconfig YAML file

users:
- name: aws-user
  user:
    client-certificate: /etc/kubernetes/pki/users/aws-user/aws-user.crt
    client-key: /etc/kubernetes/pki/users/aws-user/aws-user.key
- name: dev-user
  user:
    client-certificate: /etc/kubernetes/pki/users/dev-user/developer-user.crt
    client-key: /etc/kubernetes/pki/users/dev-user/dev-user.key
- name: test-user
  user:
    client-certificate: /etc/kubernetes/pki/users/test-user/test-user.crt
    client-key: /etc/kubernetes/pki/users/test-user/test-user.key

Error where nothing is listed because of a credential problem

controlplane ~/.kube ➜  k get pods
error: unable to read client-cert /etc/kubernetes/pki/users/dev-user/developer-user.crt for dev-user due to open /etc/kubernetes/pki/users/dev-user/developer-user.crt: no such file or directory

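In the config file, kubectl fails when a user's crt or key path does not point to an existing, matching file. When a log like the one above appears, check the certificate and key paths of that user.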
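
A quick way to catch this kind of mistake is to test every client-certificate and client-key path in the kubeconfig. This is a minimal sketch that assumes the paths are written as plain "key: path" lines, as in the YAML above. The function name missing_cred_files is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every client-certificate / client-key path
# in a kubeconfig file that does not exist on disk.
missing_cred_files() {
  awk '$1 == "client-certificate:" || $1 == "client-key:" { print $2 }' "$1" |
    while read -r path; do
      [ -e "$path" ] || echo "$path"
    done
}

# Usage:
#   missing_cred_files /root/my-kube-config
```

An empty result means every referenced file exists. It does not prove that the certificate and the key belong together.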

Role Based Access Controls

Inspection commands

# Check the API server authorization mode
controlplane ~ ➜  k get po -n kube-system kube-apiserver-controlplane -o yaml
- --authorization-mode=Node,RBAC

# List roles
controlplane ~ ✖ k get role
No resources found in default namespace.

# Viewing a role like this shows its resources and the actions allowed on them
controlplane ~ ➜  k get role -n kube-system kube-proxy -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: "2023-12-27T08:12:47Z"
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "284"
  uid: c6de2a17-a497-41eb-b97b-362457caf43b
rules:
- apiGroups:
  - ""
  resourceNames:
  - kube-proxy
  resources:
  - configmaps
  verbs:
  - get

# Viewing a rolebinding like this shows which role is bound to which subjects
controlplane ~ ➜  k get rolebinding -n kube-system kube-proxy -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: "2023-12-27T12:26:02Z"
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "292"
  uid: f3334052-d107-46d8-8d6c-fdd3d88b4486
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-proxy
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:bootstrappers:kubeadm:default-node-token

# Run an action with the permissions of a specific user
controlplane ~ ➜  k get po --as dev-user # the --as option runs the command as the given user
Error from server (Forbidden): pods is forbidden: User "dev-user" cannot list resource "pods" in API group "" in the namespace "default"

# Create a Role and a RoleBinding
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: developer
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "create","delete"]

---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: dev-user-binding
subjects:
- kind: User
  name: dev-user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io

Creating a Role and a RoleBinding imperatively

controlplane ~ ✖ k create role developer --verb list,create,delete --resource pods -o yaml --dry-run=client > role.yaml

controlplane ~ ➜  k create rolebinding dev-user-binding --role developer --user dev-user --dry-run=client -o yaml > rb.yaml

Cluster Roles

Get

controlplane ~ ✖ k get clusterrole
controlplane ~ ➜  k get clusterrolebindings.rbac.authorization.k8s.io

Creating imperatively

controlplane ~ ✖ k create clusterrole node-admin --resource node --verb list,get,watch,delete,create
clusterrole.rbac.authorization.k8s.io/node-admin created

controlplane ~ ➜  k create clusterrolebinding michelle-binding --user michelle --clusterrole node-admin
clusterrolebinding.rbac.authorization.k8s.io/michelle-binding created

Creating a storage role

controlplane ~ ➜  k create clusterrole storage-admin --resource persistentvolumes,storageclasses --verb create,list,get,watch,delete
clusterrole.rbac.authorization.k8s.io/storage-admin created

controlplane ~ ➜  k create clusterrolebinding michelle-storage-admin --clusterrole storage-admin --user michelle
clusterrolebinding.rbac.authorization.k8s.io/michelle-storage-admin created

When several resources are listed, the given verbs are granted on every listed resource.

Service Accounts

Inspection command

controlplane ~ ➜  k get sa default -o yaml

Checking the serviceAccount information used by a pod

controlplane ~ ➜  k get po web-dashboard-97c9c59f6-9gjzx -o yaml
~~~
volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-c9b9c
      readOnly: true
~~~
spec:
  serviceAccountName: default
~~~

Creating

controlplane ~ ➜  k create sa dashboard-sa
serviceaccount/dashboard-sa created

Setting the serviceAccount on a deployment

spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: web-dashboard
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: web-dashboard
    spec:
      serviceAccountName: dashboard-sa
      containers:

Put the serviceAccountName field under spec.template.spec.

Image Security

Creating a secret for a docker registry

root@controlplane ~ ➜  k create secret docker-registry private-reg-cred --docker-username=dock_user --docker-password=dock_password --docker-server=myprivateregistry.com:5000 --docker-email=dock_user@myprivateregistry.com
secret/private-reg-cred created

Using a private registry from a deployment

~~~
template:
    metadata:
      labels:
        app: web
    spec:
      imagePullSecrets:
      - name: private-reg-cred
~~~

Security Contexts

Running commands as a specific user

---
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-sleeper
  namespace: default
spec:
  securityContext:
    runAsUser: 1010
  containers:
  - command:
    - sleep
    - "4800"
    image: ubuntu
    name: ubuntu-sleeper

Set spec.securityContext.runAsUser to the ID of the desired user.

apiVersion: v1
kind: Pod
metadata:
  name: multi-pod
spec:
  securityContext:
    runAsUser: 1001
  containers:
  -  image: ubuntu
     name: web
     command: ["sleep", "5000"]
     securityContext:
      runAsUser: 1002

  -  image: ubuntu
     name: sidecar
     command: ["sleep", "5000"]

In a multi-container pod, a securityContext set on a container takes precedence over the pod-level one.
In the example above, the web container runs as user 1002 and the sidecar container as user 1001.

spec:
  containers:
  - command:
    - sleep
    - "4800"
    image: ubuntu
    name: ubuntu
    securityContext:
      capabilities:
        add: ["SYS_TIME"]
  nodeName: controlplane

To add capabilities, use add: [list]. To add several at once, write add: ["NET_ADMIN", "SYS_TIME"].
Keep the securityContext under spec (pod level) and the one under a container clearly apart: capabilities can only be set at the container level.

Network Policies

YAML example

    name: internal-policy
    namespace: default
  spec:
    egress:
    - to:
      - podSelector:
          matchLabels:
            name: payroll
      ports:
      - port: 8080
        protocol: TCP
    - to:
      - podSelector:
          matchLabels:
            name: mysql
      ports:
      - port: 3306
        protocol: TCP
    podSelector:
      matchLabels:
        name: internal
    policyTypes:
    - Egress

Put to under spec.egress or from under spec.ingress to allow egress or ingress traffic.
Note that for a policy to apply to both directions, policyTypes must list both Ingress and Egress.

Multiple ports example

  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    namespace: default
    name: internal-policy
  spec:
    ingress:
    - from:
      - podSelector:
          matchLabels:
            name: internal
      ports:
      - port: 8080
        protocol: TCP
    podSelector:
      matchLabels:
        name: payroll
    egress:
    - to:
      - podSelector:
          matchLabels:
            name: payroll
      ports:
      - port: 8080
        protocol: TCP
    - to:
      - podSelector:
          matchLabels:
            name: mysql
      ports:
      - port: 3306
        protocol: TCP
    - ports:
      - port: 53
        protocol: TCP
      - port: 53
        protocol: UDP
    policyTypes:
    - Ingress
    - Egress

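spec.egress[] can hold several to entries, each defining a different egress rule.
This is useful when different pod selectors need different ports (one to entry is comparable to one rule in an AWS security group).
The last entry in spec.egress[] has only ports and no to, which allows egress on those ports to any destination (here port 53 over TCP and UDP, which is DNS).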

Storage

Persistent Volume Claims

spec:
  containers:
  - name: event-simulator
    image: kodekloud/event-simulator
    env:
    - name: LOG_HANDLERS
      value: file
    volumeMounts:
    - mountPath: /log
      name: log-volume

  volumes:
  - name: log-volume
    hostPath:
      # directory location on host
      path: /var/log/webapp
      # this field is optional
      type: Directory

hostPath mounts a directory from the node's filesystem at the given path inside the container.

PV example

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-log
spec:
  persistentVolumeReclaimPolicy: Retain
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 100Mi
  hostPath:
    path: /pv/log

PVC YAML example

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-log-1
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Mi

Example of assigning a PVC to a pod

apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: event-simulator
    image: kodekloud/event-simulator
    env:
    - name: LOG_HANDLERS
      value: file
    volumeMounts:
    - mountPath: /log
      name: log-volume
  volumes:
  - name: log-volume
    persistentVolumeClaim:
      claimName: claim-log-1

Deleting a PVC

controlplane ~ ➜  k delete pvc claim-log-1 
persistentvolumeclaim "claim-log-1" deleted

While a pod is still using the PVC, the delete command hangs and the PVC stays in the Terminating state until that pod is deleted.

Storage Class

Listing storage classes

controlplane ~ ➜  k get storageclasses.storage.k8s.io 
NAME                        PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)        rancher.io/local-path           Delete          WaitForFirstConsumer   false                  9m23s
local-storage               kubernetes.io/no-provisioner    Delete          WaitForFirstConsumer   false                  5s
portworx-io-priority-high   kubernetes.io/portworx-volume   Delete          Immediate              false                  5s

If PROVISIONER is no-provisioner, the class is for local volumes that are not dynamically provisioned.

StorageClass YAML example

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: delayed-volume-sc
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Retain 
volumeBindingMode: WaitForFirstConsumer

Most questions in this section test how well storage classes are understood.

Networking

Explore Environment

Checking IP interfaces

controlplane ~ ➜  ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default 
    link/ether 62:fd:46:a6:e3:38 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
3: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default qlen 1000
    link/ether 9e:81:1e:b4:0c:31 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.1/24 brd 10.244.0.255 scope global cni0
       valid_lft forever preferred_lft forever
4: veth9ff14959@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default 
    link/ether 42:25:e2:1e:c8:84 brd ff:ff:ff:ff:ff:ff link-netns cni-deee82e4-8832-478d-1ed6-0ec4a3c1e4df
5: veth425d657c@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default 
    link/ether 2e:5c:20:f1:42:b7 brd ff:ff:ff:ff:ff:ff link-netns cni-38f750b4-0253-1759-7c69-fff870be0128
4246: eth0@if4247: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:c0:10:80:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.16.128.12/24 brd 192.16.128.255 scope global eth0
       valid_lft forever preferred_lft forever
4248: eth1@if4249: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:19:00:2a brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.25.0.42/24 brd 172.25.0.255 scope global eth1
       valid_lft forever preferred_lft forever

controlplane ~ ➜  ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/ether 62:fd:46:a6:e3:38 brd ff:ff:ff:ff:ff:ff
3: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 9e:81:1e:b4:0c:31 brd ff:ff:ff:ff:ff:ff
4: veth9ff14959@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP mode DEFAULT group default 
    link/ether 42:25:e2:1e:c8:84 brd ff:ff:ff:ff:ff:ff link-netns cni-deee82e4-8832-478d-1ed6-0ec4a3c1e4df
5: veth425d657c@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP mode DEFAULT group default 
    link/ether 2e:5c:20:f1:42:b7 brd ff:ff:ff:ff:ff:ff link-netns cni-38f750b4-0253-1759-7c69-fff870be0128
4246: eth0@if4247: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:c0:10:80:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
4248: eth1@if4249: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ac:19:00:2a brd ff:ff:ff:ff:ff:ff link-netnsid 1

ip link does not print IP addresses, so use ip a when the address is needed.

Checking bridges

controlplane ~ ✖ ip link show type bridge
3: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 9e:81:1e:b4:0c:31 brd ff:ff:ff:ff:ff:ff

CNI

Checking the container runtime endpoint of the kubelet

controlplane ~ ➜  ps -ef | grep kubelet
root        4689       1  0 07:25 ?        00:00:04 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9

Listing the available CNI plugins

controlplane ~ ➜  ls /opt/cni/bin
bandwidth  dhcp   firewall  host-device  ipvlan    macvlan  ptp  static  vlan
bridge     dummy  flannel   host-local   loopback  portmap  sbr  tuning  vrf

Finding a CNI plugin that is not installed

controlplane ~ ➜  ls -l /opt/cni/bin | grep -E "vlan|bridge|dhcp|cisco"
-rwxr-xr-x 1 root root  4299004 Jan 16  2023 bridge
-rwxr-xr-x 1 root root 10167415 Jan 16  2023 dhcp
-rwxr-xr-x 1 root root  3999593 Jan 16  2023 ipvlan
-rwxr-xr-x 1 root root  4029261 Jan 16  2023 macvlan
-rwxr-xr-x 1 root root  3993252 Jan 16  2023 vlan

Every plugin present in /opt/cni/bin is installed, so a name missing from the output (cisco here) is not.

Checking the plugin CNI currently uses

controlplane ~ ➜  ls /etc/cni/net.d
10-flannel.conflist

Deploy Network Solution

Installing the weave plugin daemonset

kubectl apply -f https://github.com/weaveworks/weave/releases/download/v2.8.1/weave-daemonset-k8s.yaml

Networking Weave

Checking the CNI currently in use

controlplane ~ ➜  ls /etc/cni/net.d/
10-weave.conflist

Counting the weave agents/peers

controlplane ~ ✖ k get po -A
NAMESPACE     NAME                                   READY   STATUS    RESTARTS      AGE
kube-system   coredns-5d78c9869d-nbv9t               1/1     Running   0             69m
kube-system   coredns-5d78c9869d-vmd9p               1/1     Running   0             69m
kube-system   etcd-controlplane                      1/1     Running   0             69m
kube-system   kube-apiserver-controlplane            1/1     Running   0             69m
kube-system   kube-controller-manager-controlplane   1/1     Running   0             69m
kube-system   kube-proxy-2vchc                       1/1     Running   0             69m
kube-system   kube-proxy-pxp5g                       1/1     Running   0             69m
kube-system   kube-scheduler-controlplane            1/1     Running   0             69m
kube-system   weave-net-b5ddf                        2/2     Running   0             69m
kube-system   weave-net-s8hcr                        2/2     Running   1 (69m ago)   69m

Count the pods whose names start with weave-net.

Listing bridge-type interfaces (to find the bridge created by weave)

controlplane ~ ✖ ip link show type bridge
4: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ce:3a:ac:d2:fa:17 brd ff:ff:ff:ff:ff:ff

Which Pod IP address range is configured for weave?

env:
    - name: HOSTNAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: IPALLOC_RANGE
      value: 10.244.0.0/16

The IPALLOC_RANGE value is the answer.

How to check the default gateway used by pods on a specific node
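To pull the value out of a long YAML dump without scrolling, something like the sketch below may help. The helper name weave_ipalloc_range is hypothetical, and it assumes the value: line directly follows the - name: IPALLOC_RANGE line, as in the env block above.

```shell
# Print the value of the IPALLOC_RANGE env var from a YAML spec read on stdin.
# Assumption: "value:" is the line right after "- name: IPALLOC_RANGE".
# weave_ipalloc_range is a hypothetical helper name.
weave_ipalloc_range() {
  awk '
    found && $1 == "value:" { print $2; exit }
    { found = ($1 == "-" && $2 == "name:" && $3 == "IPALLOC_RANGE") }
  '
}
```

Assuming the DaemonSet is named weave-net (which matches the pod names above), usage could look like `kubectl get ds weave-net -n kube-system -o yaml | weave_ipalloc_range`.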

controlplane ~ ✖ k run temp --image=busybox --dry-run=client -o yaml > temp.yaml

controlplane ~ ➜  vi temp.yaml 

controlplane ~ ➜  k apply -f temp.yaml
pod/temp created

controlplane ~ ➜  k exec temp -- ip route
default via 10.244.192.0 dev eth0 
10.244.0.0/16 dev eth0 scope link  src 10.244.192.1 

controlplane ~ ➜  ssh node01

root@node01 ~ ➜  ip route
default via 172.25.0.1 dev eth1 
10.244.0.0/16 dev weave proto kernel scope link src 10.244.192.0 
172.25.0.0/24 dev eth1 proto kernel scope link src 172.25.0.60 
192.12.183.0/24 dev eth0 proto kernel scope link src 192.12.183.6

There are two ways. Either pin a busybox pod to the target node with nodeName, run ip route inside it, and read the default entry,
or ssh to node01, run ip route there, and read the src address of the weave entry. Both give 10.244.192.0 here.
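Both lookups amount to picking one field out of the ip route output. A minimal sketch follows; the helper names default_gateway and bridge_src are made up for illustration.

```shell
# Print the gateway of the first "default via ..." route read on stdin.
# default_gateway is a hypothetical helper name.
default_gateway() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Print the src address of the first route that goes through device $1.
# bridge_src is a hypothetical helper name.
bridge_src() {
  awk -v dev="$1" '{
    is_dev = 0
    for (i = 1; i < NF; i++) {
      if ($i == "dev" && $(i + 1) == dev) is_dev = 1
      if ($i == "src" && is_dev) { print $(i + 1); exit }
    }
  }'
}
```

For example, `k exec temp -- ip route | default_gateway` from the control plane, or `ip route | bridge_src weave` on node01.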

Service Networking

Check the node IP range

controlplane ~ ➜  ip addr | grep eth0
4360: eth0@if4361: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    inet 192.13.147.9/24 brd 192.13.147.255 scope global eth0

The address after inet, 192.13.147.9/24, has a /24 prefix, so the node network is 192.13.147.0/24.

Check the pod IP range

controlplane ~ ➜  ip addr | grep weave
4: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default qlen 1000
    inet 10.244.0.1/16 brd 10.244.255.255 scope global weave
7: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
10: vethwepl7bf057a@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
12: vethwepl06582fa@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default

For the pod range, look at the inet address on the CNI bridge interface (weave): 10.244.0.1/16, i.e. 10.244.0.0/16.

Check the service IP range
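Going from an interface address to the range means clearing the host bits below the prefix length. The arithmetic can be sketched as follows; cidr_network is a hypothetical helper, IPv4 only, and it assumes octets are written without leading zeros.

```shell
# Print the network (address with host bits cleared) for an IPv4 CIDR.
# cidr_network is a hypothetical helper name. The body runs in a subshell
# so the temporary variables and IFS change do not leak to the caller.
cidr_network() (
  addr=${1%/*}      # part before the slash, e.g. 192.13.147.9
  prefix=${1#*/}    # part after the slash, e.g. 24
  IFS=.
  set -- $addr      # split the address into four octets
  ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  net=$(( ip & mask ))
  printf '%d.%d.%d.%d/%d\n' \
    $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
    $(( (net >> 8) & 255 )) $(( net & 255 )) "$prefix"
)
```

With the addresses from the transcripts, `cidr_network 192.13.147.9/24` prints 192.13.147.0/24 and `cidr_network 10.244.0.1/16` prints 10.244.0.0/16.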

controlplane ~ ➜  cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep cluster-ip-range
    - --service-cluster-ip-range=10.96.0.0/12

The --service-cluster-ip-range flag in the kube-apiserver manifest yaml is the service IP range.
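If only the CIDR itself is wanted, for example to reuse it in another command, the flag name can be stripped off. A small sketch; service_cidr is a hypothetical helper name and it assumes the flag is written as a YAML list item exactly as shown above.

```shell
# Print only the CIDR from a "- --service-cluster-ip-range=<cidr>" line on stdin.
# service_cidr is a hypothetical helper name.
service_cidr() {
  sed -n 's/^[[:space:]]*-[[:space:]]*--service-cluster-ip-range=//p'
}
```

For example: `service_cidr < /etc/kubernetes/manifests/kube-apiserver.yaml`.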

CoreDNS in Kubernetes

Check which DNS solution is in use

controlplane ~ ➜  k get po -n kube-system
NAME                                   READY   STATUS    RESTARTS   AGE
coredns-5d78c9869d-jpc8g               1/1     Running   0          118s
coredns-5d78c9869d-xn6cl               1/1     Running   0          118s
etcd-controlplane                      1/1     Running   0          2m14s
kube-apiserver-controlplane            1/1     Running   0          2m14s
kube-controller-manager-controlplane   1/1     Running   0          2m10s
kube-proxy-ct69c                       1/1     Running   0          118s
kube-scheduler-controlplane            1/1     Running   0          2m10s

Look at the pods in the kube-system namespace; the DNS solution usually lives there. Here the coredns-* pods show that CoreDNS is in use.

Check the service name of the DNS solution

controlplane ~ ➜  k get svc -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   3m16s

When CoreDNS is installed, its service is usually created under the name kube-dns.

Find the config file of the CoreDNS service

controlplane ~ ➜  ps -ef | grep core
root        6544    6419  0 05:54 ?        00:00:00 /coredns -conf /etc/coredns/Corefile
root        7070    6900  0 05:54 ?        00:00:00 /coredns -conf /etc/coredns/Corefile
root       10383    8159  0 05:59 pts/0    00:00:00 grep --color=auto core

Searching the process list for core shows that coredns is started with -conf /etc/coredns/Corefile, so the config file is named Corefile.

Check how the Corefile is injected into the CoreDNS pod
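The -conf argument can also be extracted directly from the process list. This is only a sketch; coredns_conf_path is a hypothetical helper name, and it assumes the path follows -conf as a separate word, as in the output above.

```shell
# Print the distinct values passed after "-conf" in `ps -ef` output read on stdin.
# coredns_conf_path is a hypothetical helper name.
coredns_conf_path() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "-conf") print $(i + 1) }' | sort -u
}
```

For example: `ps -ef | grep core | coredns_conf_path`.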

controlplane ~ ➜  k get po -n kube-system coredns-5d78c9869d-jpc8g -o yaml
apiVersion: v1
~~~
volumes:
  - configMap:
      defaultMode: 420
      items:
      - key: Corefile
        path: Corefile
      name: coredns
    name: config-volume

The Corefile comes from a ConfigMap (named coredns) that is mounted into the pod as a volume.

Check the root domain configured in CoreDNS

controlplane ~ ➜  k get cm -n kube-system -o yaml coredns
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2024-01-09T10:53:48Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "275"
  uid: 7db77577-2087-498d-9d1f-fad73b0d0bb9

In the Corefile entry under data, the domain that follows the kubernetes keyword is the root domain: cluster.local here.
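Reading the domain off the Corefile and turning it into a service name can be sketched as below. Both helper names are hypothetical; the name format <service>.<namespace>.svc.<root domain> is the standard Kubernetes service DNS form.

```shell
# Print the first zone listed after the "kubernetes" plugin keyword
# in a Corefile read on stdin. cluster_domain is a hypothetical helper name.
cluster_domain() {
  awk '$1 == "kubernetes" { print $2; exit }'
}

# Build the fully qualified DNS name of a service.
# svc_fqdn is a hypothetical helper name. Arguments: service, namespace, root domain.
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "$3"
}
```

For example, `k get cm coredns -n kube-system -o jsonpath='{.data.Corefile}' | cluster_domain` should print cluster.local for the ConfigMap above, and `svc_fqdn kube-dns kube-system cluster.local` prints kube-dns.kube-system.svc.cluster.local.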

CKA - Ingress Networking - 1

List ingress resources

controlplane ~ ➜  k get ingress -A
NAMESPACE   NAME                 CLASS    HOSTS   ADDRESS       PORTS   AGE
app-space   ingress-wear-watch   <none>   *       10.105.9.73   80      99s

CKA - Ingress Networking - 2

Install

Cluster Installation using Kubeadm

Troubleshooting

Application Failure

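Most of the problems were about fixing an application in a 2-tier setup, typically around selectors and deployments.
Problem types:

The deployment's port does not match
The selector's labels do not match
The DB host in env does not match the DB Pod that is actually running
The DB user in env does not match the running DB Pod
- Changing the user to root fixed it
The NodePort port number differs from the one given in the task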
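The selector/label mismatch is the easiest of these to check mechanically. A minimal sketch follows; selector_matches is a hypothetical helper, and both arguments are assumed to be comma-separated key=value lists, the form printed in the Selector line of k describe svc and in the LABELS column of k get po --show-labels.

```shell
# Succeed if every key=value pair in the selector ($1) also appears
# in the pod label list ($2). Both are comma-separated key=value lists.
# selector_matches is a hypothetical helper name; the body runs in a
# subshell so the IFS change and the exit stay local.
selector_matches() (
  selector=$1
  labels=",$2,"
  IFS=,
  for pair in $selector; do
    case $labels in
      *",$pair,"*) ;;                                # this pair is present
      *) echo "missing label: $pair"; exit 1 ;;      # first mismatch ends the check
    esac
  done
)
```

For example, `selector_matches "app=web" "app=wev,tier=frontend"` prints "missing label: app=web" and returns a non-zero status (the label values here are made up).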

Control Plane Failure

Problem types

The scheduler's exec command was wrong, so I fixed it. The error log was:

exec: "kube-schedulerrrr": executable file not found in $PATH: unknown'

The controller manager's kubeconfig option was set to a wrong file. I changed it back to the original controller-manager.conf.

--kubeconfig=/etc/kubernetes/controller-manager-XXXX.conf

The controller manager was down. Its logs reported that the ca.crt file could not be found, and looking through the manifest yaml I found a wrong hostPath and fixed it.

controlplane /etc/kubernetes ✖ k logs -n kube-system kube-controller-manager-controlplane
I0112 05:55:47.202450 1 serving.go:348] Generated self-signed cert in-memory
E0112 05:55:47.415001 1 run.go:74] "command failed" err="unable to load client CA provider: open /etc/kubernetes/pki/ca.crt: no such file or directory"

hostPath:
  path: /etc/kubernetes/WRONG-PKI-DIRECTORY
  type: DirectoryOrCreate
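A quick way to spot this kind of mistake is to list the hostPath entries of a manifest and flag the ones that look wrong on the host. The sketch below is an assumption-laden helper (suspect_host_paths is a made-up name): it expects each path: line to directly follow its hostPath: line, and paths without spaces. Because type: DirectoryOrCreate makes the kubelet create a missing directory, a wrong path may show up as an empty directory rather than a missing one, so both cases are reported.

```shell
# Read a pod manifest on stdin and print hostPath paths that are
# missing on this host or are empty directories.
# suspect_host_paths is a hypothetical helper name.
suspect_host_paths() {
  awk 'prev == "hostPath:" && $1 == "path:" { print $2 } { prev = $NF }' |
  while read -r p; do
    if [ ! -e "$p" ]; then
      echo "$p (missing)"
    elif [ -d "$p" ] && [ -z "$(ls -A "$p")" ]; then
      echo "$p (empty directory)"
    fi
  done
}
```

For example: `suspect_host_paths < /etc/kubernetes/manifests/kube-controller-manager.yaml`.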

Worker Node Failure