動かざることバグの如し

近づきたいよ 君の理想に

Kubernetesでmetrics-serverをインストールする方法

環境

なぜ必要なのか

kubernetes-dashboardをインストールするにあたってメトリクスを取る必要があるため

kubectl top nodesコマンドが使えるようになりたい。

# kubectl top nodes
error: Metrics API not available

インストール方法

公式サイトのREADME参照

kubernetes-sigs/metrics-server: Scalable and efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

x509: cannot validate certificate for because it doesn't contain any IP SANsエラーになる

なかなかREADYにならないのでログを見てみると、以下のようなエラーログが大量に出ている。

$ kubectl logs -n kube-system -l k8s-app=metrics-server --container metrics-server
E0724 00:32:10.210303       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.16.11:10250/metrics/resource\": x509: cannot validate certificate for 192.168.16.11 because it doesn't contain any IP SANs" node="ubuntu01"
E0724 00:32:10.217559       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.16.12:10250/metrics/resource\": x509: cannot validate certificate for 192.168.16.12 because it doesn't contain any IP SANs" node="ubuntu02"
E0724 00:32:10.294571       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.16.14:10250/metrics/resource\": x509: cannot validate certificate for 192.168.16.14 because it doesn't contain any IP SANs" node="ubuntu04"
E0724 00:32:10.499675       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.16.13:10250/metrics/resource\": x509: cannot validate certificate for 192.168.16.13 because it doesn't contain any IP SANs" node="ubuntu03"
I0724 00:32:13.045420       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0724 00:32:23.046614       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0724 00:32:25.209840       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.16.11:10250/metrics/resource\": x509: cannot validate certificate for 192.168.16.11 because it doesn't contain any IP SANs" node="ubuntu01"
E0724 00:32:25.223995       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.16.14:10250/metrics/resource\": x509: cannot validate certificate for 192.168.16.14 because it doesn't contain any IP SANs" node="ubuntu04"
E0724 00:32:25.230216       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.16.12:10250/metrics/resource\": x509: cannot validate certificate for 192.168.16.12 because it doesn't contain any IP SANs" node="ubuntu02"
E0724 00:32:25.530773       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.16.13:10250/metrics/resource\": x509: cannot validate certificate for 192.168.16.13 because it doesn't contain any IP SANs" node="ubuntu03"

公式READMEにもある通り、

Kubelet certificate needs to be signed by cluster Certificate Authority

とある。なんで内部通信なのに証明書用意しなきゃいけないんだよってことで「--kubelet-insecure-tls」を起動パラメータに追加してエラーを殺す。

以下で編集し、

$ kubectl edit deploy metrics-server -n kube-system

以下の通りに編集して保存

spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=4443
-    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
+    - --kubelet-preferred-address-types=InternalIP
+    - --kubelet-insecure-tls
    - --kubelet-use-node-status-port
    - --metric-resolution=15s

でpodを再起動すればREADYになるはず

$ kubectl logs -n kube-system -l k8s-app=metrics-server --container metrics-server
I0728 12:50:00.736910       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0728 12:50:00.737024       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0728 12:50:00.737127       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0728 12:50:00.737751       1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0728 12:50:00.737880       1 secure_serving.go:266] Serving securely on [::]:4443
I0728 12:50:00.738133       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0728 12:50:00.738336       1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0728 12:50:00.837096       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
I0728 12:50:00.837115       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I0728 12:50:00.837335       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 

まとめ

topリソースが使えるようになってた

$ kubectl top nodes
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
ubuntu01   282m         3%     2154Mi          31%       
ubuntu02   183m         1%     2702Mi          37%       
ubuntu03   124m         3%     2311Mi          29%       
ubuntu04   325m         8%     2218Mi          62%       

参考リンク