Installing the Prometheus Stack with Helm

Install Prometheus

Add the Prometheus chart repo to Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
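
You can confirm the repository was registered with:

$ helm repo list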

Check the chart version

$ helm search repo prometheus-community/kube-prometheus-stack
NAME                                            CHART VERSION   APP VERSION     DESCRIPTION
prometheus-community/kube-prometheus-stack      57.0.2          v0.72.0         kube-prometheus-stack collects Kubernetes manif...
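
To pin an older chart release instead of the latest, --versions lists every published version of the chart:

$ helm search repo prometheus-community/kube-prometheus-stack --versions | head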

Edit the configuration file

Reference: https://cloud-atlas.readthedocs.io/zh-cn/latest/kubernetes/monitor/prometheus/kube-prometheus-stack_persistent_volume.html

Configure simple local NFS storage volumes in the kube-prometheus-stack values (this example covers prometheus, alertmanager, thanos, and grafana):

$ helm pull prometheus-community/kube-prometheus-stack --untar --untardir /apps/helm_chart
$ vim /apps/helm_chart/kube-prometheus-stack/values.yaml
---
    ## Storage is the definition of how storage will be used by the Alertmanager instances.
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: qiqios-nfs-storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
...
grafana:
  enabled: true
  namespaceOverride: ""

  defaultDashboardsTimezone: Asia/Shanghai
  adminPassword: prom-operator
  persistence:
    enabled: true
    type: pvc
    storageClassName: qiqios-nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 20Gi
    finalizers:
      - kubernetes.io/pvc-protection
...
# Configure persistent NFS storage for Prometheus
prometheus:
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    ## Prometheus StorageSpec for persistent data
    ## Using PersistentVolumeClaim
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: qiqios-nfs-storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
...
    ## Storage is the definition of how storage will be used by the ThanosRuler instances.
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
    ##
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: qiqios-nfs-storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
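
Before deploying, it is worth rendering the chart locally to confirm the storage settings land in the generated manifests. A quick sanity check, run from the unpacked chart directory:

$ helm lint . -f values.yaml
$ helm template prometheus . -f values.yaml | grep -B 2 -A 6 volumeClaimTemplate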

Replace the images with mirror-registry copies

$ vim ./charts/kube-state-metrics/values.yaml
prometheusScrape: true
image:
  registry: registry.cn-hangzhou.aliyuncs.com
  repository: qiqios/kube-state-metrics

$ vim values.yaml
  alertmanagerSpec:
    ## Standard object's metadata.
    podMetadata: {}
    ## Image of Alertmanager
    ##
    image:
      registry: registry.cn-hangzhou.aliyuncs.com
      repository: qiqios/prometheus-alertmanager
      tag: v0.27.0
...
    patch:
      enabled: true
      image:
        registry: registry.cn-hangzhou.aliyuncs.com
        repository: qiqios/kube-webhook-certgen
        tag: v20221220-controller-v1.5.1-58-g787ea74b6
...
    ## Image of Prometheus.
    ##
    image:
      registry: registry.cn-hangzhou.aliyuncs.com
      repository: qiqios/prometheus
      tag: v2.50.1
...
    ## Image of ThanosRuler
    ##
    image:
      registry: registry.cn-hangzhou.aliyuncs.com
      repository: qiqios/thanos
      tag: v0.34.1
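
To confirm every image reference now points at the mirror registry, render the manifests again and grep the image fields:

$ helm template prometheus . -f values.yaml | grep 'image:' | sort -u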

Redeploy with Helm using the updated values

$ helm upgrade prometheus --namespace monitoring --create-namespace -f values.yaml .
Release "prometheus" has been upgraded. Happy Helming!
NAME: prometheus
LAST DEPLOYED: Thu Mar 14 18:41:39 2024
NAMESPACE: monitoring
STATUS: deployed
REVISION: 3
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=prometheus"

$ helm list -n monitoring
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
prometheus      monitoring      3               2024-03-14 18:41:39.76472226 +0800 CST  deployed        kube-prometheus-stack-57.0.2    v0.72.0
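
Helm keeps earlier revisions around, so if the new release misbehaves you can inspect the history and roll back (here, to revision 2):

$ helm history prometheus -n monitoring
$ helm rollback prometheus 2 -n monitoring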

Check the deployed components

$ kubectl get pod -n monitoring
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          14m
prometheus-grafana-5d6d76fff-nmh57                       3/3     Running   0          19m
prometheus-kube-prometheus-operator-57756d9c7b-bx782     1/1     Running   0          19m
prometheus-kube-state-metrics-b8bd9d947-lq8nw            1/1     Running   0          24s
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          19m
prometheus-prometheus-node-exporter-hxvql                1/1     Running   0          19m
prometheus-prometheus-node-exporter-kt6b4                1/1     Running   0          19m
prometheus-prometheus-node-exporter-ql6v6                1/1     Running   0          19m
prometheus-prometheus-node-exporter-vqjxj                1/1     Running   0          19m

$ kubectl get svc -n monitoring
NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                     ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   19m
prometheus-grafana                        ClusterIP   10.68.99.41     <none>        80/TCP                       20m
prometheus-kube-prometheus-alertmanager   ClusterIP   10.68.176.180   <none>        9093/TCP,8080/TCP            20m
prometheus-kube-prometheus-operator       ClusterIP   10.68.15.44     <none>        443/TCP                      20m
prometheus-kube-prometheus-prometheus     ClusterIP   10.68.176.135   <none>        9090/TCP,8080/TCP            20m
prometheus-kube-state-metrics             ClusterIP   10.68.149.120   <none>        8080/TCP                     20m
prometheus-operated                       ClusterIP   None            <none>        9090/TCP                     19m
prometheus-prometheus-node-exporter       ClusterIP   10.68.244.169   <none>        9100/TCP                     20m
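
Until the Ingress configured below is in place, Grafana can be reached through a port-forward to its ClusterIP service and browsed at http://localhost:3000:

$ kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring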

$ kubectl get pvc -n monitoring
NAME                                                                                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
alertmanager-prometheus-kube-prometheus-alertmanager-db-alertmanager-prometheus-kube-prometheus-alertmanager-0   Bound    pvc-cec53309-239f-41bb-9ae3-628f4d01754d   10Gi       RWO            qiqios-nfs-storage   20m
prometheus-grafana                                                                                               Bound    pvc-05d82f6f-38c9-4ab0-9ad0-6dd58f8a63c8   20Gi       RWO            qiqios-nfs-storage   20m
prometheus-prometheus-kube-prometheus-prometheus-db-prometheus-prometheus-kube-prometheus-prometheus-0           Bound    pvc-b910f3f8-aafb-43a6-a333-1a4774d94c26   20Gi       RWO            qiqios-nfs-storage   20m
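
The matching PersistentVolumes were provisioned dynamically by the NFS provisioner behind the qiqios-nfs-storage StorageClass; they can be inspected with:

$ kubectl get pv
$ kubectl describe storageclass qiqios-nfs-storage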

Configure a domain and TLS certificate for Grafana

# grafana-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: cert-manager-webhook-dnspod-cluster-issuer # automatically issue the HTTPS certificate
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - 'grafana.qiqios.com'
      secretName: grafana-letsencrypt-tls
  rules:
    - host: 'grafana.qiqios.com'
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-grafana
                port:
                  number: 80
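
Apply the manifest:

$ kubectl apply -f grafana-ingress.yaml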

Wait for the HTTPS certificate to be issued

$ kubectl get certificate -n monitoring
NAME                      READY   SECRET                    AGE
grafana-letsencrypt-tls   True    grafana-letsencrypt-tls   81s
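
If the certificate stays at READY=False, cert-manager's intermediate resources usually explain why (this assumes cert-manager and the dnspod webhook issuer referenced above are installed):

$ kubectl describe certificate grafana-letsencrypt-tls -n monitoring
$ kubectl get certificaterequests,orders,challenges -n monitoring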

Test access, resolving the domain directly to the ingress address:

$ curl -v https://grafana.qiqios.com --resolve grafana.qiqios.com:443:192.168.1.63
* Added grafana.qiqios.com:443:192.168.1.63 to DNS cache
* Hostname grafana.qiqios.com was found in DNS cache
*   Trying 192.168.1.63:443...
* TCP_NODELAY set
* Connected to grafana.qiqios.com (192.168.1.63) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=grafana.qiqios.com
*  start date: Mar 14 10:01:07 2024 GMT
*  expire date: Jun 12 10:01:06 2024 GMT
*  subjectAltName: host "grafana.qiqios.com" matched cert's "grafana.qiqios.com"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x556977dd2650)
> GET / HTTP/2
> Host: grafana.qiqios.com
> user-agent: curl/7.68.0
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 302
< date: Thu, 14 Mar 2024 11:05:02 GMT
< content-type: text/html; charset=utf-8
< content-length: 29
< cache-control: no-store
< location: /login
< x-content-type-options: nosniff
< x-frame-options: deny
< x-xss-protection: 1; mode=block
< strict-transport-security: max-age=15724800; includeSubDomains
<
<a href="/login">Found</a>.

* Connection #0 to host grafana.qiqios.com left intact
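
The 302 redirect to /login confirms Grafana is serving behind TLS. Log in as admin with the adminPassword set in values.yaml (prom-operator above); the chart also stores it in a secret, typically named <release>-grafana:

$ kubectl get secret prometheus-grafana -n monitoring -o jsonpath='{.data.admin-password}' | base64 -d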
