Implementing HPA in a Kubernetes Cluster Based on GPU Metrics

Preface

  1. Prometheus server and the dcgm-exporter components are already deployed; dcgm-exporter consists of the exporter itself plus a Service and a ServiceMonitor.
  2. The Prometheus ConfigMap has been adjusted so that Prometheus can scrape the metrics exposed by dcgm-exporter.
  3. See the official dcgm-exporter 2.0 documentation for the available metric fields.

How it works: Kubernetes supports horizontal scaling of workloads through the HPA module, which natively handles metrics such as CPU and memory. The original Heapster-based HPA does not support scaling on GPU metrics, but the HPA metric set can be extended through the Custom Metrics API. By deploying Prometheus Adapter as a custom metrics server, Prometheus metrics are registered with the API server and exposed for the HPA to consume. With the appropriate configuration, the HPA can then use these custom metrics as scaling signals, enabling elastic scaling on GPU utilization.
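As a quick sanity check of this chain, the Custom Metrics API registration can be inspected directly with kubectl. This is a sketch and assumes cluster access; the APIService object only exists once Prometheus Adapter (installed later in this post) is running:

```shell
# The APIService for the custom metrics group is created by Prometheus Adapter
kubectl get apiservice v1beta1.custom.metrics.k8s.io

# Enumerate all metrics the adapter currently exposes to the HPA
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | python3 -m json.tool | head -40
```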

Deploying the dcgm-exporter Components

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: "dcgm-exporter"
  namespace: monitoring
  labels:
    app.kubernetes.io/name: "dcgm-exporter"
    app.kubernetes.io/version: "2.1.0"
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app.kubernetes.io/name: "dcgm-exporter"
      app.kubernetes.io/version: "2.1.0"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: "dcgm-exporter"
        app.kubernetes.io/version: "2.1.0"
      name: "dcgm-exporter"
    spec:
      containers:
      - image: "mirrors.com:80/rancher/dcgm-exporter:2.0.13-2.1.1-ubuntu18.04"
        env:
        - name: "DCGM_EXPORTER_LISTEN"
          value: ":9400"
        - name: "DCGM_EXPORTER_KUBERNETES"
          value: "true"
        name: "dcgm-exporter"
        ports:
        - name: "metrics"
          containerPort: 9400
        securityContext:
          runAsNonRoot: false
          runAsUser: 0
        volumeMounts:
        - name: "pod-gpu-resources"
          readOnly: true
          mountPath: "/var/lib/kubelet/pod-resources"
      volumes:
      - name: "pod-gpu-resources"
        hostPath:
          path: "/var/lib/kubelet/pod-resources"
      nodeSelector:
        gpu-type: T4
---
kind: Service
apiVersion: v1
metadata:
  name: "dcgm-exporter"
  namespace: monitoring
  labels:
    app.kubernetes.io/name: "dcgm-exporter"
    app.kubernetes.io/version: "2.1.0"
spec:
  selector:
    app.kubernetes.io/name: "dcgm-exporter"
    app.kubernetes.io/version: "2.1.0"
  ports:
  - name: "metrics"
    port: 9400
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter
  namespace: monitoring
  labels:
    app.kubernetes.io/name: dcgm-exporter
    app.kubernetes.io/version: "2.1.0"
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dcgm-exporter
      app.kubernetes.io/version: "2.1.0"
  endpoints:
  - port: metrics
    interval: 30s # adjust the scrape interval as needed
    scheme: http
  namespaceSelector:
    matchNames:
    - monitoring # namespace in which to discover the dcgm-exporter Service
```
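Before wiring up Prometheus, it is worth confirming that the exporter pods are actually serving metrics. A minimal sketch, assuming the DaemonSet above is running in the monitoring namespace:

```shell
# Pick one dcgm-exporter pod
POD=$(kubectl -n monitoring get pods -l app.kubernetes.io/name=dcgm-exporter \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n monitoring port-forward "$POD" 9400:9400 &

# The GPU utilization series used later for scaling should appear here
curl -s http://localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL
```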

Installing Prometheus Adapter with Helm

Project: https://github.com/kubernetes-sigs/prometheus-adapter

Helm releases: https://github.com/helm/helm/releases

```shell
# For offline deployment, pull the chart in advance from a machine with internet access
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm pull prometheus-community/prometheus-adapter
```
  • Edit values.yaml

Adapter rules documentation: https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/config-walkthrough.md

```yaml
...
image:
  repository: mirrors.com:80/monitoring/prometheus-adapter
  tag: v0.12.0

...
prometheus:
  # Value is templated
  url: http://prometheus-k8s.monitoring.svc # the Prometheus Service address in the current environment
  port: 9090
  path: ""

...
rules:
  default: true

  custom: # rules that translate Prometheus series into custom metrics
  - seriesQuery: '{UUID!=""}'
    resources:
      overrides:
        node: {resource: "node"}
        exported_pod: {resource: "pod"}
        exported_namespace: {resource: "namespace"}
    name:
      matches: ^DCGM_FI_(.*)$
      as: "${1}_over_time"
    metricsQuery: ceil(avg_over_time(<<.Series>>{<<.LabelMatchers>>}[3m]))
  - seriesQuery: '{UUID!=""}'
    resources:
      overrides:
        node: {resource: "node"}
        exported_pod: {resource: "pod"}
        exported_namespace: {resource: "namespace"}
    name:
      matches: ^DCGM_FI_(.*)$
      as: "${1}_current"
    metricsQuery: <<.Series>>{<<.LabelMatchers>>}
```
  • Install Prometheus Adapter
```shell
helm install prometheus-adapter -f values.yaml -n kube-system .

# Verify that the converted metric is exposed
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | grep 'DEV_GPU_UTIL_current'
```
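Beyond checking that the metric name is listed, the current per-pod values can be read from the same API. This is a sketch; the default namespace and the pod wildcard are assumptions to be adapted to where the GPU workload runs:

```shell
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/DEV_GPU_UTIL_current" \
  | python3 -m json.tool
```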
  • Edit the Prometheus ConfigMap and confirm that service discovery for the exporter is configured
```yaml
$ kubectl edit cm -n monitoring prometheus-config
apiVersion: v1
data:
  config.yml: |
    basic_auth_users:
      admin: $2y$12$jlwC.4777WgcQaSb14aFROxK6sRvQCKNBAxgYzM6guEjD.E2/HH4e
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-gpu'
      kubernetes_sd_configs:
      - role: pod
        namespaces:
          own_namespace: false
          names:
          - monitoring
      relabel_configs:
      - source_labels: [__address__]
        action: keep
        regex: '(.*):9400'
      - source_labels: [__meta_kubernetes_pod_controller_name]
        action: keep
        regex: 'dcgm-exporter'
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: node
      - source_labels: [__meta_kubernetes_pod_host_ip]
        action: replace
        target_label: node_ip
```
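To confirm that the kubernetes-gpu job is actually scraping the exporter, the Prometheus HTTP API can be queried from inside the cluster. A sketch, assuming the Service URL configured for the adapter above and the admin credentials from basic_auth_users (password placeholder to be substituted):

```shell
# Every dcgm-exporter target for the job should report up == 1
curl -s -u admin:<password> \
  'http://prometheus-k8s.monitoring.svc:9090/api/v1/query' \
  --data-urlencode 'query=up{job="kubernetes-gpu"}'
```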

Testing Elastic Scaling of the GPU Service

  • Deploy an inference service
```yaml
# bert.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bert-intent-detection
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bert-intent-detection
  template:
    metadata:
      labels:
        app: bert-intent-detection
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: gpu-type
                operator: In
                values:
                - T4
      containers:
      - name: bert-container
        image: mirrors.com:80/xiaomishu/bert-intent-detection:1.0.1
        ports:
        - containerPort: 80
        resources:
          limits:
            nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: bert-intent-detection-svc
  labels:
    app: bert-intent-detection
spec:
  selector:
    app: bert-intent-detection
  ports:
  - protocol: TCP
    name: http
    port: 8081
    targetPort: 80
```

```shell
# Smoke-test the service
curl -v http://10.43.214.241:8081/predict?query=Music
* Trying 10.43.214.241:8081...
* Connected to 10.43.214.241 (10.43.214.241) port 8081 (#0)
> GET /predict?query=Music HTTP/1.1
> Host: 10.43.214.241:8081
> User-Agent: curl/7.71.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Content-Type: text/html; charset=utf-8
< Content-Length: 9
< Server: Werkzeug/1.0.1 Python/3.6.9
< Date: Tue, 10 Dec 2024 06:06:15 GMT
<
* Closing connection 0
PlayMusic
```
  • Create an HPA that scales on GPU utilization
```yaml
# bert-hpa.yaml, mind the cluster version: this environment is older than 1.23
apiVersion: autoscaling/v2beta1 # use the autoscaling/v2beta1 HPA API
kind: HorizontalPodAutoscaler
metadata:
  name: gpu-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bert-intent-detection
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: DEV_GPU_UTIL_current # current GPU utilization
      targetAverageValue: 20
```
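The replica count the HPA derives from this metric follows the standard scaling rule, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). A minimal sketch in Python, using the numbers observed in the load test below:

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target: float) -> int:
    """Core HPA rule: ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    return math.ceil(current_replicas * current_value / target)

# Average GPU utilization of 36 against a target of 20 with 1 replica -> scale to 2
print(desired_replicas(1, 36, 20))  # 2

# At 19/20 the ratio is within tolerance and the HPA stays at 1 replica
print(desired_replicas(1, 19, 20))  # 1
```

When utilization drops back to 0, the computed count falls below minReplicas and the HPA floors it at 1, which matches the scale-down seen after the load test stops.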
```shell
$ kubectl get hpa
NAME      REFERENCE                          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
gpu-hpa   Deployment/bert-intent-detection   0/20      1         10        1          4h1m

$ kubectl describe hpa gpu-hpa
Name:               gpu-hpa
Namespace:          default
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Tue, 10 Dec 2024 10:01:52 +0800
Reference:          Deployment/bert-intent-detection
Metrics:            ( current / target )
  "DEV_GPU_UTIL_current" on pods:  0 / 20
Min replicas:       1
Max replicas:       10
Deployment pods:    1 current / 1 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from pods metric DEV_GPU_UTIL_current
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:             <none>
```
  • Load testing

hey load-testing tool: https://gitcode.com/gh_mirrors/he/hey/

```shell
# Generate load to trigger a scale-up
hey -n 10000 -c 200 "http://10.43.214.241:8081/predict?query=Music"

$ kubectl get hpa -w
NAME      REFERENCE                          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
gpu-hpa   Deployment/bert-intent-detection   19/20     1         10        1          4h7m
gpu-hpa   Deployment/bert-intent-detection   36/20     1         10        1          4h7m
gpu-hpa   Deployment/bert-intent-detection   36/20     1         10        2          4h8m

$ kubectl get pod -w
NAME                                    READY   STATUS    RESTARTS   AGE
bert-intent-detection-985fd9b57-cq25p   1/1     Running   0          16s
bert-intent-detection-985fd9b57-lqbxb   1/1     Running   0          4h21m

# Once the load stops and GPU utilization drops below 20%, the pods are scaled back down about five minutes later
$ kubectl get hpa
NAME      REFERENCE                          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
gpu-hpa   Deployment/bert-intent-detection   0/20      1         10        2          4h11m

$ kubectl get pod
NAME                                    READY   STATUS    RESTARTS   AGE
bert-intent-detection-985fd9b57-lqbxb   1/1     Running   0          4h35m
```

Implementing HPA in a Kubernetes Cluster Based on GPU Metrics
http://example.com/2025/04/25/k8s集群基于GPU显卡实现hpa 功能/
Author: 种田人
Published: April 25, 2025
License