HPA and VPA Autoscaling
2025-06-15
Autoscaling is one of Kubernetes' core capabilities: it adjusts resources dynamically in response to load. This article walks through the principles and practice of HPA (horizontal autoscaling), VPA (vertical autoscaling), KEDA (event-driven autoscaling), and the Cluster Autoscaler.
Autoscaling Landscape
HPA v2 (Horizontal Pod Autoscaler)
Scaling on CPU/Memory
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 3
  maxReplicas: 50
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70      # target 70% CPU utilization
    # Memory usage
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 500Mi         # target average memory usage
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # scale-up stabilization window
      policies:
        - type: Percent
          value: 100                   # grow by at most 100% per period
          periodSeconds: 60
        - type: Pods
          value: 10                    # add at most 10 Pods per period
          periodSeconds: 60
      selectPolicy: Max                # pick the larger allowance
    scaleDown:
      stabilizationWindowSeconds: 300  # 5-minute scale-down window
      policies:
        - type: Percent
          value: 10                    # shrink by at most 10% per period
          periodSeconds: 60
      selectPolicy: Min                # pick the smaller allowance (more conservative)
```

The HPA formula:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]

For example, with 5 replicas at 90% CPU utilization against a 70% target: ceil[5 * (90/70)] = ceil[6.43] = 7.
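The formula can be sketched in Python. This is a simplified model: the real controller also skips scaling when the ratio is within a tolerance band (default 0.1) to avoid churn, and the function name here is illustrative:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation.

    Mirrors: desired = ceil[current * (current_metric / target_metric)].
    The tolerance band approximates the controller's default behavior.
    """
    ratio = current_metric / target_metric
    if abs(1.0 - ratio) <= tolerance:
        return current_replicas  # within tolerance: no change
    return math.ceil(current_replicas * ratio)

# Example from the text: 5 replicas at 90% CPU, target 70%
print(desired_replicas(5, 90, 70))  # → 7
```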
Scaling on Custom Metrics
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 30
  metrics:
    # Pod-level custom metric
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"   # 1000 RPS per Pod on average
    # External metric (e.g. message queue depth)
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: orders
        target:
          type: Value
          value: "100"           # keep queue depth below 100
    # Object metric (e.g. Ingress QPS)
    - type: Object
      object:
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: app-ingress
        metric:
          name: requests_per_second
        target:
          type: Value
          value: "5000"
```

Metrics Server
Metrics Server is the prerequisite for HPA to work with resource metrics (CPU/Memory):
```bash
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify
kubectl top nodes
kubectl top pods -n production
```

Prometheus Adapter
Prometheus Adapter exposes custom metrics to the HPA:
```yaml
# prometheus-adapter configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
      - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)_total$"
          as: "${1}_per_second"
        metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
      - seriesQuery: 'rabbitmq_queue_messages_ready'
        resources:
          overrides:
            namespace: {resource: "namespace"}
        name:
          as: "queue_messages_ready"
        metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
```

VPA (Vertical Pod Autoscaler)
VPA automatically adjusts a Pod's CPU and memory requests:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  updatePolicy:
    updateMode: "Auto"   # Auto | Recreate | Initial | Off
  resourcePolicy:
    containerPolicies:
      - containerName: app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
      - containerName: sidecar
        mode: "Off"      # leave the sidecar container untouched
```

VPA Component Architecture
VPA Update Modes
| Mode | Description |
|---|---|
| Off | Only generates recommendations; never applies them |
| Initial | Applies recommendations only at Pod creation |
| Recreate | Applies recommendations by evicting and recreating Pods |
| Auto | Currently equivalent to Recreate; may support in-place updates in the future |
Note: HPA and VPA must not scale on the same metric (such as CPU) at the same time, or they will fight each other. A recommended combination: HPA driven by custom metrics, with VPA managing resource requests.
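How VPA arrives at a recommendation can be sketched roughly as follows. This is a heavily simplified illustration, not the actual recommender (which maintains decaying histograms per container); the percentile, margin value, and function name are assumptions made for the sketch:

```python
import math

def vpa_recommendation(usage_samples_millicores, min_allowed=100,
                       max_allowed=4000, percentile=0.9, margin=0.15):
    """Simplified sketch of a VPA-style CPU recommendation: take a high
    percentile of observed usage, add a safety margin, then clamp to
    the [minAllowed, maxAllowed] bounds from the resourcePolicy."""
    samples = sorted(usage_samples_millicores)
    idx = min(len(samples) - 1, math.ceil(percentile * len(samples)) - 1)
    target = samples[idx] * (1 + margin)
    return max(min_allowed, min(max_allowed, round(target)))

# 100 samples ramping from 10m to 1000m
samples = [10 * i for i in range(1, 101)]
print(vpa_recommendation(samples))  # → 1035 (p90 ≈ 900m, +15% margin)
```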
KEDA (Kubernetes Event-Driven Autoscaling)
KEDA scales workloads based on event sources and supports scaling from 0 to N:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 15
  cooldownPeriod: 300
  minReplicaCount: 0     # can scale to zero!
  maxReplicaCount: 100
  triggers:
    # RabbitMQ queue
    - type: rabbitmq
      metadata:
        host: amqp://user:pass@rabbitmq.default.svc:5672/
        queueName: orders
        queueLength: "50"     # one replica per 50 messages
    # Kafka topic lag
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc:9092
        consumerGroup: order-group
        topic: orders
        lagThreshold: "100"
    # Prometheus metric
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        metricName: http_requests_total
        threshold: "1000"
        query: sum(rate(http_requests_total{deployment="order-processor"}[2m]))
```

KEDA supports 60+ event sources, including AWS SQS, Azure Service Bus, GCP Pub/Sub, Redis, MySQL, PostgreSQL, Cron, and more.
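The per-trigger arithmetic that KEDA feeds into its underlying HPA can be approximated as follows. This is a simplified sketch that ignores activation thresholds and HPA behavior policies; the function and parameter names are illustrative:

```python
import math

def keda_desired_replicas(trigger_values, max_replicas=100, min_replicas=0):
    """Simplified sketch of KEDA-style scaling: each trigger maps a
    metric value and its threshold to a replica demand, and the
    highest demand across triggers wins, clamped to the replica
    bounds from the ScaledObject."""
    demands = [math.ceil(value / threshold) for value, threshold in trigger_values]
    desired = max(demands, default=0)
    return max(min_replicas, min(max_replicas, desired))

# 230 messages in RabbitMQ (queueLength 50), Kafka lag of 620 (lagThreshold 100)
print(keda_desired_replicas([(230, 50), (620, 100)]))  # → 7
```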
Cluster Autoscaler
The Cluster Autoscaler adds and removes nodes based on Pod scheduling demand:
```yaml
# Example Cluster Autoscaler Deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
          command:
            - ./cluster-autoscaler
            - --v=4
            - --cloud-provider=aws
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --balance-similar-node-groups=true
            - --skip-nodes-with-local-storage=false
            - --skip-nodes-with-system-pods=true
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
            - --scale-down-utilization-threshold=0.5
            - --max-graceful-termination-sec=600
```

Scale-up trigger: Pending Pods that cannot be scheduled due to insufficient resources. Scale-down trigger: a node's resource utilization stays below the threshold (default 50%) for a sustained period.
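The scale-down condition can be sketched as a utilization check. This is highly simplified: the real autoscaler also evaluates memory, PodDisruptionBudgets, and the skip flags shown in the config above, and requires the condition to hold for scale-down-unneeded-time; names here are illustrative:

```python
def node_is_scale_down_candidate(pod_requests_millicores, allocatable_millicores,
                                 threshold=0.5):
    """Simplified sketch of the Cluster Autoscaler scale-down check:
    a node whose requested-to-allocatable CPU ratio is below the
    utilization threshold (default 0.5) becomes a removal candidate."""
    utilization = sum(pod_requests_millicores) / allocatable_millicores
    return utilization < threshold

# Node with 4000m allocatable CPU running pods requesting 500m + 800m
print(node_is_scale_down_candidate([500, 800], 4000))  # → True (32.5% < 50%)
```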
Choosing a Scaling Strategy
| Scenario | Recommended approach |
|---|---|
| Web service with fluctuating traffic | HPA (CPU/RPS) + Cluster Autoscaler |
| Message consumers | KEDA (queue depth) |
| Batch jobs | KEDA (Cron/queue), scale to zero |
| Misallocated resources | VPA (observe in Off mode first) |
| Mixed workloads | HPA (custom metrics) + VPA (resource tuning) |
Summary
Key practices for autoscaling:
- Set sensible resource requests/limits first; they are the foundation HPA and VPA build on
- Configure HPA behavior to control scaling speed and avoid flapping
- Be more conservative scaling down than up: use a longer stabilization window
- Run VPA in Off mode to observe its recommendations before applying them automatically
- KEDA fits event-driven workloads and saves cost by scaling to zero
- Pair the Cluster Autoscaler with HPA for end-to-end elasticity