33.2 LiteLLM 閘道器部署

33.2.1 LiteLLM 簡介

LiteLLM 是一個開源的 LLM 閘道器，支援 100+ 個 LLM 提供商，包括 Anthropic、OpenAI、Cohere 等。它提供了統一的 API 介面，簡化了多提供商的使用和管理。

LiteLLM 的核心特性

多提供商支援 ：支援 100+ LLM 提供商
統一 API ：一致的 API 介面，簡化整合
智慧快取 ：內建快取機制，減少成本和延遲
速率限制 ：可配置的速率限制，控制使用
成本跟蹤 ：詳細的使用情況和成本分析
負載均衡 ：在多個 API 金鑰之間分配請求
失敗重試 ：自動重試失敗的請求
流式響應 ：支援流式輸出

LiteLLM 架構

┌─────────────────────────────────────────┐ │ Claude Code 客戶端 │ └─────────────────────────────────────────┘ ↓ ┌─────────────────────────────────────────┐ │ LiteLLM Proxy │ │ ┌──────────────────────────────┐ │ │ │ API 層 │ │ │ │ (Anthropic、OpenAI 等) │ │ │ └──────────────────────────────┘ │ │ ┌──────────────────────────────┐ │ │ │ 快取層 │ │ │ │ (Redis、Memcached) │ │ │ └──────────────────────────────┘ │ │ ┌──────────────────────────────┐ │ │ │ 監控層 │ │ │ │ (Prometheus、Grafana) │ │ │ └──────────────────────────────┘ │ └─────────────────────────────────────────┘ ↓ ┌─────────────────────────────────────────┐ │ LLM 提供商 │ │ (Anthropic、OpenAI、Cohere 等) │ └─────────────────────────────────────────┘

bash

## 33.2.2 安装和配置

### 1\. 安装 LiteLLM

#### 使用 Docker 安装（推荐）

    bash


    bash

    # 拉取 LiteLLM 镜像
    docker pull litellm/litellm:latest

    # 创建配置目录
    mkdir -p ~/litellm/config
    cd ~/litellm

    # 创建配置文件
    cat > config.yaml << EOF
    model_list:
      - model_name: claude-sonnet-4
        litellm_params:
          model: claude-sonnet-4-20250514
          api_key: os.environ/ANTHROPIC_API_KEY

      - model_name: claude-opus-4
        litellm_params:
          model: claude-opus-4-20250514
          api_key: os.environ/ANTHROPIC_API_KEY

      - model_name: claude-haiku-4
        litellm_params:
          model: claude-haiku-4-20250514
          api_key: os.environ/ANTHROPIC_API_KEY

    litellm_settings:
      drop_params: true
      set_verbose: true

    general_settings:
      master_key: sk-litellm-master-key-123456
      database_url: postgresql://user:password@localhost:5432/litellm

    security_settings:
      valid_api_keys:
        - sk-team-a-key-123
        - sk-team-b-key-456
    EOF

    # 启动 LiteLLM
    docker run -d \
      --name litellm \
      -p 4000:4000 \
      -v $(pwd)/config.yaml:/app/config.yaml \
      -e ANTHROPIC_API_KEY=sk-ant-xxx \
      litellm/litellm:latest

    ```#### 使用 Python 安裝

    # 安装 LiteLLM
    pip install litellm[proxy]
    # 初始化配置
    litellm init
    # 编辑配置文件
    nano litellm_config.yaml
    # 启动代理服务器
    litellm proxy --config litellm_config.yaml --port 4000

### 2\. 配置文件详解

    yaml


```yaml

    ```yaml

    # litellm_config.yaml

    # 模型列表

    model_list:

      # Anthropic Claude 模型


      - model_name: claude-sonnet-4

        litellm_params:
          model: claude-sonnet-4-20250514
          api_key: os.environ/ANTHROPIC_API_KEY
          api_base: https://api.anthropic.com
          max_tokens: 4096
          temperature: 0.7

      - model_name: claude-opus-4

        litellm_params:
          model: claude-opus-4-20250514
          api_key: os.environ/ANTHROPIC_API_KEY
          max_tokens: 4096

      - model_name: claude-haiku-4

        litellm_params:
          model: claude-haiku-4-20250514
          api_key: os.environ/ANTHROPIC_API_KEY
          max_tokens: 4096

      # Amazon Bedrock 模型


      - model_name: bedrock-claude-sonnet

        litellm_params:
          model: anthropic.claude-sonnet-4-5-20250929-v1:0
          api_base: https://bedrock-runtime.us-east-1.amazonaws.com
          api_key: os.environ/AWS_ACCESS_KEY_ID
          aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
          aws_region_name: us-east-1

      # Google Vertex AI 模型


      - model_name: vertex-claude-sonnet

        litellm_params:
          model: claude-sonnet-4-5@20250929
          api_base: https://us-central1-aiplatform.googleapis.com
          api_key: os.environ/GOOGLE_APPLICATION_CREDENTIALS
          vertex_project: os.environ/VERTEX_PROJECT_ID
          vertex_location: us-central1

    # LiteLLM 設定

    litellm_settings:
      drop_params: true              # 刪除未使用的引數
      set_verbose: true              # 啟用詳細日誌
      json_logs: true               # JSON 格式日誌
      success_callback: http://localhost:5000/callback  # 成功回撥
      failure_callback: http://localhost:5000/failure  # 失敗回撥

    # 通用設定

    general_settings:
      master_key: sk-litellm-master-key-123456  # 主金鑰
      database_url: postgresql://user:password@localhost:5432/litellm  # 資料庫 URL
      cache: redis://localhost:6379  # Redis 快取
      cache_seconds: 3600  # 快取時間（秒）

    # 安全設定

    security_settings:
      valid_api_keys:  # 有效的 API 金鑰

        - sk-team-a-key-123
        - sk-team-b-key-456
        - sk-team-c-key-789

      max_budget: 1000.0  # 最大預算（美元）
      budget_duration: monthly  # 預算週期
      rpm_limit: 100  # 每分鐘請求數限制
      tpm_limit: 10000  # 每分鐘令牌數限制

    # 負載均衡設定

    load_balancing_settings:
      routing_strategy: usage-based  # 路由策略：usage-based, round-robin, least-latency
      health_check: true  # 啟用健康檢查
      health_check_interval: 60  # 健康檢查間隔（秒）

    # 監控設定

    monitoring_settings:
      enable_prometheus: true  # 啟用 Prometheus
      prometheus_port: 9090  # Prometheus 埠
      enable_slack_alerts: true  # 啟用 Slack 告警
      slack_webhook_url: https://hooks.slack.com/services/xxx/yyy/zzz
      alert_thresholds:
        error_rate: 0.05  # 錯誤率閾值
        latency_p99: 5000  # P99 延遲閾值（毫秒）

    ```## 33.2.3 高级配置

    ### 1. 缓存配置

    # 缓存设置
    cache_settings:
    type: redis  # 缓存类型：redis, memory, none
    redis_url: redis://localhost:6379/0
    cache_ttl: 3600  # 缓存生存时间（秒）
    cache_key_prefix: litellm  # 缓存键前缀
    enable_cache_for_stream: false  # 是否为流式响应启用缓存
    cache_control_headers: true  # 是否使用缓存控制头

### 2\. 速率限制配置

    yaml

yaml


    ```yaml

    # 速率限制設定

    rate_limit_settings:
      enabled: true
      strategy: sliding_window  # 策略：sliding_window, token_bucket, fixed_window
      limits:

        - api_key: sk-team-a-key-123

          rpm: 100  # 每分鐘請求數
          tpm: 10000  # 每分鐘令牌數
          rpd: 10000  # 每天請求數

        - api_key: sk-team-b-key-456

          rpm: 50
          tpm: 5000
          rpd: 5000
      default_limits:
        rpm: 10
        tpm: 1000
        rpd: 100
      burst_size: 20  # 突發大小

    ```### 3. 预算控制配置

    # 预算设置
    budget_settings:
    enabled: true
    currency: USD
    budgets:
    - name: team-a-budget
    api_keys:
    - sk-team-a-key-123
    limit: 1000.0
    period: monthly
    alert_threshold: 0.8  # 在 80% 时告警
    hard_limit: true  # 达到限制时阻止请求
    - name: team-b-budget
    api_keys:
    - sk-team-b-key-456
    limit: 500.0
    period: monthly
    alert_threshold: 0.9
    hard_limit: false
    cost_tracking:
    enabled: true
    update_interval: 60  # 更新间隔（秒）
    storage: database  # 存储方式：database, file

### 4\. 监控和告警配置

    yaml

bash


    ```yaml

    # 監控設定

    monitoring_settings:
      prometheus:
        enabled: true
        port: 9090
        metrics:

          - request_count
          - request_duration
          - error_count
          - cache_hit_rate
          - token_usage
          - cost

      grafana:
        enabled: true
        dashboard_url: http://localhost:3000/d/litellm

      alerts:
        slack:
          enabled: true
          webhook_url: https://hooks.slack.com/services/xxx/yyy/zzz
          channels:

            - litellm-alerts
            - devops-notifications

          alert_rules:

            - name: high_error_rate

              condition: error_rate > 0.05
              duration: 5m
              severity: warning

            - name: high_latency

              condition: p99_latency > 5000
              duration: 2m
              severity: critical

            - name: budget_exceeded

              condition: budget_usage > 1.0
              severity: critical

        email:
          enabled: true
          smtp_server: smtp.gmail.com
          smtp_port: 587
          smtp_username: alerts@company.com
          smtp_password: ${SMTP_PASSWORD}
          from_address: litellm-alerts@company.com
          to_addresses:

            - devops@company.com
            - finance@company.com

    ```## 33.2.4 集成 Claude Code

    ### 1. 配置 Claude Code 使用 LiteLLM

    # 方法 1：使用统一端点（推荐）
    export ANTHROPIC_BASE_URL=https://litellm-server:4000
    export ANTHROPIC_AUTH_TOKEN=sk-litellm-static-key
    # 方法 2：使用 Anthropic 格式端点
    export ANTHROPIC_BASE_URL=https://litellm-server:4000/anthropic
    export ANTHROPIC_AUTH_TOKEN=sk-litellm-static-key

# 方法 3：使用 API 密钥辅助程序

# 创建辅助程序脚本

cat > ~/bin/get-litellm-key.sh << 'EOF' #!/bin/bash

# 从 Vault 获取密钥

vault kv get -field=api_key secret/litellm/claude-code EOF chmod +x ~/bin/get-litellm-key.sh

# 配置 Claude Code 使用辅助程序

cat > ~~/.claude-code/settings.json << EOF { "apiKeyHelper": "~~/bin/get-litellm-key.sh", "env": { "ANTHROPIC_BASE_URL": "<https://litellm-server:4000>" } } EOF

    bash


    ### 2. 验证配置

    ```python

    ```python

    class LiteLLMValidator:
        """LiteLLM 验证器"""

        def __init__(self, gateway_url: str, auth_token: str):
            self.gateway_url = gateway_url
            self.auth_token = auth_token

        def validate_connection(self) -> ValidationResult:
            """验证连接"""
            result = ValidationResult()

            try:
                # 测试健康检查端点
                response = requests.get(
                    f"{self.gateway_url}/health",
                    headers={'Authorization': f'Bearer {self.auth_token}'},
                    timeout=10
                )

                if response.status_code == 200:
                    result.success = True
                    result.message = "Connection successful"
                else:
                    result.success = False
                    result.message = f"Health check failed: {response.status_code}"

            except requests.exceptions.Timeout:
                result.success = False
                result.message = "Connection timeout"
            except requests.exceptions.ConnectionError:
                result.success = False
                result.message = "Connection error"
            except Exception as e:
                result.success = False
                result.message = f"Unexpected error: {str(e)}"

            return result

        def validate_model_access(self, model: str) -> ValidationResult:
            """验证模型访问"""
            result = ValidationResult()

            try:
                # 测试模型访问
                response = requests.post(
                    f"{self.gateway_url}/v1/completions",
                    headers={
                        'Authorization': f'Bearer {self.auth_token}',
                        'Content-Type': 'application/json'
                    },
                    json={
                        'model': model,
                        'prompt': 'Hello',
                        'max_tokens': 10
                    },
                    timeout=30
                )

                if response.status_code == 200:
                    result.success = True
                    result.message = f"Model {model} accessible"
                else:
                    result.success = False
                    result.message = f"Model access failed: {response.status_code}"
                    result.error = response.text

            except Exception as e:
                result.success = False
                result.message = f"Model access error: {str(e)}"

            return result

        def validate_all(self) -> ValidationReport:
            """验证所有配置"""
            report = ValidationReport()

            # 验证连接
            report.connection = self.validate_connection()

            # 验证模型访问
            models = ['claude-sonnet-4', 'claude-opus-4', 'claude-haiku-4']
            report.models = {}

            for model in models:
                report.models[model] = self.validate_model_access(model)

            # 生成摘要
            report.summary = self._generate_summary(report)

            return report

        def _generate_summary(self, report: ValidationReport) -> str:
            """生成验证摘要"""
            summary = "LiteLLM Validation Summary:\n\n"

            summary += f"Connection: {'✓' if report.connection.success else '✗'} "
            summary += f"{report.connection.message}\n\n"

            summary += "Model Access:\n"
            for model, result in report.models.items():
                status = '✓' if result.success else '✗'
                summary += f"  {status} {model}: {result.message}\n"

            return summary

    ```## 33.2.5 監控和維護

```python
    ### 1. Prometheus 监控

    # prometheus.yml
    global:
    scrape_interval: 15s
    evaluation_interval: 15s
    scrape_configs:
    - job_name: 'litellm'
    static_configs:
    - targets: ['litellm-server:9090']
    metrics_path: '/metrics'

### 2\. Grafana 仪表板

    json


    ```json

```python
    {
      "dashboard": {
        "title": "LiteLLM Dashboard",
        "panels": [
          {
            "title": "Request Rate",
            "targets": [
              {
                "expr": "rate(litellm_request_count[1m])"
              }
            ]
          },
          {
            "title": "Error Rate",
            "targets": [
              {
                "expr": "rate(litellm_error_count[1m]) / rate(litellm_request_count[1m])"
              }
            ]
          },
          {
            "title": "P99 Latency",
            "targets": [
              {
                "expr": "histogram_quantile(0.99, rate(litellm_request_duration_bucket[1m]))"
              }
            ]
          },
          {
            "title": "Cache Hit Rate",
            "targets": [
              {
                "expr": "rate(litellm_cache_hits[1m]) / rate(litellm_cache_requests[1m])"
              }
            ]
          },
          {
            "title": "Token Usage",
            "targets": [
              {
                "expr": "rate(litellm_token_usage[1m])"
              }
            ]
          },
          {
            "title": "Cost",
            "targets": [
              {
                "expr": "litellm_cost_total"
              }
            ]
          }
        ]
      }
    }

    ```### 3. 日誌管理

    class LiteLLMLogManager:
    """LiteLLM 日誌管理器"""
    def __init__(self, log_file: str):
    self.log_file = log_file
    self.log_parser = LiteLLMLogParser()
    def analyze_logs(self,
    start_time: datetime = None,
    end_time: datetime = None) -> LogAnalysis:
    """分析日誌"""
    analysis = LogAnalysis()

    # 讀取日誌檔案

    with open(self.log_file, 'r') as f:
    logs = f.readlines()

    # 解析日誌

    parsed_logs = []
    for log in logs:
    try:
    parsed = self.log_parser.parse(log)
    parsed_logs.append(parsed)
    except Exception as e:
    logger.warning(f"Failed to parse log: {e}")

    # 過濾時間範圍

    if start_time or end_time:
    parsed_logs = [
    log for log in parsed_logs
    if (not start_time or log.timestamp >= start_time) and
    (not end_time or log.timestamp <= end_time)
    ]

    # 分析日誌

    analysis.total_requests = len(parsed_logs)
    analysis.successful_requests = sum(

    1 for log in parsed_logs if log.status == 'success'

    )
    analysis.failed_requests = sum(

    1 for log in parsed_logs if log.status == 'error'

    )
    analysis.error_rate = (
    analysis.failed_requests / analysis.total_requests
    if analysis.total_requests > 0 else 0
    )

    # 分析延遲

    latencies = [log.duration for log in parsed_logs if log.duration]
    if latencies:
    analysis.avg_latency = sum(latencies) / len(latencies)
    analysis.p50_latency = np.percentile(latencies, 50)
    analysis.p95_latency = np.percentile(latencies, 95)
    analysis.p99_latency = np.percentile(latencies, 99)

    # 分析令牌使用

    analysis.total_tokens = sum(
    log.input_tokens + log.output_tokens
    for log in parsed_logs
    )

    # 分析成本

    analysis.total_cost = sum(log.cost for log in parsed_logs)
    return analysis
    def generate_report(self, analysis: LogAnalysis) -> str:
    """生成報告"""
    report = "LiteLLM Log Analysis Report\n"
    report += "=" * 50 + "\n\n"
    report += "Request Summary:\n"
    report += f"  Total: {analysis.total_requests}\n"
    report += f"  Successful: {analysis.successful_requests}\n"
    report += f"  Failed: {analysis.failed_requests}\n"
    report += f"  Error Rate: {analysis.error_rate:.2%}\n\n"
    report += "Latency (ms):\n"
    report += f"  Average: {analysis.avg_latency:.0f}\n"
    report += f"  P50: {analysis.p50_latency:.0f}\n"
    report += f"  P95: {analysis.p95_latency:.0f}\n"
    report += f"  P99: {analysis.p99_latency:.0f}\n\n"
    report += "Token Usage:\n"
    report += f"  Total: {analysis.total_tokens:,}\n\n"
    report += "Cost:\n"
    report += f"  Total: ${analysis.total_cost:.2f}\n"
    return report

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297

33.2 LiteLLM 閘道器部署 ​

33.2.1 LiteLLM 簡介 ​

LiteLLM 的核心特性 ​

LiteLLM 架構 ​

33.2 LiteLLM 閘道器部署

33.2.1 LiteLLM 簡介

LiteLLM 的核心特性

LiteLLM 架構