# 33.2 Deploying a LiteLLM Gateway
## 33.2.1 LiteLLM Overview
LiteLLM is an open-source LLM gateway that supports more than 100 LLM providers, including Anthropic, OpenAI, and Cohere. It exposes a single, unified API, which simplifies using and managing multiple providers.
### Core Features
- **Multi-provider support**: works with 100+ LLM providers
- **Unified API**: one consistent interface, simplifying integration
- **Smart caching**: built-in response caching to reduce cost and latency
- **Rate limiting**: configurable limits to control usage
- **Cost tracking**: detailed usage and cost analytics
- **Load balancing**: distributes requests across multiple API keys
- **Retry on failure**: automatically retries failed requests
- **Streaming**: supports streamed responses
### Architecture
```
┌─────────────────────────────────────────┐
│           Claude Code client            │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│              LiteLLM Proxy              │
│   ┌──────────────────────────────┐      │
│   │          API layer           │      │
│   │   (Anthropic, OpenAI, ...)   │      │
│   └──────────────────────────────┘      │
│   ┌──────────────────────────────┐      │
│   │          Cache layer         │      │
│   │      (Redis, Memcached)      │      │
│   └──────────────────────────────┘      │
│   ┌──────────────────────────────┐      │
│   │       Monitoring layer       │      │
│   │    (Prometheus, Grafana)     │      │
│   └──────────────────────────────┘      │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│             LLM providers               │
│   (Anthropic, OpenAI, Cohere, ...)      │
└─────────────────────────────────────────┘
```
## 33.2.2 Installation and Configuration
### 1. Installing LiteLLM
#### Installing with Docker (recommended)
```bash
# Pull the LiteLLM image
docker pull litellm/litellm:latest

# Create a config directory
mkdir -p ~/litellm/config
cd ~/litellm

# Create the config file
cat > config.yaml << EOF
model_list:
  - model_name: claude-sonnet-4
    litellm_params:
      model: claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: claude-opus-4
    litellm_params:
      model: claude-opus-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: claude-haiku-4
    litellm_params:
      model: claude-haiku-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  drop_params: true
  set_verbose: true

general_settings:
  master_key: sk-litellm-master-key-123456
  database_url: postgresql://user:password@localhost:5432/litellm

security_settings:
  valid_api_keys:
    - sk-team-a-key-123
    - sk-team-b-key-456
EOF

# Start LiteLLM
docker run -d \
  --name litellm \
  -p 4000:4000 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -e ANTHROPIC_API_KEY=sk-ant-xxx \
  litellm/litellm:latest
```

#### Installing with Python
```bash
# Install LiteLLM with the proxy extras
pip install 'litellm[proxy]'

# Initialize a config
litellm init

# Edit the config file
nano litellm_config.yaml

# Start the proxy server
litellm --config litellm_config.yaml --port 4000
```
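A typo in `litellm_config.yaml` is a common cause of startup failures. As a pre-flight sketch (a hypothetical helper, not part of LiteLLM), the parsed config can be checked for the structure used in the config above — each `model_list` entry needs a `model_name` and a `litellm_params.model`:

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of structural problems found in a parsed LiteLLM config."""
    problems = []
    models = config.get("model_list")
    if not isinstance(models, list) or not models:
        problems.append("model_list must be a non-empty list")
        return problems
    for i, entry in enumerate(models):
        if "model_name" not in entry:
            problems.append(f"model_list[{i}]: missing model_name")
        params = entry.get("litellm_params")
        if not isinstance(params, dict) or "model" not in params:
            problems.append(f"model_list[{i}]: litellm_params.model is required")
    return problems

config = {
    "model_list": [
        {"model_name": "claude-sonnet-4",
         "litellm_params": {"model": "claude-sonnet-4-20250514",
                            "api_key": "os.environ/ANTHROPIC_API_KEY"}},
        {"model_name": "broken-entry"},  # missing litellm_params
    ]
}
print(validate_config(config))  # reports one problem for the broken entry
```

Running the same check against a YAML file only requires loading it with a YAML parser first and passing the resulting dict in.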
### 2. Configuration File Reference
```yaml
# litellm_config.yaml

# Model list
model_list:
  # Anthropic Claude models
  - model_name: claude-sonnet-4
    litellm_params:
      model: claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
      api_base: https://api.anthropic.com
      max_tokens: 4096
      temperature: 0.7
  - model_name: claude-opus-4
    litellm_params:
      model: claude-opus-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
      max_tokens: 4096
  - model_name: claude-haiku-4
    litellm_params:
      model: claude-haiku-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
      max_tokens: 4096

  # Amazon Bedrock models
  - model_name: bedrock-claude-sonnet
    litellm_params:
      model: anthropic.claude-sonnet-4-5-20250929-v1:0
      api_base: https://bedrock-runtime.us-east-1.amazonaws.com
      api_key: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

  # Google Vertex AI models
  - model_name: vertex-claude-sonnet
    litellm_params:
      model: claude-sonnet-4-5@20250929
      api_base: https://us-central1-aiplatform.googleapis.com
      api_key: os.environ/GOOGLE_APPLICATION_CREDENTIALS
      vertex_project: os.environ/VERTEX_PROJECT_ID
      vertex_location: us-central1

# LiteLLM settings
litellm_settings:
  drop_params: true      # drop parameters the target provider does not support
  set_verbose: true      # verbose logging
  json_logs: true        # JSON-formatted logs
  success_callback: http://localhost:5000/callback   # success callback
  failure_callback: http://localhost:5000/failure    # failure callback

# General settings
general_settings:
  master_key: sk-litellm-master-key-123456                         # master key
  database_url: postgresql://user:password@localhost:5432/litellm  # database URL
  cache: redis://localhost:6379                                    # Redis cache
  cache_seconds: 3600                                              # cache TTL (seconds)

# Security settings
security_settings:
  valid_api_keys:              # accepted API keys
    - sk-team-a-key-123
    - sk-team-b-key-456
    - sk-team-c-key-789
  max_budget: 1000.0           # maximum budget (USD)
  budget_duration: monthly     # budget period
  rpm_limit: 100               # requests per minute
  tpm_limit: 10000             # tokens per minute

# Load-balancing settings
load_balancing_settings:
  routing_strategy: usage-based  # usage-based, round-robin, or least-latency
  health_check: true             # enable health checks
  health_check_interval: 60      # health-check interval (seconds)

# Monitoring settings
monitoring_settings:
  enable_prometheus: true        # enable Prometheus
  prometheus_port: 9090          # Prometheus port
  enable_slack_alerts: true      # enable Slack alerts
  slack_webhook_url: https://hooks.slack.com/services/xxx/yyy/zzz
  alert_thresholds:
    error_rate: 0.05             # error-rate threshold
    latency_p99: 5000            # P99 latency threshold (ms)
```

## 33.2.3 Advanced Configuration
### 1. Cache Configuration
```yaml
# Cache settings
cache_settings:
  type: redis                      # cache backend: redis, memory, or none
  redis_url: redis://localhost:6379/0
  cache_ttl: 3600                  # time to live (seconds)
  cache_key_prefix: litellm        # cache-key prefix
  enable_cache_for_stream: false   # cache streamed responses?
  cache_control_headers: true      # honor cache-control headers
```
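The effect of `cache_ttl` can be illustrated with a small in-memory sketch (not LiteLLM's implementation — Redis plays this role in production). The cache key is a hash of the model plus the messages, so an identical request within the TTL is served without calling the provider:

```python
import hashlib
import json
import time

class TTLCache:
    """In-memory response cache keyed by (model, messages); a sketch of the
    behavior configured by cache_ttl above."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable clock, handy for testing
        self.store = {}      # key -> (expires_at, response)

    def _key(self, model: str, messages) -> str:
        raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, model, messages):
        hit = self.store.get(self._key(model, messages))
        if hit is None:
            return None
        expires_at, response = hit
        if self.clock() >= expires_at:        # entry expired: evict and miss
            del self.store[self._key(model, messages)]
            return None
        return response

    def put(self, model, messages, response):
        self.store[self._key(model, messages)] = (self.clock() + self.ttl, response)

cache = TTLCache(ttl_seconds=3600)
messages = [{"role": "user", "content": "Hello"}]
cache.put("claude-sonnet-4", messages, {"text": "Hi!"})
print(cache.get("claude-sonnet-4", messages))  # cache hit within the TTL
```

Note that `enable_cache_for_stream: false` above exists precisely because streamed responses are awkward to key and replay this way.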
### 2. Rate-Limit Configuration
```yaml
# Rate-limit settings
rate_limit_settings:
  enabled: true
  strategy: sliding_window   # sliding_window, token_bucket, or fixed_window
  limits:
    - api_key: sk-team-a-key-123
      rpm: 100      # requests per minute
      tpm: 10000    # tokens per minute
      rpd: 10000    # requests per day
    - api_key: sk-team-b-key-456
      rpm: 50
      tpm: 5000
      rpd: 5000
  default_limits:
    rpm: 10
    tpm: 1000
    rpd: 100
  burst_size: 20    # burst allowance
```

### 3. Budget Control Configuration
```yaml
# Budget settings
budget_settings:
  enabled: true
  currency: USD
  budgets:
    - name: team-a-budget
      api_keys:
        - sk-team-a-key-123
      limit: 1000.0
      period: monthly
      alert_threshold: 0.8   # alert at 80% of the budget
      hard_limit: true       # block requests once the limit is reached
    - name: team-b-budget
      api_keys:
        - sk-team-b-key-456
      limit: 500.0
      period: monthly
      alert_threshold: 0.9
      hard_limit: false
  cost_tracking:
    enabled: true
    update_interval: 60      # update interval (seconds)
    storage: database        # storage backend: database or file
```
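The interaction between `alert_threshold` and `hard_limit` above can be sketched as follows (a hypothetical class, not LiteLLM's internals): spend accumulates per budget, an alert fires once usage crosses the threshold, and requests are rejected only when `hard_limit` is true:

```python
class Budget:
    """Sketch of the budget semantics configured above."""

    def __init__(self, limit: float, alert_threshold: float, hard_limit: bool):
        self.limit = limit
        self.alert_threshold = alert_threshold
        self.hard_limit = hard_limit
        self.spent = 0.0
        self.alerted = False

    def charge(self, cost: float) -> bool:
        """Record a request's cost; return False if the request should be blocked."""
        if self.hard_limit and self.spent + cost > self.limit:
            return False                     # hard limit reached: block
        self.spent += cost
        if not self.alerted and self.spent >= self.limit * self.alert_threshold:
            self.alerted = True              # would trigger a Slack/email alert
        return True

team_a = Budget(limit=1000.0, alert_threshold=0.8, hard_limit=True)
team_a.charge(850.0)
print(team_a.alerted)   # → True (850 >= 80% of 1000)
print(team_a.charge(200.0))  # → False (would exceed the hard limit)
```

With `hard_limit: false` (team-b above), `charge` keeps returning true past the limit; the budget is then purely an alerting mechanism.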
### 4. Monitoring and Alerting Configuration
```yaml
# Monitoring settings
monitoring_settings:
  prometheus:
    enabled: true
    port: 9090
    metrics:
      - request_count
      - request_duration
      - error_count
      - cache_hit_rate
      - token_usage
      - cost
  grafana:
    enabled: true
    dashboard_url: http://localhost:3000/d/litellm
  alerts:
    slack:
      enabled: true
      webhook_url: https://hooks.slack.com/services/xxx/yyy/zzz
      channels:
        - litellm-alerts
        - devops-notifications
      alert_rules:
        - name: high_error_rate
          condition: error_rate > 0.05
          duration: 5m
          severity: warning
        - name: high_latency
          condition: p99_latency > 5000
          duration: 2m
          severity: critical
        - name: budget_exceeded
          condition: budget_usage > 1.0
          severity: critical
    email:
      enabled: true
      smtp_server: smtp.gmail.com
      smtp_port: 587
      smtp_username: alerts@company.com
      smtp_password: ${SMTP_PASSWORD}
      from_address: litellm-alerts@company.com
      to_addresses:
        - devops@company.com
        - finance@company.com
```

## 33.2.4 Integrating Claude Code
### 1. Configuring Claude Code to Use LiteLLM
```bash
# Method 1: unified endpoint (recommended)
export ANTHROPIC_BASE_URL=https://litellm-server:4000
export ANTHROPIC_AUTH_TOKEN=sk-litellm-static-key

# Method 2: Anthropic-format endpoint
export ANTHROPIC_BASE_URL=https://litellm-server:4000/anthropic
export ANTHROPIC_AUTH_TOKEN=sk-litellm-static-key

# Method 3: API key helper
# Create the helper script
cat > ~/bin/get-litellm-key.sh << 'EOF'
#!/bin/bash
# Fetch the key from Vault
vault kv get -field=api_key secret/litellm/claude-code
EOF
chmod +x ~/bin/get-litellm-key.sh

# Configure Claude Code to use the helper
cat > ~/.claude-code/settings.json << EOF
{
  "apiKeyHelper": "~/bin/get-litellm-key.sh",
  "env": {
    "ANTHROPIC_BASE_URL": "https://litellm-server:4000"
  }
}
EOF
```
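The `apiKeyHelper` mechanism in method 3 is simple to reason about: the helper is just an executable whose stdout becomes the auth token, so rotating the key in Vault never requires touching client config. A sketch of that resolution step (assumes a POSIX shell; the Vault-backed script is replaced by a stub):

```python
import os
import stat
import subprocess
import tempfile

def resolve_api_key(helper_path: str) -> str:
    """Run an apiKeyHelper-style executable; its stdout becomes the token."""
    path = os.path.expanduser(helper_path)
    proc = subprocess.run([path], capture_output=True, text=True, check=True)
    return proc.stdout.strip()

# Stub helper standing in for the Vault-backed script above
fd, helper = tempfile.mkstemp(suffix=".sh")
with os.fdopen(fd, "w") as f:
    f.write("#!/bin/sh\necho sk-litellm-demo-key\n")
os.chmod(helper, os.stat(helper).st_mode | stat.S_IXUSR)

print(resolve_api_key(helper))  # → sk-litellm-demo-key
```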
### 2. Validating the Configuration
```python
from dataclasses import dataclass, field
from typing import Dict, Optional

import requests


@dataclass
class ValidationResult:
    """Outcome of a single validation check."""
    success: bool = False
    message: str = ""
    error: str = ""


@dataclass
class ValidationReport:
    """Aggregated results for all checks."""
    connection: Optional[ValidationResult] = None
    models: Dict[str, ValidationResult] = field(default_factory=dict)
    summary: str = ""


class LiteLLMValidator:
    """Validates a LiteLLM deployment."""

    def __init__(self, gateway_url: str, auth_token: str):
        self.gateway_url = gateway_url
        self.auth_token = auth_token

    def validate_connection(self) -> ValidationResult:
        """Check the health endpoint."""
        result = ValidationResult()
        try:
            response = requests.get(
                f"{self.gateway_url}/health",
                headers={'Authorization': f'Bearer {self.auth_token}'},
                timeout=10
            )
            if response.status_code == 200:
                result.success = True
                result.message = "Connection successful"
            else:
                result.success = False
                result.message = f"Health check failed: {response.status_code}"
        except requests.exceptions.Timeout:
            result.success = False
            result.message = "Connection timeout"
        except requests.exceptions.ConnectionError:
            result.success = False
            result.message = "Connection error"
        except Exception as e:
            result.success = False
            result.message = f"Unexpected error: {str(e)}"
        return result

    def validate_model_access(self, model: str) -> ValidationResult:
        """Check that a model can be called through the gateway."""
        result = ValidationResult()
        try:
            response = requests.post(
                f"{self.gateway_url}/v1/completions",
                headers={
                    'Authorization': f'Bearer {self.auth_token}',
                    'Content-Type': 'application/json'
                },
                json={
                    'model': model,
                    'prompt': 'Hello',
                    'max_tokens': 10
                },
                timeout=30
            )
            if response.status_code == 200:
                result.success = True
                result.message = f"Model {model} accessible"
            else:
                result.success = False
                result.message = f"Model access failed: {response.status_code}"
                result.error = response.text
        except Exception as e:
            result.success = False
            result.message = f"Model access error: {str(e)}"
        return result

    def validate_all(self) -> ValidationReport:
        """Run every check and build a report."""
        report = ValidationReport()
        report.connection = self.validate_connection()
        for model in ['claude-sonnet-4', 'claude-opus-4', 'claude-haiku-4']:
            report.models[model] = self.validate_model_access(model)
        report.summary = self._generate_summary(report)
        return report

    def _generate_summary(self, report: ValidationReport) -> str:
        """Render the report as readable text."""
        summary = "LiteLLM Validation Summary:\n\n"
        summary += f"Connection: {'✓' if report.connection.success else '✗'} "
        summary += f"{report.connection.message}\n\n"
        summary += "Model Access:\n"
        for model, result in report.models.items():
            status = '✓' if result.success else '✗'
            summary += f"  {status} {model}: {result.message}\n"
        return summary
```

## 33.2.5 Monitoring and Maintenance
### 1. Prometheus Monitoring
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'litellm'
    static_configs:
      - targets: ['litellm-server:9090']
    metrics_path: '/metrics'
```
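The scraped counters can also be consumed directly without Prometheus. A sketch that parses the Prometheus text exposition format and derives an error rate (the metric names follow the examples in this section and are assumptions, not LiteLLM's guaranteed names):

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text format into {metric_name: summed value}, ignoring labels."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                          # skip blanks and HELP/TYPE comments
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]     # strip the {label="..."} block
        metrics[name] = metrics.get(name, 0.0) + float(value)
    return metrics

sample = """\
# HELP litellm_request_count Total requests
# TYPE litellm_request_count counter
litellm_request_count{model="claude-sonnet-4"} 180
litellm_request_count{model="claude-haiku-4"} 20
litellm_error_count 4
"""
m = parse_metrics(sample)
error_rate = m["litellm_error_count"] / m["litellm_request_count"]
print(error_rate)  # → 0.02
```

This is the same ratio the `high_error_rate` alert rule above evaluates; a 0.02 error rate sits below its 0.05 threshold.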
### 2. Grafana Dashboard
```json
{
"dashboard": {
"title": "LiteLLM Dashboard",
"panels": [
{
"title": "Request Rate",
"targets": [
{
"expr": "rate(litellm_request_count[1m])"
}
]
},
{
"title": "Error Rate",
"targets": [
{
"expr": "rate(litellm_error_count[1m]) / rate(litellm_request_count[1m])"
}
]
},
{
"title": "P99 Latency",
"targets": [
{
"expr": "histogram_quantile(0.99, rate(litellm_request_duration_bucket[1m]))"
}
]
},
{
"title": "Cache Hit Rate",
"targets": [
{
"expr": "rate(litellm_cache_hits[1m]) / rate(litellm_cache_requests[1m])"
}
]
},
{
"title": "Token Usage",
"targets": [
{
"expr": "rate(litellm_token_usage[1m])"
}
]
},
{
"title": "Cost",
"targets": [
{
"expr": "litellm_cost_total"
}
]
}
]
}
}
```

### 3. Log Management
```python
import logging
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

import numpy as np

logger = logging.getLogger(__name__)


@dataclass
class LogAnalysis:
    """Aggregate statistics extracted from the logs."""
    total_requests: int = 0
    successful_requests: int = 0
    failed_requests: int = 0
    error_rate: float = 0.0
    avg_latency: float = 0.0
    p50_latency: float = 0.0
    p95_latency: float = 0.0
    p99_latency: float = 0.0
    total_tokens: int = 0
    total_cost: float = 0.0


class LiteLLMLogManager:
    """Analyzes LiteLLM proxy logs."""

    def __init__(self, log_file: str):
        self.log_file = log_file
        self.log_parser = LiteLLMLogParser()

    def analyze_logs(self,
                     start_time: Optional[datetime] = None,
                     end_time: Optional[datetime] = None) -> LogAnalysis:
        """Analyze the log file, optionally restricted to a time window."""
        analysis = LogAnalysis()

        # Read the log file
        with open(self.log_file, 'r') as f:
            logs = f.readlines()

        # Parse each line, skipping malformed entries
        parsed_logs = []
        for log in logs:
            try:
                parsed_logs.append(self.log_parser.parse(log))
            except Exception as e:
                logger.warning(f"Failed to parse log: {e}")

        # Filter by time window
        if start_time or end_time:
            parsed_logs = [
                log for log in parsed_logs
                if (not start_time or log.timestamp >= start_time) and
                   (not end_time or log.timestamp <= end_time)
            ]

        # Request counts and error rate
        analysis.total_requests = len(parsed_logs)
        analysis.successful_requests = sum(
            1 for log in parsed_logs if log.status == 'success'
        )
        analysis.failed_requests = sum(
            1 for log in parsed_logs if log.status == 'error'
        )
        analysis.error_rate = (
            analysis.failed_requests / analysis.total_requests
            if analysis.total_requests > 0 else 0
        )

        # Latency percentiles
        latencies = [log.duration for log in parsed_logs if log.duration]
        if latencies:
            analysis.avg_latency = sum(latencies) / len(latencies)
            analysis.p50_latency = np.percentile(latencies, 50)
            analysis.p95_latency = np.percentile(latencies, 95)
            analysis.p99_latency = np.percentile(latencies, 99)

        # Token usage
        analysis.total_tokens = sum(
            log.input_tokens + log.output_tokens
            for log in parsed_logs
        )

        # Cost
        analysis.total_cost = sum(log.cost for log in parsed_logs)
        return analysis

    def generate_report(self, analysis: LogAnalysis) -> str:
        """Render the analysis as a text report."""
        report = "LiteLLM Log Analysis Report\n"
        report += "=" * 50 + "\n\n"
        report += "Request Summary:\n"
        report += f"  Total: {analysis.total_requests}\n"
        report += f"  Successful: {analysis.successful_requests}\n"
        report += f"  Failed: {analysis.failed_requests}\n"
        report += f"  Error Rate: {analysis.error_rate:.2%}\n\n"
        report += "Latency (ms):\n"
        report += f"  Average: {analysis.avg_latency:.0f}\n"
        report += f"  P50: {analysis.p50_latency:.0f}\n"
        report += f"  P95: {analysis.p95_latency:.0f}\n"
        report += f"  P99: {analysis.p99_latency:.0f}\n\n"
        report += "Token Usage:\n"
        report += f"  Total: {analysis.total_tokens:,}\n\n"
        report += "Cost:\n"
        report += f"  Total: ${analysis.total_cost:.2f}\n"
        return report
```