First version
.gitignore (vendored, new file, 27 lines)
@@ -0,0 +1,27 @@
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
*.db
*.sqlite
*.sqlite3
exports/
*.log
.mitmproxy/
README.md (new file, 220 lines)
@@ -0,0 +1,220 @@
# LLM Proxy - OpenAI API Proxy and Training Data Collection Tool

A transparent HTTP proxy server that intercepts and stores LLM API requests and exports them as JSONL-formatted training data.

## Features

- ✅ **Transparent proxying**: intercepts every LLM API request whose path starts with `/v1/`
- ✅ **Zero configuration**: no API key is configured in the proxy; the client's own key is used directly
- ✅ **Multi-provider support**: works with OpenAI, Anthropic, GLM, OpenRouter, and any other OpenAI-compatible API
- ✅ **Smart parsing**: automatically recognizes and parses LLM requests, ignoring everything else
- ✅ **Reasoning capture**: automatically stores the model's reasoning content
- ✅ **Multi-turn conversations**: preserves the full conversation context
- ✅ **JSONL export**: one command exports the data in the standard training format
- ✅ **SQLite storage**: a lightweight database with no extra setup

## Installation

### 1. Clone the project

```bash
git clone https://github.com/mitmproxy/mitmproxy.git
cd mitmproxy
```

### 2. Install dependencies

```bash
pip install -r requirements.txt
```

## Usage

### Start the proxy server

```bash
python start_proxy.py
```

Listens on `127.0.0.1:8080` by default.

### Configure the system proxy

#### Windows

1. Open "Settings" → "Network & Internet" → "Proxy"
2. Turn on "Use a proxy server"
3. Address: `127.0.0.1`
4. Port: `8080`

#### macOS

```bash
networksetup -setwebproxy Wi-Fi 127.0.0.1 8080
networksetup -setsecurewebproxy Wi-Fi 127.0.0.1 8080
```

#### Linux

Configure `127.0.0.1:8080` as the HTTP/HTTPS proxy in your browser or system settings.

### Using a client

#### Trae

1. Start the proxy server
2. Configure the system proxy (see above)
3. Use Trae as usual, with any API provider and key
4. All requests are intercepted and saved automatically

#### CherryStudio

**Method 1: configure a custom provider**
1. Open CherryStudio Settings → Model Services
2. Add a custom provider
3. API address: `http://127.0.0.1:8080/v1`
4. API key: any value (the proxy ignores it)
5. Add the models you use

**Method 2: use the system proxy**
1. Start the proxy server
2. Configure the system proxy (see above)
3. Use CherryStudio as usual
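
The same capture path works for any scripted client. Here is a minimal sketch (not part of this repository) that routes the official `openai` Python SDK through the proxy via environment proxy variables, mirroring the system-proxy method above; the model name and API key are placeholders:

```python
# Minimal sketch: route an OpenAI-compatible client through the proxy.
# Assumes the `openai` package is installed; for HTTPS endpoints the
# mitmproxy certificate must be trusted (see "HTTPS certificates" below).
import os

os.environ["HTTP_PROXY"] = "http://127.0.0.1:8080"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:8080"

from openai import OpenAI

client = OpenAI(api_key="sk-...")  # your provider's real key; the proxy does not store it
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```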

### Exporting training data

```bash
# Export all conversations (including reasoning)
python export.py

# Export to a specific file
python export.py --output my_data.jsonl

# Exclude reasoning content
python export.py --no-reasoning

# Include metadata
python export.py --with-metadata

# Show database statistics
python export.py --stats
```
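
The same exports are available from Python; `export_training_data` is defined in `export.py` with exactly this signature, and the output path below is a placeholder:

```python
# Programmatic export, equivalent to `python export.py --output ...`.
from export import export_training_data

count = export_training_data(
    "exports/my_data.jsonl",   # placeholder output path
    db_path="llm_data.db",     # same default as config.json
    include_reasoning=True,    # False mirrors --no-reasoning
)
print(f"Exported {count} conversations")
```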

## Configuration file

Edit `config.json` to customize the setup; the example below matches the `config.json` shipped with the project:

```json
{
    "proxy": {
        "listen_port": 8080,
        "listen_host": "127.0.0.1"
    },
    "database": {
        "path": "llm_data.db"
    },
    "filter": {
        "enabled": true,
        "path_patterns": ["/v1/", "/chat/completions", "/completions"],
        "host_patterns": ["deepseek.com", "openrouter.ai", "api.openai.com"],
        "save_all_requests": false
    },
    "export": {
        "output_dir": "exports",
        "include_reasoning": true,
        "include_metadata": false
    }
}
```
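
The file is read by `load_config()` in `proxy_addon.py`; if it is missing, built-in defaults equivalent to the example above are used, so the file is optional. A quick check of the effective configuration:

```python
# Print the settings the proxy will actually run with.
from proxy_addon import load_config

cfg = load_config("config.json")
print(cfg["proxy"]["listen_host"], cfg["proxy"]["listen_port"])
print(cfg["filter"]["path_patterns"])
```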

## JSONL format

Each line of the exported JSONL file holds one conversation:

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}, {"role": "assistant", "content": "Hi there!", "reasoning": "The user greeted me, so I should respond politely."}]}
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "2+2 equals 4.", "reasoning": "This is a simple arithmetic problem. 2+2 = 4."}]}
```
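
A short sketch for consuming the export, e.g. to strip the non-standard `reasoning` field before feeding the data to a trainer that expects plain chat messages (the filename is a placeholder; use the path printed by `export.py`):

```python
import json

with open("exports/training_data.jsonl", encoding="utf-8") as f:  # placeholder path
    for line in f:
        sample = json.loads(line)
        # Drop the extra "reasoning" key, keeping role/content only
        # (equivalent to exporting with --no-reasoning).
        messages = [
            {k: v for k, v in msg.items() if k != "reasoning"}
            for msg in sample["messages"]
        ]
        print(messages)
```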

## Database schema

### conversations table
- `id`: primary key
- `conversation_id`: conversation ID
- `created_at`: creation timestamp
- `updated_at`: last-update timestamp

### requests table
- `id`: primary key
- `request_id`: request ID
- `conversation_id`: conversation ID (foreign key)
- `model`: model name
- `messages`: message list (JSON)
- `request_body`: full request body (JSON)
- `created_at`: creation timestamp

### responses table
- `id`: primary key
- `request_id`: request ID (foreign key)
- `response_body`: full response body (JSON)
- `reasoning_content`: reasoning content
- `tokens_used`: number of tokens used
- `created_at`: creation timestamp
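
Because storage is plain SQLite, the tables above can be inspected directly; a small sketch based on the schema:

```python
import sqlite3

conn = sqlite3.connect("llm_data.db")
conn.row_factory = sqlite3.Row

# Last ten requests with their token usage, joined on request_id.
rows = conn.execute("""
    SELECT r.model, r.created_at, resp.tokens_used
    FROM requests r
    LEFT JOIN responses resp ON r.request_id = resp.request_id
    ORDER BY r.created_at DESC
    LIMIT 10
""").fetchall()
for row in rows:
    print(row["model"], row["created_at"], row["tokens_used"])
conn.close()
```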

## How it works

1. **Intercept**: the proxy intercepts every HTTP request whose path starts with `/v1/`
2. **Parse**: it parses the request body to decide whether it is an LLM API request
3. **Save the request**: request details are written to the SQLite database
4. **Forward transparently**: the original Authorization header is preserved and the request is forwarded to the target server
5. **Save the response**: once the response arrives, its content and reasoning are stored
6. **Export**: the data can be exported to JSONL for training at any time
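
Grouping requests into conversations (step 3) is deterministic: `extract_conversation_id()` in `proxy_addon.py` hashes the system prompt plus the first user message into a UUIDv5, so follow-up turns that share this prefix land in the same conversation:

```python
import uuid

# Same key construction as proxy_addon.extract_conversation_id();
# the message texts here are just examples.
system_content = "You are a helpful assistant."
first_user_content = "Hello!"
key = (system_content or "") + "\n---\n" + first_user_content
print(uuid.uuid5(uuid.NAMESPACE_URL, key))  # stable across repeated requests
```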

## Notes

### HTTPS certificates

If the client reaches the API over HTTPS (e.g. `https://api.openai.com`), you must either:

1. Install the mitmproxy certificate into the system trust store, or
2. Use HTTP in the client configuration (e.g. `http://api.openai.com`)

### Installing the certificate

mitmproxy generates its certificate the first time the proxy runs:

- Windows: `%USERPROFILE%\.mitmproxy\mitmproxy-ca-cert.pem`
- macOS/Linux: `~/.mitmproxy/mitmproxy-ca-cert.pem`

Install that certificate into the system trust store and HTTPS interception will work.

### Privacy and security

- The proxy does not store API keys
- All data stays in a local SQLite database
- Handle exported training data with care

## Troubleshooting

### Requests are not intercepted

1. Check that the system proxy is configured correctly
2. Check that the proxy server is running
3. Check that the request path contains `/v1/`

### HTTPS requests fail

1. Install the mitmproxy certificate into the system trust store
2. Or use HTTP instead of HTTPS in the client configuration

### Database errors

1. Check the database file permissions
2. Delete `llm_data.db` to reinitialize it

## License

MIT License

## Contributing

Issues and pull requests are welcome!
config.json (new file, 20 lines)
@@ -0,0 +1,20 @@
{
    "proxy": {
        "listen_port": 8080,
        "listen_host": "127.0.0.1"
    },
    "database": {
        "path": "llm_data.db"
    },
    "filter": {
        "enabled": true,
        "path_patterns": ["/v1/", "/chat/completions", "/completions"],
        "host_patterns": ["deepseek.com", "openrouter.ai", "api.openai.com"],
        "save_all_requests": false
    },
    "export": {
        "output_dir": "exports",
        "include_reasoning": true,
        "include_metadata": false
    }
}
database.py (new file, 243 lines)
@@ -0,0 +1,243 @@
import sqlite3
import json
import uuid
from typing import Optional, Dict, Any, List


class LLMDatabase:
    def __init__(self, db_path: str = "llm_data.db"):
        self.db_path = db_path
        self.init_database()

    def get_connection(self):
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row
        return conn

    def init_database(self):
        conn = self.get_connection()
        cursor = conn.cursor()

        cursor.execute("""
            CREATE TABLE IF NOT EXISTS conversations (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                conversation_id TEXT UNIQUE NOT NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        """)

        cursor.execute("""
            CREATE TABLE IF NOT EXISTS requests (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                request_id TEXT UNIQUE NOT NULL,
                conversation_id TEXT,
                model TEXT,
                messages TEXT,
                request_body TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                FOREIGN KEY (conversation_id) REFERENCES conversations(conversation_id)
            )
        """)

        cursor.execute("""
            CREATE TABLE IF NOT EXISTS responses (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                request_id TEXT NOT NULL,
                response_body TEXT,
                reasoning_content TEXT,
                tokens_used INTEGER,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                FOREIGN KEY (request_id) REFERENCES requests(request_id)
            )
        """)

        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_conversation_id ON requests(conversation_id)
        """)

        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_request_id ON responses(request_id)
        """)

        conn.commit()
        conn.close()

    def get_or_create_conversation(self, conversation_id: Optional[str] = None) -> str:
        if conversation_id is None:
            conversation_id = str(uuid.uuid4())

        conn = self.get_connection()
        cursor = conn.cursor()

        cursor.execute("""
            INSERT OR IGNORE INTO conversations (conversation_id)
            VALUES (?)
        """, (conversation_id,))

        cursor.execute("""
            UPDATE conversations SET updated_at = CURRENT_TIMESTAMP
            WHERE conversation_id = ?
        """, (conversation_id,))

        conn.commit()
        conn.close()

        return conversation_id

    def save_request(self, request_id: str, model: str, messages: List[Dict[str, Any]],
                     request_body: Dict[str, Any], conversation_id: Optional[str] = None) -> None:
        conversation_id = self.get_or_create_conversation(conversation_id)

        conn = self.get_connection()
        cursor = conn.cursor()

        cursor.execute("""
            INSERT OR REPLACE INTO requests
            (request_id, conversation_id, model, messages, request_body)
            VALUES (?, ?, ?, ?, ?)
        """, (
            request_id,
            conversation_id,
            model,
            json.dumps(messages, ensure_ascii=False),
            json.dumps(request_body, ensure_ascii=False)
        ))

        conn.commit()
        conn.close()

    def save_response(self, request_id: str, response_body: Dict[str, Any],
                      reasoning_content: Optional[str] = None, tokens_used: Optional[int] = None) -> None:
        conn = self.get_connection()
        cursor = conn.cursor()

        cursor.execute("""
            INSERT OR REPLACE INTO responses
            (request_id, response_body, reasoning_content, tokens_used)
            VALUES (?, ?, ?, ?)
        """, (
            request_id,
            json.dumps(response_body, ensure_ascii=False),
            reasoning_content,
            tokens_used
        ))

        conn.commit()
        conn.close()

    def get_conversation_messages(self, conversation_id: str) -> List[Dict[str, Any]]:
        conn = self.get_connection()
        cursor = conn.cursor()

        cursor.execute("""
            SELECT r.messages, resp.response_body, resp.reasoning_content
            FROM requests r
            LEFT JOIN responses resp ON r.request_id = resp.request_id
            WHERE r.conversation_id = ?
            ORDER BY r.created_at
        """, (conversation_id,))

        rows = cursor.fetchall()
        conn.close()

        messages = []
        for row in rows:
            request_messages = json.loads(row['messages'])
            response_body = json.loads(row['response_body']) if row['response_body'] else None
            reasoning_content = row['reasoning_content']

            if not messages:
                for msg in request_messages:
                    messages.append(msg)
            else:
                # Later requests in a conversation repeat the earlier context;
                # skip the shared prefix and append only the new messages.
                max_prefix = min(len(messages), len(request_messages))
                prefix_len = 0
                while prefix_len < max_prefix and messages[prefix_len] == request_messages[prefix_len]:
                    prefix_len += 1
                for msg in request_messages[prefix_len:]:
                    messages.append(msg)

            if response_body and 'choices' in response_body:
                for choice in response_body['choices']:
                    assistant_msg = {
                        'role': 'assistant',
                        'content': choice.get('message', {}).get('content', '')
                    }
                    if reasoning_content:
                        assistant_msg['reasoning'] = reasoning_content
                    messages.append(assistant_msg)

        return messages

    def get_all_conversations(self) -> List[Dict[str, Any]]:
        conn = self.get_connection()
        cursor = conn.cursor()

        cursor.execute("""
            SELECT conversation_id, created_at, updated_at
            FROM conversations
            ORDER BY updated_at DESC
        """)

        rows = cursor.fetchall()
        conn.close()

        return [
            {
                'conversation_id': row['conversation_id'],
                'created_at': row['created_at'],
                'updated_at': row['updated_at']
            }
            for row in rows
        ]

    def export_to_jsonl(self, output_path: str, include_reasoning: bool = True) -> int:
        conversations = self.get_all_conversations()
        count = 0

        with open(output_path, 'w', encoding='utf-8') as f:
            for conv in conversations:
                messages = self.get_conversation_messages(conv['conversation_id'])

                if not messages:
                    continue

                if not include_reasoning:
                    messages = [
                        {k: v for k, v in msg.items() if k != 'reasoning'}
                        for msg in messages
                    ]

                jsonl_line = json.dumps({'messages': messages}, ensure_ascii=False)
                f.write(jsonl_line + '\n')
                count += 1

        return count

    def get_stats(self) -> Dict[str, Any]:
        conn = self.get_connection()
        cursor = conn.cursor()

        cursor.execute("SELECT COUNT(*) as count FROM conversations")
        conversation_count = cursor.fetchone()['count']

        cursor.execute("SELECT COUNT(*) as count FROM requests")
        request_count = cursor.fetchone()['count']

        cursor.execute("SELECT COUNT(*) as count FROM responses")
        response_count = cursor.fetchone()['count']

        cursor.execute("SELECT SUM(tokens_used) as total FROM responses")
        total_tokens = cursor.fetchone()['total'] or 0

        conn.close()

        return {
            'conversations': conversation_count,
            'requests': request_count,
            'responses': response_count,
            'total_tokens': total_tokens
        }
export.bat (new file, 5 lines)
@@ -0,0 +1,5 @@
@echo off
echo Exporting LLM training data...
echo.
python export.py %*
pause
export.py (new file, 93 lines)
@@ -0,0 +1,93 @@
import json
import argparse
from pathlib import Path
from datetime import datetime
from database import LLMDatabase
from proxy_addon import load_config


def export_training_data(output_path: str, db_path: str = "llm_data.db",
                         include_reasoning: bool = True) -> int:
    db = LLMDatabase(db_path)
    count = db.export_to_jsonl(output_path, include_reasoning)
    return count


def export_with_metadata(output_path: str, db_path: str = "llm_data.db") -> int:
    db = LLMDatabase(db_path)
    conversations = db.get_all_conversations()
    count = 0

    with open(output_path, 'w', encoding='utf-8') as f:
        for conv in conversations:
            messages = db.get_conversation_messages(conv['conversation_id'])

            if not messages:
                continue

            data = {
                'messages': messages,
                'metadata': {
                    'conversation_id': conv['conversation_id'],
                    'created_at': conv['created_at'],
                    'updated_at': conv['updated_at']
                }
            }

            jsonl_line = json.dumps(data, ensure_ascii=False)
            f.write(jsonl_line + '\n')
            count += 1

    return count


def main():
    config = load_config()
    export_config = config.get('export', {})
    db_config = config.get('database', {})
    # Default output directory comes from config's export.output_dir.
    output_dir = export_config.get('output_dir', 'exports')

    parser = argparse.ArgumentParser(description='Export LLM training data to JSONL format')
    parser.add_argument('--output', '-o', type=str,
                        default=f"{output_dir}/training_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.jsonl",
                        help='Output file path')
    parser.add_argument('--db', type=str, default=db_config.get('path', 'llm_data.db'),
                        help='Database file path')
    parser.add_argument('--no-reasoning', action='store_true',
                        help='Exclude reasoning content from export')
    parser.add_argument('--with-metadata', action='store_true',
                        help='Include metadata in export')
    parser.add_argument('--stats', action='store_true',
                        help='Show database statistics')

    args = parser.parse_args()

    if args.stats:
        db = LLMDatabase(args.db)
        stats = db.get_stats()
        print("\nDatabase Statistics:")
        print(f"  Conversations: {stats['conversations']}")
        print(f"  Requests: {stats['requests']}")
        print(f"  Responses: {stats['responses']}")
        print(f"  Total Tokens: {stats['total_tokens']}")
        return

    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    include_reasoning = not args.no_reasoning

    if args.with_metadata:
        count = export_with_metadata(str(output_path), args.db)
        print(f"\nExported {count} conversations with metadata to: {output_path}")
    else:
        count = export_training_data(str(output_path), args.db, include_reasoning)
        print(f"\nExported {count} conversations to: {output_path}")

    if include_reasoning:
        print("  (Reasoning content included)")
    else:
        print("  (Reasoning content excluded)")


if __name__ == '__main__':
    main()
proxy_addon.py (new file, 288 lines)
@@ -0,0 +1,288 @@
import json
import uuid
import logging
from typing import Optional, Dict, Any
from mitmproxy import http
from database import LLMDatabase

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class LLMProxyAddon:
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.db = LLMDatabase(config['database']['path'])
        self.path_patterns = config['filter'].get('path_patterns', ['/v1/'])
        self.host_patterns = config['filter'].get('host_patterns', [])
        self.save_all = config['filter'].get('save_all_requests', False)
        logger.info("LLMProxyAddon initialized")

    def is_llm_request(self, flow: http.HTTPFlow) -> bool:
        path = flow.request.path
        host = flow.request.host

        # Auth traffic from OpenRouter's login service is never an LLM call.
        if host.startswith("clerk.openrouter.ai"):
            return False

        for pattern in self.path_patterns:
            if pattern in path:
                logger.info(f"LLM path match: host={host}, path={path}")
                return True

        for pattern in self.host_patterns:
            if pattern in host:
                logger.info(f"LLM host match: host={host}, path={path}")
                return True

        return False

    def extract_conversation_id(self, request_body: Dict[str, Any]) -> Optional[str]:
        if 'conversation_id' in request_body:
            return request_body['conversation_id']

        messages = request_body.get('messages', [])
        if messages:
            first_msg = messages[0]
            if 'conversation_id' in first_msg:
                return first_msg['conversation_id']

        if not messages:
            return None

        system_content = None
        first_user_content = None

        for msg in messages:
            role = msg.get('role')
            if role == 'system' and system_content is None:
                system_content = msg.get('content', '')
            if role == 'user' and first_user_content is None:
                first_user_content = msg.get('content', '')
            if system_content is not None and first_user_content is not None:
                break

        if first_user_content is None:
            return None

        # Derive a stable conversation ID from the system prompt plus the
        # first user message, so follow-up turns map to the same conversation.
        key = (system_content or '') + '\n---\n' + first_user_content
        conv_id = uuid.uuid5(uuid.NAMESPACE_URL, key)
        return str(conv_id)

    def extract_reasoning(self, response_body: Dict[str, Any]) -> Optional[str]:
        reasoning = None

        if 'choices' in response_body:
            for choice in response_body['choices']:
                message = choice.get('message', {})
                if 'reasoning_content' in message:
                    reasoning = message['reasoning_content']
                    break
                if 'reasoning' in message:
                    reasoning = message['reasoning']
                    break

        if 'reasoning_content' in response_body:
            reasoning = response_body['reasoning_content']

        if 'reasoning' in response_body:
            reasoning = response_body['reasoning']

        return reasoning

    def extract_tokens_used(self, response_body: Dict[str, Any]) -> Optional[int]:
        usage = response_body.get('usage', {})
        if usage:
            total_tokens = usage.get('total_tokens')
            if total_tokens is not None:
                return total_tokens

            prompt_tokens = usage.get('prompt_tokens', 0)
            completion_tokens = usage.get('completion_tokens', 0)
            return prompt_tokens + completion_tokens

        return None

    def parse_sse_response(self, raw_content: bytes) -> Optional[Dict[str, Any]]:
        # Reassemble a streamed (SSE) completion into a single non-streaming
        # response shape: concatenate content/reasoning deltas and merge
        # incremental tool-call fragments.
        text = raw_content.decode('utf-8', errors='ignore')
        lines = text.splitlines()
        data_lines = []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if line.startswith(':'):
                continue
            if not line.startswith('data:'):
                continue
            payload = line[5:].strip()
            if payload == '[DONE]':
                break
            data_lines.append(payload)
        if not data_lines:
            return None
        content_parts = []
        reasoning_parts = []
        tool_calls_state: Dict[str, Dict[str, Any]] = {}
        for payload in data_lines:
            try:
                obj = json.loads(payload)
            except json.JSONDecodeError:
                continue
            choices = obj.get('choices', [])
            for choice in choices:
                delta = choice.get('delta') or choice.get('message') or {}
                if 'reasoning_content' in delta:
                    reasoning_parts.append(delta.get('reasoning_content') or '')
                if 'content' in delta:
                    content_parts.append(delta.get('content') or '')
                if 'tool_calls' in delta:
                    for idx, tc in enumerate(delta.get('tool_calls') or []):
                        tc_id = tc.get('id') or str(idx)
                        state = tool_calls_state.get(tc_id)
                        if state is None:
                            state = {
                                'id': tc.get('id'),
                                'type': tc.get('type'),
                                'function': {
                                    'name': None,
                                    'arguments': ''
                                }
                            }
                            tool_calls_state[tc_id] = state
                        fn = tc.get('function') or {}
                        if fn.get('name'):
                            state['function']['name'] = fn['name']
                        if fn.get('arguments'):
                            state['function']['arguments'] = state['function']['arguments'] + fn['arguments']
        message: Dict[str, Any] = {}
        if content_parts:
            message['content'] = ''.join(content_parts)
        if reasoning_parts:
            message['reasoning_content'] = ''.join(reasoning_parts)
        if tool_calls_state:
            message['tool_calls'] = list(tool_calls_state.values())
        if not message:
            return None
        return {
            'choices': [
                {
                    'message': message
                }
            ]
        }

    def is_valid_llm_request(self, request_body: Dict[str, Any]) -> bool:
        if 'messages' in request_body:
            return True

        if 'prompt' in request_body:
            return True

        if 'input' in request_body:
            return True

        return False

    def request(self, flow: http.HTTPFlow) -> None:
        if not self.is_llm_request(flow):
            return

        try:
            logger.info(f"Processing potential LLM request: {flow.request.method} {flow.request.host}{flow.request.path}")
            request_body = json.loads(flow.request.content)

            if not self.is_valid_llm_request(request_body):
                return

            request_id = str(uuid.uuid4())
            model = request_body.get('model', 'unknown')
            messages = request_body.get('messages', [])
            conversation_id = self.extract_conversation_id(request_body)

            # Stash the ID on the flow so response() can pair the reply
            # with this request.
            flow.request_id = request_id

            self.db.save_request(
                request_id=request_id,
                model=model,
                messages=messages,
                request_body=request_body,
                conversation_id=conversation_id
            )

            msg = f"\033[94mSaved request: {request_id}, model: {model}, messages: {len(messages)}\033[0m"
            logger.info(msg)

        except json.JSONDecodeError:
            err = f"Failed to parse LLM request body for {flow.request.method} {flow.request.path}"
            logger.error(err)
        except Exception as e:
            err = f"Error processing request: {e}"
            logger.error(err)

    def response(self, flow: http.HTTPFlow) -> None:
        if not hasattr(flow, 'request_id'):
            return

        try:
            raw = flow.response.content
            content_type = flow.response.headers.get('content-type', '')
            response_body: Optional[Dict[str, Any]] = None
            if 'text/event-stream' in content_type or raw.strip().startswith(b'data:'):
                response_body = self.parse_sse_response(raw)
            else:
                response_body = json.loads(raw)
            if not response_body:
                return

            reasoning_content = self.extract_reasoning(response_body)
            tokens_used = self.extract_tokens_used(response_body)

            self.db.save_response(
                request_id=flow.request_id,
                response_body=response_body,
                reasoning_content=reasoning_content,
                tokens_used=tokens_used
            )

            msg = f"\033[94mSaved response for request: {flow.request_id}, tokens: {tokens_used}\033[0m"
            logger.info(msg)

        except json.JSONDecodeError:
            err = f"Failed to parse response body for {flow.request.path}"
            logger.debug(err)
        except Exception as e:
            err = f"Error processing response: {e}"
            logger.error(err)


def load_config(config_path: str = "config.json") -> Dict[str, Any]:
    try:
        with open(config_path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except FileNotFoundError:
        logger.warning(f"Config file not found: {config_path}, using defaults")
        return {
            "proxy": {
                "listen_port": 8080,
                "listen_host": "127.0.0.1"
            },
            "database": {
                "path": "llm_data.db"
            },
            "filter": {
                "enabled": True,
                "path_patterns": ["/v1/", "/chat/completions", "/completions"],
                "host_patterns": ["deepseek.com", "openrouter.ai", "api.openai.com"],
                "save_all_requests": False
            },
            "export": {
                "output_dir": "exports",
                "include_reasoning": True,
                "include_metadata": False
            }
        }


config = load_config()
addons = [LLMProxyAddon(config)]
requirements.txt (new file, 1 line)
@@ -0,0 +1 @@
mitmproxy>=10.0.0
start.bat (new file, 5 lines)
@@ -0,0 +1,5 @@
@echo off
echo Starting LLM Proxy Server...
echo.
python start_proxy.py
pause
start_proxy.py (new file, 111 lines)
@@ -0,0 +1,111 @@
import sys
import argparse
import platform
import ctypes
from typing import Optional

# winreg exists only on Windows; import it conditionally so the proxy
# can still start on macOS/Linux (system-proxy handling is a no-op there).
if platform.system().lower() == "windows":
    import winreg

from mitmproxy.tools.main import mitmdump
from proxy_addon import load_config


class SystemProxyManager:
    def __init__(self, host: str, port: int):
        self.host = host
        self.port = port
        self.original_enable = None
        self.original_server = None

    def _apply_windows_internet_options(self):
        # Tell WinINet that the proxy settings changed and force a refresh.
        option_refresh = 37            # INTERNET_OPTION_REFRESH
        option_settings_changed = 39   # INTERNET_OPTION_SETTINGS_CHANGED
        internet_set_option = ctypes.windll.Wininet.InternetSetOptionW
        internet_set_option(0, option_settings_changed, 0, 0)
        internet_set_option(0, option_refresh, 0, 0)

    def enable(self):
        if platform.system().lower() != "windows":
            return
        key_path = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings"
        key = winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path, 0, winreg.KEY_READ | winreg.KEY_WRITE)
        try:
            self.original_enable, _ = winreg.QueryValueEx(key, "ProxyEnable")
        except FileNotFoundError:
            self.original_enable = 0
        try:
            self.original_server, _ = winreg.QueryValueEx(key, "ProxyServer")
        except FileNotFoundError:
            self.original_server = ""
        winreg.SetValueEx(key, "ProxyEnable", 0, winreg.REG_DWORD, 1)
        winreg.SetValueEx(key, "ProxyServer", 0, winreg.REG_SZ, f"{self.host}:{self.port}")
        winreg.CloseKey(key)
        self._apply_windows_internet_options()

    def disable(self):
        if platform.system().lower() != "windows":
            return
        if self.original_enable is None or self.original_server is None:
            return
        key_path = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings"
        key = winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path, 0, winreg.KEY_READ | winreg.KEY_WRITE)
        winreg.SetValueEx(key, "ProxyEnable", 0, winreg.REG_DWORD, self.original_enable)
        winreg.SetValueEx(key, "ProxyServer", 0, winreg.REG_SZ, self.original_server)
        winreg.CloseKey(key)
        self._apply_windows_internet_options()


def start_proxy(config_path: str = "config.json", port: Optional[int] = None,
                host: Optional[str] = None, enable_system_proxy: bool = True):
    config = load_config(config_path)

    proxy_config = config.get('proxy', {})
    listen_port = port or proxy_config.get('listen_port', 8080)
    listen_host = host or proxy_config.get('listen_host', '127.0.0.1')

    print(f"\n{'='*60}")
    print("LLM Proxy Server")
    print(f"{'='*60}")
    print(f"Listening on: {listen_host}:{listen_port}")
    print(f"Config file: {config_path}")
    print(f"Database: {config.get('database', {}).get('path', 'llm_data.db')}")
    if enable_system_proxy and platform.system().lower() == "windows":
        print("System proxy: enabled for current session")
    print(f"{'='*60}\n")

    manager = None
    if enable_system_proxy:
        manager = SystemProxyManager(listen_host, listen_port)
        manager.enable()

    # Hand control to mitmdump with our addon loaded.
    sys.argv = [
        'mitmdump',
        '-q',
        '-s', 'proxy_addon.py',
        '--listen-host', listen_host,
        '--listen-port', str(listen_port),
        '--set', 'block_global=false',
        '--set', 'flow_detail=0'
    ]

    try:
        mitmdump()
    finally:
        # Always restore the original system proxy settings on exit.
        if manager is not None:
            manager.disable()


def cli_main():
    parser = argparse.ArgumentParser(description='Start LLM Proxy Server')
    parser.add_argument('--config', '-c', type=str, default='config.json',
                        help='Path to config file')
    parser.add_argument('--port', '-p', type=int, default=None,
                        help='Listen port (overrides config)')
    parser.add_argument('--host', '-H', type=str, default=None,
                        help='Listen host (overrides config)')
    parser.add_argument('--no-system-proxy', action='store_true',
                        help='Do not modify system proxy settings')

    args = parser.parse_args()

    start_proxy(args.config, args.port, args.host, not args.no_system_proxy)


if __name__ == '__main__':
    cli_main()