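feat: LLM Paper Library, initial commit
- FastAPI backend: REST API + Bearer Token auth + PDF proxy
- 180 papers of data (data/papers.json): 9 modules, 32 sub-areas
- Frontend: data-driven, radial-gradient glow on cards, in-page PDF reading
- Bottom status bar: arXiv/HF connectivity check
- PDF loading: arXiv first (5 s timeout) → HK local fallback
- Dockerized deployment (Dockerfile + start.sh + nginx.conf)
- arXiv + HF batch downloader (api/downloader.py)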
17
.env.example
Normal file
@@ -0,0 +1,17 @@
# LLM Paper Library: environment configuration
# Copy this file to .env.local and fill in the real values

# API admin key (authenticates POST/PUT/DELETE)
API_KEY=change-me-to-a-strong-random-string

# Service port
PORT=8000

# PDF storage path
PAPER_DIR=papers

# Log level
LOG_LEVEL=info

# Allowed CORS origins (comma-separated)
CORS_ORIGINS=*
9
.gitignore
vendored
Normal file
@@ -0,0 +1,9 @@
__pycache__/
*.pyc
papers/arxiv/*.pdf
papers/hf/*.pdf
.env
.env.local
*.egg-info/
.venv/
papers/download.log
16
Dockerfile
Normal file
@@ -0,0 +1,16 @@
FROM python:3.11-slim

RUN apt-get update && apt-get install -y --no-install-recommends poppler-utils && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

VOLUME ["/app/papers", "/app/data"]
EXPOSE 8000
ENV PORT=8000
ENV LOG_LEVEL=info

CMD ["sh", "-c", "python3 -m uvicorn api.server:app --host 0.0.0.0 --port ${PORT} --log-level ${LOG_LEVEL}"]
123
README.md
Normal file
@@ -0,0 +1,123 @@
# LLM Paper Library

A knowledge base of papers covering the full LLM technology stack, from architecture design to agent applications: 9 modules, 30+ sub-areas, and 180+ key papers.

**Live site:** https://your-domain.com
**API docs:** https://your-domain.com/docs (FastAPI Swagger)

## Project structure

```
llm-library/
├── api/
│   ├── server.py          # FastAPI service (REST API + PDF proxy)
│   ├── downloader.py      # Batch PDF downloader
│   ├── parse_papers.py    # Extracts paper data from the HTML
│   └── extract_data.py    # Fallback extraction script
├── data/
│   └── papers.json        # Paper metadata (single source of truth)
├── papers/
│   ├── arxiv/             # arXiv PDF cache
│   └── hf/                # HuggingFace PDF cache
├── static/                # Frontend (index.html + CSS + JS)
├── start.sh               # One-step startup
├── requirements.txt
└── pyproject.toml
```

## Quick start

```bash
# 1. Configure the API key
echo "API_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')" > .env

# 2. Install dependencies
pip install -r requirements.txt

# 3. Start the service
bash start.sh
# or
python3 -m uvicorn api.server:app --host 0.0.0.0 --port 8000
```

Once the service is running, open `http://localhost:8000` to use it.

## API endpoints

| Method | Path | Description | Auth |
|------|------|------|------|
| GET | `/api/stats` | Library statistics | None |
| GET | `/api/modules` | List all modules | None |
| GET | `/api/modules/{id}` | Module details (including papers) | None |
| GET | `/api/papers?q=xxx` | Search papers | None |
| POST | `/api/papers` | Add a paper | Bearer Token |
| PUT | `/api/papers` | Update a paper | Bearer Token |
| DELETE | `/api/papers` | Delete a paper | Bearer Token |
| GET | `/papers/arxiv/{id}.pdf` | Local PDF proxy | None |

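As a rough illustration of how the read-only search endpoint might be called from Python, the sketch below only builds the URL for `GET /api/papers`. The helper name `build_search_url` and the default base address are assumptions made for this example; the `q`, `module`, `tag`, and `limit` parameters are the ones declared by `search_papers` in `api/server.py`.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(
    base: str = "http://localhost:8000",  # assumed local development address
    q: str = "",
    module: Optional[str] = None,
    tag: Optional[str] = None,
    limit: int = 50,
) -> str:
    """Build a GET /api/papers URL (hypothetical helper, not part of the repo)."""
    params = {"q": q, "limit": limit}
    # Optional filters are only sent when they are set
    if module:
        params["module"] = module
    if tag:
        params["tag"] = tag
    return f"{base.rstrip('/')}/api/papers?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(q="attention", module="arch", limit=5))
```

Passing the resulting URL to `urllib.request.urlopen` against a running server should return a JSON list of matching papers.
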
### Management endpoint example

```bash
# Add a paper
curl -X POST http://localhost:8000/api/papers \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "module_id": "arch",
    "area_id": "attention",
    "section": "mainline",
    "title": "Paper Title Here",
    "authors": "Author et al.",
    "year": 2026,
    "venue": "arXiv",
    "arxiv": "2601.01234",
    "tags": ["前沿"]
  }'
```

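The same call can be sketched in Python with only the standard library. This is a minimal example under stated assumptions: `build_add_paper_request` is a hypothetical helper, the base URL is the local default used above, and the request is built but not sent so the snippet runs without a server.

```python
import json
import urllib.request


def build_add_paper_request(
    api_key: str,
    paper: dict,
    base: str = "http://localhost:8000",  # assumed local development address
) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST /api/papers request."""
    return urllib.request.Request(
        url=f"{base}/api/papers",
        data=json.dumps(paper, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_add_paper_request(
        "example-key",  # placeholder, not a real key
        {
            "module_id": "arch",
            "area_id": "attention",
            "section": "mainline",
            "title": "Paper Title Here",
            "year": 2026,
            "tags": ["前沿"],
        },
    )
    print(req.get_method(), req.full_url)
    # To send it against a running server: urllib.request.urlopen(req)
```
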
## PDF download

```bash
# Download all paper PDFs to the local cache (incremental)
python3 api/downloader.py

# Download only the first 5 papers as a test
python3 api/downloader.py --limit 5

# Force a re-download
python3 api/downloader.py --no-incremental
```

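Incremental mode comes down to comparing the IDs in `data/papers.json` with the files already in the cache directory. The sketch below shows that comparison in isolation; `missing_ids` is a hypothetical helper name, and the `<id>.pdf` naming follows the layout of `papers/arxiv/`.

```python
import tempfile
from pathlib import Path


def missing_ids(all_ids: list[str], cache_dir: Path) -> list[str]:
    """Return the IDs with no '<id>.pdf' file in cache_dir, preserving input order."""
    cached = {p.stem for p in cache_dir.glob("*.pdf")}
    return [paper_id for paper_id in all_ids if paper_id not in cached]


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of the real cache
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        (cache / "1706.03762.pdf").write_bytes(b"%PDF-1.4")
        print(missing_ids(["1706.03762", "2005.14165"], cache))
```
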
## Deployment (Nginx reverse proxy)

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Serve static files directly from Nginx (optional, improves performance)
    location /style.css { alias /path/to/llm-library/static/style.css; }
    location /app.js { alias /path/to/llm-library/static/app.js; }
}
```

## Data maintenance

Paper data is stored in `data/papers.json` and can also be managed through the API.

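The nesting that `api/server.py` reads from `data/papers.json` is module → `areas` → one list per section (`mainline`, `branches`, `forward`). The sketch below walks that shape and counts papers per section, roughly as `/api/stats` does. The sample data is invented for illustration (it reuses the placeholder IDs from the curl example), and `count_papers` is a hypothetical helper.

```python
SECTIONS = ("mainline", "branches", "forward")

# Illustrative sample only: the names and entries are placeholders, not real library data
sample = {
    "arch": {
        "name": "Architecture",
        "icon": "",
        "desc": "",
        "areas": [
            {
                "id": "attention",
                "name": "Attention",
                "mainline": [
                    {"title": "Paper A", "authors": "Author et al.", "year": 2026,
                     "venue": "arXiv", "arxiv": "2601.01234", "tags": ["前沿"]},
                ],
                "branches": [],
                "forward": [
                    {"title": "Paper B", "authors": "", "year": 2026,
                     "venue": "", "tags": ["前瞻"]},
                ],
            }
        ],
    }
}


def count_papers(data: dict) -> dict:
    """Count papers per section across all modules and areas."""
    counts = {section: 0 for section in SECTIONS}
    for mod in data.values():
        for area in mod.get("areas", []):
            for section in SECTIONS:
                counts[section] += len(area.get(section, []))
    return counts


if __name__ == "__main__":
    print(count_papers(sample))
```
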
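**Tag system:**
- 🏁 **起点** (starting point): the founding paper of a sub-area
- 🔴 **关键节点** (key milestone): a landmark paper that changed the technical direction
- 🟢 **前沿** (frontier): current SOTA, already adopted by mainstream models
- 🟣 **前瞻** (forward-looking): a promising idea not yet adopted by the mainstream (e.g. Engram, Titans)
- 🟠 **支线** (branch): an influential alternative technical route
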
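The management endpoints shown in this commit store `tags` as given, without checking them against the five values above. If such a check is wanted on the client side, it could look like the sketch below; `KNOWN_TAGS` and `unknown_tags` are hypothetical names, not part of the repository.

```python
# The five tag values listed in the README (stored in papers.json in Chinese)
KNOWN_TAGS = ("起点", "关键节点", "前沿", "前瞻", "支线")


def unknown_tags(tags: list[str]) -> list[str]:
    """Return the tags that are not part of the documented tag system."""
    return [tag for tag in tags if tag not in KNOWN_TAGS]


if __name__ == "__main__":
    print(unknown_tags(["前沿", "frontier"]))
```
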
## License

MIT
1
api/__init__.py
Normal file
@@ -0,0 +1 @@
# LLM Paper Library: API module
57
api/backfill.py
Normal file
@@ -0,0 +1,57 @@
#!/usr/bin/env python3
"""Backfill all uncached PDFs, skipping dead arXiv IDs"""
import json, subprocess, time
from pathlib import Path

PAPERS_JSON = Path(__file__).resolve().parent.parent / "data" / "papers.json"
ARXIV_DIR = Path(__file__).resolve().parent.parent / "papers" / "arxiv"
LOG_FILE = Path("/app/papers/backfill.log")

def main():
    with open(PAPERS_JSON, encoding="utf-8") as f:
        data = json.load(f)

    # Collect all arXiv IDs
    arxiv_ids = set()
    for mod in data.values():
        for area in mod.get("areas", []):
            for section in ("mainline", "branches", "forward"):
                for p in area.get(section, []):
                    aid = p.get("arxiv")
                    if aid:
                        arxiv_ids.add(aid)

    # Make sure the cache directory exists before globbing and writing into it
    ARXIV_DIR.mkdir(parents=True, exist_ok=True)
    cached = {p.stem for p in ARXIV_DIR.glob("*.pdf")}
    missing = [aid for aid in arxiv_ids if aid not in cached]
    print(f"Total: {len(arxiv_ids)}, Cached: {len(cached)}, Missing: {len(missing)}")

    if not missing:
        print("All caught up!")
        return

    ok, fail = 0, 0
    for aid in missing:
        url = f"https://arxiv.org/pdf/{aid}.pdf"
        dest = ARXIV_DIR / f"{aid}.pdf"
        try:
            r = subprocess.run(
                ["wget", "-q", "-T", "15", "-O", str(dest), url],
                timeout=20
            )
            # Read the size before any cleanup so the failure message can report it
            size = dest.stat().st_size if dest.exists() else 0
            if r.returncode == 0 and size > 5000:
                ok += 1
                print(f"  OK {aid} ({size//1024} KB)")
            else:
                dest.unlink(missing_ok=True)
                fail += 1
                print(f"  FAIL {aid} (rc={r.returncode}, sz={size})")
        except Exception as e:
            dest.unlink(missing_ok=True)
            fail += 1
            print(f"  ERR {aid} {e}")
        time.sleep(0.8)  # Be nice to arXiv

    print(f"\nDone: {ok} ok, {fail} failed")

if __name__ == "__main__":
    main()
13
api/check_trans.py
Normal file
@@ -0,0 +1,13 @@
import urllib.request, json
r = urllib.request.urlopen("http://127.0.0.1:8000/api/translate/2005.14165")
data = json.loads(r.read())
p = data["paragraphs"][0]
page1 = p["page"]
en1 = p["en"][:60]
zh1 = p["zh"][:80]
print(f"page={page1}, en={en1}")
print(f"zh={zh1}")
p5 = data["paragraphs"][5]
page5 = p5["page"]
zh5 = p5["zh"][:80]
print(f"para[5] page={page5}, zh={zh5}")
198
api/downloader.py
Normal file
@@ -0,0 +1,198 @@
"""
LLM Paper Library: PDF downloader
Downloads paper PDFs from arXiv and HuggingFace into the local cache
"""

import json
import time
import logging
from pathlib import Path

import httpx
from tqdm import tqdm

ROOT = Path(__file__).resolve().parent.parent
DATA_FILE = ROOT / "data" / "papers.json"
ARXIV_DIR = ROOT / "papers" / "arxiv"
HF_DIR = ROOT / "papers" / "hf"
LOG_FILE = ROOT / "papers" / "download.log"

# The log file lives under papers/, which may not exist on a fresh checkout
LOG_FILE.parent.mkdir(parents=True, exist_ok=True)

log = logging.getLogger("downloader")
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[
        logging.FileHandler(LOG_FILE),
        logging.StreamHandler(),
    ],
)


def collect_urls() -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Collect every PDF that needs downloading from papers.json

    Returns:
        arxiv_list: [(arxiv_id, title), ...]
        hf_list: [(url, filename), ...]
    """
    with open(DATA_FILE, encoding="utf-8") as f:
        data = json.load(f)

    arxiv_seen = set()
    hf_seen = set()
    arxiv_list = []
    hf_list = []

    for mod in data.values():
        for area in mod.get("areas", []):
            for section in ("mainline", "branches", "forward"):
                for p in area.get(section, []):
                    if p.get("arxiv") and p["arxiv"] not in arxiv_seen:
                        arxiv_seen.add(p["arxiv"])
                        arxiv_list.append((p["arxiv"], p.get("title", "")))
                    if p.get("pdf") and p["pdf"] not in hf_seen:
                        hf_seen.add(p["pdf"])
                        # Derive a safe filename from the URL
                        name = p["pdf"].split("/")[-1].replace(".pdf", "")
                        hf_list.append((p["pdf"], name))

    return arxiv_list, hf_list


def download_arxiv(client: httpx.Client, arxiv_id: str, title: str) -> bool:
    """Download a single arXiv PDF"""
    pdf_path = ARXIV_DIR / f"{arxiv_id}.pdf"
    if pdf_path.exists():
        log.debug(f"Skip (exists): {arxiv_id}")
        return True

    url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
    try:
        resp = client.get(url, follow_redirects=True, timeout=30)
        resp.raise_for_status()

        # Verify it's actually a PDF (arXiv returns HTML for missing papers)
        content_type = resp.headers.get("content-type", "")
        if "pdf" not in content_type and not resp.content.startswith(b"%PDF"):
            log.warning(f"Not a PDF: {arxiv_id} - {title[:60]}")
            return False

        pdf_path.write_bytes(resp.content)
        size_kb = len(resp.content) / 1024
        log.info(f"OK: {arxiv_id} ({size_kb:.0f} KB) - {title[:60]}")
        return True
    except httpx.HTTPError as e:
        log.error(f"HTTP error {arxiv_id}: {e}")
        return False
    except Exception as e:
        log.error(f"Error {arxiv_id}: {e}")
        return False


def download_hf(client: httpx.Client, url: str, filename: str) -> bool:
    """Download a single HuggingFace PDF"""
    safe_name = filename.replace("..", "").replace("/", "_")
    pdf_path = HF_DIR / f"{safe_name}.pdf"
    if pdf_path.exists():
        log.debug(f"Skip (exists): {safe_name}")
        return True

    try:
        resp = client.get(url, follow_redirects=True, timeout=60)
        resp.raise_for_status()

        if not resp.content.startswith(b"%PDF"):
            log.warning(f"Not a PDF: {safe_name}")
            return False

        pdf_path.write_bytes(resp.content)
        size_kb = len(resp.content) / 1024
        log.info(f"OK (HF): {safe_name} ({size_kb:.0f} KB)")
        return True
    except httpx.HTTPError as e:
        log.error(f"HTTP error {safe_name}: {e}")
        return False
    except Exception as e:
        log.error(f"Error {safe_name}: {e}")
        return False


def run(incremental: bool = True, limit: int = 0, delay: float = 1.0):
    """Download all PDFs in bulk

    Args:
        incremental: True = skip files that already exist
        limit: 0 = all, N = only the first N papers
        delay: delay between requests (seconds)
    """
    ARXIV_DIR.mkdir(parents=True, exist_ok=True)
    HF_DIR.mkdir(parents=True, exist_ok=True)

    arxiv_list, hf_list = collect_urls()
    total = len(arxiv_list) + len(hf_list)
    log.info(f"Found {len(arxiv_list)} arXiv + {len(hf_list)} HF = {total} PDFs to download")
    log.info(f"Incremental: {incremental}, Delay: {delay}s")

    if not incremental:
        log.warning("Non-incremental mode: will re-download existing files")

    # Count existing
    arxiv_existing = sum(1 for aid, _ in arxiv_list if (ARXIV_DIR / f"{aid}.pdf").exists())
    hf_existing = sum(1 for _, name in hf_list if (HF_DIR / f"{name}.pdf").exists())
    log.info(f"Already cached: {arxiv_existing} arXiv + {hf_existing} HF")

    ok, fail = 0, 0
    total_size = 0.0

    with httpx.Client(
        headers={"User-Agent": "LLM-Library-Downloader/0.1"},
        timeout=30,
        follow_redirects=True,
    ) as client:

        # Download arXiv
        if limit > 0:
            arxiv_list = arxiv_list[:limit]
        for arxiv_id, title in tqdm(arxiv_list, desc="arXiv"):
            p = ARXIV_DIR / f"{arxiv_id}.pdf"
            if p.exists():
                if incremental:
                    ok += 1
                    continue
                # download_arxiv() skips existing files, so drop the cached copy to force a re-download
                p.unlink()
            success = download_arxiv(client, arxiv_id, title)
            if success:
                ok += 1
                if p.exists():
                    total_size += p.stat().st_size
            else:
                fail += 1
            time.sleep(delay)

        # Download HF
        if limit > 0:
            hf_list = hf_list[:limit]
        for url, name in tqdm(hf_list, desc="HF   "):
            p = HF_DIR / f"{name}.pdf"
            if p.exists():
                if incremental:
                    ok += 1
                    continue
                # download_hf() skips existing files, so drop the cached copy to force a re-download
                p.unlink()
            success = download_hf(client, url, name)
            if success:
                ok += 1
                if p.exists():
                    total_size += p.stat().st_size
            else:
                fail += 1
            time.sleep(delay)

    log.info(f"Done: {ok} OK, {fail} failed, {total_size/1024/1024:.1f} MB total")


if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Download paper PDFs into the local cache")
    parser.add_argument("--no-incremental", action="store_true", help="re-download everything (default: skip existing files)")
    parser.add_argument("--limit", type=int, default=0, help="limit the number of downloads (0 = all)")
    parser.add_argument("--delay", type=float, default=1.0, help="delay between requests (seconds)")
    args = parser.parse_args()
    run(incremental=not args.no_incremental, limit=args.limit, delay=args.delay)
103
api/extract_data.py
Normal file
@@ -0,0 +1,103 @@
#!/usr/bin/env python3
"""Build papers.json by regex-extracting paper entries from llm_library.html"""
import re, json, os

ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
html_path = os.path.join(ROOT, 'llm_library.html')

with open(html_path, 'r', encoding='utf-8') as f:
    html = f.read()

# Extract every paper entry with one regex instead of parsing PAPER_DATA line by line.
# Pattern: { title:"...", authors:"...", year:..., venue:"...", arxiv:"...", tags:[...] }
paper_re = re.compile(
    r'\{\s*title:\s*"([^"]*)",\s*authors:\s*"([^"]*)",\s*year:\s*(\d+),\s*venue:\s*"([^"]*)",\s*'
    r'(?:arxiv:\s*"([^"]*)",\s*|pdf:\s*"([^"]*)",\s*|)'
    r'tags:\s*\[(.*?)\]\s*\}',
    re.DOTALL
)

def last_match(pattern, text):
    """Return the last match of pattern in text (the one nearest the paper entry), or None"""
    found = None
    for found in re.finditer(pattern, text):
        pass
    return found

papers = []
for m in paper_re.finditer(html):
    title = m.group(1)
    authors = m.group(2)
    year = int(m.group(3))
    venue = m.group(4)
    arxiv = m.group(5) or None
    pdf = m.group(6) or None
    tags_str = m.group(7)
    tags = re.findall(r'"([^"]*)"', tags_str)

    # Find which module/area this paper belongs to
    pos = m.start()
    # Search backwards for module and area context
    before = html[max(0,pos-3000):pos]

    # Find module id (the nearest header before the entry, i.e. the last match in the window)
    mod_match = last_match(r'\n\s*(\w+):\s*\{\s*\{?\s*name:\s*"([^"]*)"', before)
    if not mod_match:
        # Try broader pattern
        mod_match = last_match(r'(\w+):\s*\{[^}]*name:\s*"([^"]*)"', before)
    if mod_match:
        mod_id = mod_match.group(1)
        mod_name = mod_match.group(2)
    else:
        mod_id = 'unknown'
        mod_name = 'Unknown'

    # Find area id (again the nearest preceding match)
    area_match = last_match(r'id:\s*"(\w+)"[^}]*name:\s*"([^"]*)"', before)
    if area_match:
        area_id = area_match.group(1)
        area_name = area_match.group(2)
    else:
        area_id = 'unknown'
        area_name = 'Unknown'

    papers.append({
        'module': mod_id,
        'module_name': mod_name,
        'area': area_id,
        'area_name': area_name,
        'title': title,
        'authors': authors,
        'year': year,
        'venue': venue,
        'arxiv': arxiv,
        'pdf': pdf,
        'tags': tags,
    })

print(f'Extracted {len(papers)} papers')

# Group by module → area → section (mainline/branches/forward)
# For now, just save as flat list for verification
# We'll reconstruct the proper nested structure after verifying

# Also extract module metadata
modules = {}
for m in re.finditer(r"(\w+):\s*\{\s*name:\s*\"([^\"]+)\"[^}]*icon:\s*\"([^\"]*)\"[^}]*desc:\s*\"([^\"]*)\"", html):
    mod_id = m.group(1)
    modules[mod_id] = {
        'name': m.group(2),
        'icon': m.group(3),
        'desc': m.group(4),
        'color': mod_id,
        'areas': []
    }

print(f'Found {len(modules)} modules')
for mod_id, mod in modules.items():
    print(f'  {mod_id}: {mod["name"]}')

# Save the flat list for now.
# Note: this overwrites data/papers.json with a flat list, not the nested
# module → area → section structure that api/server.py reads.
output_path = os.path.join(ROOT, 'data', 'papers.json')
os.makedirs(os.path.dirname(output_path), exist_ok=True)
with open(output_path, 'w', encoding='utf-8') as f:
    json.dump(papers, f, ensure_ascii=False, indent=2)

print(f'Saved {len(papers)} papers (flat) to {output_path}')
107
api/parse_papers.py
Normal file
@@ -0,0 +1,107 @@
#!/usr/bin/env python3
"""Parse llm_library.html PAPER_DATA block → nested papers.json"""
import re, json, os

HTML = '/app/working/workspaces/default/llm_library.html'
JSON = '/app/working/workspaces/default/llm-library/data/papers.json'

with open(HTML, encoding='utf-8') as f:
    html = f.read()

# Slice out everything between the opening of PAPER_DATA and the "APP STATE" marker.
# Using len(marker) keeps the indentation of the first module line intact.
marker = 'const PAPER_DATA = {'
s = html.index(marker)
e = html.index('APP STATE')
block = html[s + len(marker):e]

modules = {}
current_mod = None
current_area = None
current_section = 'mainline'

for line in block.split('\n'):
    stripped = line.strip()
    if not stripped:
        continue
    indent = len(line) - len(line.lstrip())

    # Module start: "  arch: {" at indent 2
    if indent == 2 and re.match(r'^\w+:\s*\{', stripped):
        mid = stripped.split(':')[0]
        if current_mod and current_mod.get('name'):
            modules[current_mod['id']] = current_mod
        current_mod = {'id': mid, 'name': '', 'icon': '', 'desc': '', 'color': mid, 'areas': []}
        current_area = None
        current_section = 'mainline'
        continue

    if not current_mod:
        continue

    # Module metadata at indent 4
    if indent == 4:
        m = re.match(r'(\w+):\s*"([^"]*)"', stripped)
        if m and m.group(1) in ('name', 'icon', 'desc', 'color'):
            current_mod[m.group(1)] = m.group(2)

    # Area header at indent 8
    if indent == 8:
        m_id = re.match(r'id:\s*"(\w+)"', stripped)
        if m_id:
            current_area = {'id': m_id.group(1), 'name': '', 'mainline': [], 'branches': [], 'forward': []}
            current_mod['areas'].append(current_area)
            current_section = 'mainline'
            continue
        m_name = re.match(r'name:\s*"([^"]+)"', stripped)
        if m_name and current_area:
            current_area['name'] = m_name.group(1)
            continue
    # Section markers switch the list that later paper entries are appended to
    if re.match(r'mainline:\s*\[', stripped):
        current_section = 'mainline'
    elif re.match(r'branches:\s*\[', stripped):
        current_section = 'branches'
    elif re.match(r'forward:\s*\[', stripped):
        current_section = 'forward'

    # Paper entry
    if stripped.startswith('{ title:') and 'tags:' in stripped and current_area:
        title = re.search(r'title:\s*"([^"]+)"', stripped)
        authors = re.search(r'authors:\s*"([^"]*?)"', stripped)
        year = re.search(r'year:\s*(\d+)', stripped)
        venue = re.search(r'venue:\s*"([^"]*?)"', stripped)
        arxiv = re.search(r'arxiv:\s*"(\S+?)"', stripped)
        pdf = re.search(r'pdf:\s*"(https:[^"]+)"', stripped)
        tags_m = re.search(r'tags:\s*\[(.*?)\]', stripped, re.DOTALL)

        if title and year and tags_m:
            tags = re.findall(r'"([^"]*)"', tags_m.group(1))
            entry = {
                'title': title.group(1),
                'authors': authors.group(1) if authors else '',
                'year': int(year.group(1)),
                'venue': venue.group(1) if venue else '',
                'tags': tags
            }
            if arxiv and arxiv.group(1):
                entry['arxiv'] = arxiv.group(1)
            if pdf and pdf.group(1):
                entry['pdf'] = pdf.group(1)
            current_area[current_section].append(entry)

# Save last module
if current_mod and current_mod.get('name'):
    modules[current_mod['id']] = current_mod

# Count
total = sum(
    len(a.get('mainline',[])) + len(a.get('branches',[])) + len(a.get('forward',[]))
    for m in modules.values() for a in m.get('areas',[])
)

print(f'Parsed: {len(modules)} modules, {total} papers')
for mid, m in sorted(modules.items()):
    pc = sum(len(a.get('mainline',[]))+len(a.get('branches',[]))+len(a.get('forward',[])) for a in m['areas'])
    print(f'  {mid}: {m["name"]} - {len(m["areas"])} areas, {pc} papers')

os.makedirs(os.path.dirname(JSON), exist_ok=True)
with open(JSON, 'w', encoding='utf-8') as f:
    json.dump(modules, f, ensure_ascii=False, indent=2)
print(f'\nSaved to {JSON}')
484
api/server.py
Normal file
@@ -0,0 +1,484 @@
"""
LLM Paper Library: FastAPI backend
Provides a REST API for querying and managing papers, plus a PDF proxy
"""

import json
import os
import hashlib
import secrets
import logging
from pathlib import Path
from typing import Optional

from fastapi import FastAPI, HTTPException, Query, Depends, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse, JSONResponse
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel

# ─── Config ────────────────────────────────────────────
ROOT = Path(__file__).resolve().parent.parent
DATA_FILE = ROOT / "data" / "papers.json"
PAPERS_DIR = ROOT / "papers"
# README.md and .env.example name the key API_KEY, so accept that as a fallback
API_KEY = os.environ.get("LLM_LIB_API_KEY") or os.environ.get("API_KEY", "change-me")

log = logging.getLogger("llm-library")
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

# ─── App ───────────────────────────────────────────────
app = FastAPI(
    title="LLM Paper Library",
    description="LLM paper knowledge base API: query, search, and manage papers",
    version="0.1.0",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# ─── Auth ──────────────────────────────────────────────
def verify_api_key(request: Request):
    """Simple API-key auth for write operations (POST/PUT/DELETE)"""
    auth = request.headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        token = auth[7:]
    else:
        token = request.query_params.get("api_key", "")
    # Constant-time comparison avoids leaking key contents through response timing
    if not token or not secrets.compare_digest(token.encode(), API_KEY.encode()):
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    return True

# ─── Data loading ──────────────────────────────────────
def load_data():
    if not DATA_FILE.exists():
        return {}
    with open(DATA_FILE, 'r', encoding='utf-8') as f:
        return json.load(f)

def save_data(data):
    with open(DATA_FILE, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

# ─── Paper CRUD helpers ────────────────────────────────
def find_paper(data, module_id, area_id, title):
    """Find a paper by title within module/area; returns (module, area, section, index)"""
    mod = data.get(module_id)
    if not mod:
        return None, None, None, None
    for area in mod.get("areas", []):
        if area["id"] == area_id:
            for section in ("mainline", "branches", "forward"):
                for i, p in enumerate(area.get(section, [])):
                    if p["title"] == title:
                        return mod, area, section, i
    return None, None, None, None

# ─── Routes: Query ─────────────────────────────────────
@app.get("/api/stats")
def get_stats():
    """Return library statistics"""
    data = load_data()
    mods = len(data)
    areas = 0
    papers = 0
    sections = {"mainline": 0, "branches": 0, "forward": 0}
    for mod in data.values():
        areas += len(mod.get("areas", []))
        for area in mod.get("areas", []):
            for s in ("mainline", "branches", "forward"):
                n = len(area.get(s, []))
                papers += n
                sections[s] += n
    return {
        "modules": mods,
        "areas": areas,
        "papers": papers,
        "sections": sections,
        "data_file": str(DATA_FILE),
    }

@app.get("/api/modules")
def list_modules():
    """List all modules (without paper details)"""
    data = load_data()
    return [
        {
            "id": mid,
            "name": m["name"],
            "icon": m["icon"],
            "desc": m["desc"],
            "area_count": len(m.get("areas", [])),
            "paper_count": sum(
                len(a.get("mainline", [])) + len(a.get("branches", [])) + len(a.get("forward", []))
                for a in m.get("areas", [])
            ),
        }
        for mid, m in data.items()
    ]

@app.get("/api/modules/{module_id}")
def get_module(module_id: str):
    """Return the full paper data for a single module"""
    data = load_data()
    mod = data.get(module_id)
    if not mod:
        raise HTTPException(status_code=404, detail=f"Module '{module_id}' not found")
    return mod

@app.get("/api/papers")
def search_papers(
    q: str = Query(default="", description="Search keyword: title/authors"),
    module: Optional[str] = Query(default=None),
    tag: Optional[str] = Query(default=None, description="起点/关键节点/前沿/前瞻/支线"),
    limit: int = Query(default=50, ge=1, le=200),
):
    """Search papers (by keyword in title/authors, by module, by tag)"""
    data = load_data()
    results = []
    q = q.lower()
    for mid, mod in data.items():
        if module and mid != module:
            continue
        for area in mod.get("areas", []):
            for section in ("mainline", "branches", "forward"):
                for p in area.get(section, []):
                    # Filter by tag
                    if tag and tag not in p.get("tags", []):
                        continue
                    # Filter by query
                    if q:
                        if q not in (p.get("title", "") + p.get("authors", "")).lower():
                            continue
                    results.append({
                        "module_id": mid,
                        "module_name": mod["name"],
                        "area_id": area["id"],
                        "area_name": area["name"],
                        "section": section,
                        **p,
                    })
                    if len(results) >= limit:
                        break
                if len(results) >= limit:
                    break
            if len(results) >= limit:
                break
        if len(results) >= limit:
            break
    return results

# ─── Routes: Management (write operations, API key required) ────────────
class PaperCreate(BaseModel):
    module_id: str
    area_id: str
    section: str = "mainline"  # mainline / branches / forward
    title: str
    authors: str = ""
    year: int
    venue: str = ""
    arxiv: Optional[str] = None
    pdf: Optional[str] = None
    tags: list[str] = []

class PaperUpdate(BaseModel):
    authors: Optional[str] = None
    year: Optional[int] = None
    venue: Optional[str] = None
    arxiv: Optional[str] = None
    pdf: Optional[str] = None
    tags: Optional[list[str]] = None
    section: Optional[str] = None  # move to different section

@app.post("/api/papers", dependencies=[Depends(verify_api_key)])
def add_paper(paper: PaperCreate):
    """Add a new paper"""
    data = load_data()
    mod = data.get(paper.module_id)
    if not mod:
        raise HTTPException(status_code=404, detail="Module not found")

    area = next((a for a in mod["areas"] if a["id"] == paper.area_id), None)
    if not area:
        raise HTTPException(status_code=404, detail="Area not found")

    section = paper.section
    if section not in ("mainline", "branches", "forward"):
        raise HTTPException(status_code=400, detail="section must be mainline/branches/forward")

    entry = {
        "title": paper.title,
        "authors": paper.authors,
        "year": paper.year,
        "venue": paper.venue,
        "tags": paper.tags,
    }
    if paper.arxiv:
        entry["arxiv"] = paper.arxiv
    if paper.pdf:
        entry["pdf"] = paper.pdf

    area.setdefault(section, []).append(entry)
    save_data(data)
    log.info(f"Added paper: {paper.title}")
    return {"ok": True, "title": paper.title}

@app.put("/api/papers")
def update_paper(
    module_id: str,
    area_id: str,
    title: str,
    update: PaperUpdate,
    _=Depends(verify_api_key),
):
    """Update a paper"""
    data = load_data()
    mod, area, section, idx = find_paper(data, module_id, area_id, title)
    if mod is None:
        raise HTTPException(status_code=404, detail="Paper not found")

    paper = area[section][idx]
    for field in ("authors", "year", "venue", "arxiv", "pdf", "tags"):
        val = getattr(update, field)
        if val is not None:
            paper[field] = val

    # Move to different section?
    if update.section and update.section != section:
        if update.section not in ("mainline", "branches", "forward"):
            raise HTTPException(status_code=400, detail="Invalid section")
        area[section].pop(idx)
        area.setdefault(update.section, []).append(paper)

    save_data(data)
    log.info(f"Updated paper: {title}")
    return {"ok": True, "title": title}

@app.delete("/api/papers")
def delete_paper(
    module_id: str,
    area_id: str,
    title: str,
    _=Depends(verify_api_key),
):
    """Delete a paper"""
    data = load_data()
    mod, area, section, idx = find_paper(data, module_id, area_id, title)
    if mod is None:
        raise HTTPException(status_code=404, detail="Paper not found")

    area[section].pop(idx)
    save_data(data)
    log.info(f"Deleted paper: {title}")
    return {"ok": True, "title": title}

# ─── Routes: PDF proxy ──────────────────────────────────
|
||||
@app.get("/papers/arxiv/{arxiv_id}.pdf")
|
||||
@app.get("/papers/arxiv/{arxiv_id}")
|
||||
def serve_arxiv_pdf(arxiv_id: str):
|
||||
"""从本地缓存提供 arXiv PDF(无 .pdf 后缀路由防 IDM 拦截)"""
|
||||
pdf_path = PAPERS_DIR / "arxiv" / f"{arxiv_id}.pdf"
|
||||
if not pdf_path.exists():
|
||||
raise HTTPException(status_code=404, detail=f"PDF not in local cache: {arxiv_id}")
|
||||
return FileResponse(
|
||||
pdf_path, media_type="application/pdf",
|
||||
headers={"Cache-Control": "public, max-age=86400"},
|
||||
)
|
||||
|
||||
@app.get("/papers/hf/{filename}.pdf")
@app.get("/papers/hf/{filename}")
def serve_hf_pdf(filename: str):
    """Serve a HuggingFace PDF from the local cache (the suffix-less route keeps IDM from intercepting)."""
    safe_name = filename.replace("..", "").replace("/", "_").removesuffix(".pdf")
    pdf_path = PAPERS_DIR / "hf" / f"{safe_name}.pdf"
    if not pdf_path.exists():
        raise HTTPException(status_code=404, detail=f"PDF not in local cache: {filename}")
    return FileResponse(
        pdf_path, media_type="application/pdf",
        headers={"Cache-Control": "public, max-age=86400"},
    )
|
||||
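The two PDF routes above build a cache path straight from a URL segment. A minimal sketch of validating the identifier first is shown below; `cached_pdf_path` and `ARXIV_ID_RE` are hypothetical names that do not exist in the repository, and the pattern only covers new-style arXiv IDs such as `1706.03762` or `2401.12345v2`.

```python
import re
from pathlib import Path

# Assumption: only new-style arXiv identifiers are cached, e.g. "1706.03762" or "2401.12345v2".
ARXIV_ID_RE = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")


def cached_pdf_path(papers_dir: Path, arxiv_id: str) -> Path:
    """Return the cache path for an arXiv ID, rejecting anything that is not a plain ID."""
    arxiv_id = arxiv_id.removesuffix(".pdf")  # tolerate a trailing ".pdf" from the URL
    if not ARXIV_ID_RE.fullmatch(arxiv_id):
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    return papers_dir / "arxiv" / f"{arxiv_id}.pdf"
```

A route handler could call this helper and turn the `ValueError` into a 400 response; that wiring is left out here because it depends on how the rest of `server.py` handles errors.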
|
||||
# ─── Routes: Translation ───────────────────────────────
TRANSLATE_CACHE = ROOT / "data" / "translations"
TRANSLATE_CACHE.mkdir(parents=True, exist_ok=True)

def extract_pdf_text_with_pages(pdf_path: Path, max_chars: int = 12000) -> list[dict]:
    """Extract text and page numbers from a PDF with pdftotext (Poppler), avoiding the PyMuPDF dependency."""
    import re
    import subprocess

    try:
        result = subprocess.run(
            ["pdftotext", "-layout", "-q", str(pdf_path), "-"],
            capture_output=True, text=True, timeout=30
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as e:
        # pdftotext is not on PATH, or extraction ran past the 30 s timeout
        log.error(f"pdftotext could not run: {e}")
        raise HTTPException(status_code=500, detail="PDF text extraction failed")

    if result.returncode != 0:
        log.error(f"pdftotext failed: {result.stderr}")
        raise HTTPException(status_code=500, detail="PDF text extraction failed")

    text = result.stdout

    # Remove the arXiv stamp on the first page: everything from "arXiv:<id>" to the next blank line
    text = re.sub(r'arXiv:' + re.escape(pdf_path.stem) + r'.*?\n\n', '', text, flags=re.DOTALL)

    # Split by form-feed (page break)
    pages = text.split('\f')

    result_pages = []
    total = 0
    for i, page_text in enumerate(pages):
        pt = page_text.strip()
        if not pt:
            continue
        result_pages.append({"page": i + 1, "text": pt})
        total += len(pt)
        if total >= max_chars:
            break

    if not result_pages:
        raise HTTPException(status_code=500, detail="No text extracted from PDF")

    return result_pages
|
||||
|
||||
|
||||
def split_text_with_pages(page_texts: list[dict], max_len: int = 400) -> list[dict]:
    """Split per-page text further into paragraph chunks, keeping the page number."""
    import re

    chunks = []
    for pt in page_texts:
        page = pt["page"]
        text = pt["text"]
        raw_paras = [p.strip() for p in text.split("\n\n") if p.strip()]
        for para in raw_paras:
            if len(para) <= max_len:
                chunks.append({"page": page, "text": para})
            else:
                # Split on the space after ". ", "? " or "! "; a literal "|" in the text is left alone
                sentences = re.split(r"(?<=[.?!]) ", para)
                current = ""
                for s in sentences:
                    s = s.strip()
                    if not s:
                        continue
                    if len(current) + len(s) + 1 <= max_len:
                        current = (current + " " + s).strip()
                    else:
                        if current:
                            chunks.append({"page": page, "text": current})
                        current = s
                if current:
                    chunks.append({"page": page, "text": current})
    return chunks
|
||||
|
||||
|
||||
def translate_text(text: str, source: str = "en", target: str = "zh") -> str:
    """Translate text with the free MyMemory API; returns the input unchanged on failure."""
    import urllib.request
    import urllib.parse

    url = "https://api.mymemory.translated.net/get"
    params = urllib.parse.urlencode({
        "q": text,
        "langpair": f"{source}|{target}",
        "mt": "1",  # Force machine translation, not memory
        "de": "me@llm-library.local",
    })
    full_url = f"{url}?{params}"

    try:
        with urllib.request.urlopen(full_url, timeout=15) as resp:
            data = json.loads(resp.read())
    except Exception as e:
        log.warning(f"Translation API error: {e}")
        return text

    if data.get("responseStatus") == 200 and data.get("responseData"):
        return data["responseData"]["translatedText"]
    return text
|
||||
|
||||
|
||||
@app.get("/api/translate/{arxiv_id}")
def translate_paper(arxiv_id: str):
    """Translate the paper body (text is extracted from the local PDF; each paragraph carries its page number)."""
    pdf_path = PAPERS_DIR / "arxiv" / f"{arxiv_id}.pdf"
    if not pdf_path.exists():
        raise HTTPException(status_code=404, detail=f"PDF not cached: {arxiv_id}")

    cache_file = TRANSLATE_CACHE / f"{arxiv_id}.json"
    if cache_file.exists():
        with open(cache_file, encoding="utf-8") as f:
            return json.load(f)

    # Extract text with page numbers
    log.info(f"Extracting text from {arxiv_id}")
    page_texts = extract_pdf_text_with_pages(pdf_path)
    chunks = split_text_with_pages(page_texts)
    log.info(f"Translating {len(chunks)} paragraphs for {arxiv_id}")

    translated = []
    for i, chunk in enumerate(chunks):
        if i % 10 == 0:
            log.info(f"  [{arxiv_id}] translating paragraph {i+1}/{len(chunks)}")
        zh = translate_text(chunk["text"])
        translated.append({
            "page": chunk["page"],
            "en": chunk["text"],
            "zh": zh,
        })

    result = {"arxiv_id": arxiv_id, "paragraphs": translated, "count": len(translated)}

    # ensure_ascii=False writes raw Chinese text, so the file encoding must be explicit
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False)

    return result
|
||||
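The translation route above writes its cache file in place, so a request that is interrupted mid-write could leave a truncated JSON file that later requests would try to load. One possible hardening, sketched here under the assumption that the cache directory sits on a single filesystem, is to write to a temporary file and rename it. `write_json_atomic` is a hypothetical helper, not part of the repository.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload as UTF-8 JSON so readers never see a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False)
        os.replace(tmp_name, path)  # replaces the target in a single step
    except BaseException:
        # Clean up the temp file if serialisation or the rename failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

In `translate_paper`, the final `open(...)`/`json.dump(...)` pair could be swapped for a call such as `write_json_atomic(cache_file, result)`.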
|
||||
|
||||
@app.get("/api/translate/{arxiv_id}/status")
def translate_status(arxiv_id: str):
    """Report whether a translation is cached and whether the PDF exists locally."""
    cache_file = TRANSLATE_CACHE / f"{arxiv_id}.json"
    return {
        "arxiv_id": arxiv_id,
        "cached": cache_file.exists(),
        "pdf_exists": (PAPERS_DIR / "arxiv" / f"{arxiv_id}.pdf").exists(),
    }
|
||||
|
||||
# ─── Routes: PDF download on-demand ────────────────────
@app.post("/api/download/{arxiv_id}")
def download_single_pdf(arxiv_id: str):
    """Download a single arXiv PDF on demand."""
    import subprocess, sys
    pdf_path = PAPERS_DIR / "arxiv" / f"{arxiv_id}.pdf"
    if pdf_path.exists():
        return {"ok": True, "arxiv_id": arxiv_id, "status": "cached"}

    cmd = [sys.executable, str(ROOT / "api" / "downloader.py"), "--limit", "1", "--delay", "0"]
    # The downloader cannot target a specific arXiv ID yet, so for now it is run as-is.
    # With --limit 1 it may fetch a different uncached paper; the check below then
    # reports "failed" because the requested PDF did not appear.
    try:
        subprocess.run(cmd, cwd=str(ROOT), timeout=60, capture_output=True)
        if pdf_path.exists():
            return {"ok": True, "arxiv_id": arxiv_id, "status": "downloaded"}
        return {"ok": False, "arxiv_id": arxiv_id, "status": "failed"}
    except subprocess.TimeoutExpired:
        return {"ok": False, "arxiv_id": arxiv_id, "status": "timeout"}
|
||||
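The comment in `download_single_pdf` notes that there is no way yet to download one specific arXiv ID. A minimal sketch of such a helper follows. `download_arxiv_pdf` and its `fetch` parameter are hypothetical and not part of `api/downloader.py`; the URL pattern is the same `https://arxiv.org/pdf/<id>.pdf` form the frontend already uses, and the injectable `fetch` exists only so the function can be exercised without network access.

```python
import io
import tempfile
import urllib.request
from pathlib import Path
from typing import Callable

ARXIV_PDF_URL = "https://arxiv.org/pdf/{arxiv_id}.pdf"


def download_arxiv_pdf(arxiv_id: str, dest_dir: Path,
                       fetch: Callable = urllib.request.urlopen) -> Path:
    """Fetch one arXiv PDF into dest_dir and return its path (no-op if already cached)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{arxiv_id}.pdf"
    if target.exists():
        return target
    with fetch(ARXIV_PDF_URL.format(arxiv_id=arxiv_id), timeout=30) as resp:
        body = resp.read()
    # An error page returned with status 200 would not start with the PDF magic bytes.
    if not body.startswith(b"%PDF"):
        raise ValueError(f"response for {arxiv_id} is not a PDF")
    tmp = target.with_suffix(".part")
    tmp.write_bytes(body)
    tmp.replace(target)  # only expose the file once it is complete
    return target
```

With a stub in place of the network call, usage looks like `download_arxiv_pdf("1706.03762", Path("papers/arxiv"), fetch=lambda url, timeout: io.BytesIO(b"%PDF-1.4"))`.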
|
||||
# ─── Health ─────────────────────────────────────────────
@app.get("/api/health")
def health():
    return {"status": "ok", "version": "0.1.0"}

# ─── Mount static frontend (at /) ──────────────────────
# Static files mounted after API routes to avoid conflicts
static_dir = ROOT / "static"
if static_dir.exists() and any(static_dir.iterdir()):
    app.mount("/", StaticFiles(directory=str(static_dir), html=True), name="static")

# ─── Main ───────────────────────────────────────────────
def main():
    import os
    import uvicorn
    # Honour the PORT variable documented in .env.example; default to 8000
    uvicorn.run("api.server:app", host="0.0.0.0", port=int(os.environ.get("PORT", "8000")), reload=True)

if __name__ == "__main__":
    main()
|
||||
2175
data/papers.json
Normal file
File diff suppressed because it is too large
55
nginx.conf
Normal file
@@ -0,0 +1,55 @@
|
||||
# LLM Paper Library: Nginx reverse proxy configuration
# Deploy path: /etc/nginx/sites-available/llm-library

# Request rate limiting (abuse protection). limit_req_zone is only valid in the
# http context, so it is declared outside the server block.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    listen 80;
    server_name your-domain.com;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header Referrer-Policy "no-referrer" always;
|
||||
|
||||
    # API proxy
    location /api/ {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        limit_req zone=api burst=20 nodelay;
    }
|
||||
|
||||
    # PDF proxy: force inline disposition so IDM does not pop up a download
    location /papers/ {
        # Same upstream as /api/ and /: the FastAPI app serves all three on one port
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 60s;
        # Add the inline header to keep IDM from intercepting
        add_header Content-Disposition "inline" always;
        add_header X-Content-Type-Options "nosniff" always;
    }
|
||||
|
||||
    # Static files are served directly by Nginx (better performance)
    location /style.css {
        alias /opt/llm-library/static/style.css;
        expires 1d;
    }
    location /app.js {
        alias /opt/llm-library/static/app.js;
        expires 1d;
    }
    location /favicon.ico {
        return 204;
    }

    # Home page
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
    }
}
|
||||
10
proxy_conf.txt
Normal file
@@ -0,0 +1,10 @@
|
||||
location / {
    proxy_pass http://127.0.0.1:8741;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    # Force inline disposition so IDM does not pop up a download
    add_header Content-Disposition "inline" always;
    add_header X-Content-Type-Options "nosniff" always;
}
|
||||
17
pyproject.toml
Normal file
@@ -0,0 +1,17 @@
|
||||
[project]
|
||||
name = "llm-library"
|
||||
version = "0.1.0"
|
||||
description = "LLM 论文图书馆 — 可维护的大模型论文知识库"
|
||||
requires-python = ">=3.10"
|
||||
dependencies = [
|
||||
"fastapi>=0.115",
|
||||
"uvicorn[standard]>=0.34",
|
||||
"httpx>=0.28",
|
||||
"pydantic>=2.10",
|
||||
"python-multipart>=0.0.19",
|
||||
"aiofiles>=24.0",
|
||||
"tqdm>=4.66",
|
||||
]
|
||||
|
||||
[project.scripts]
|
||||
llm-lib = "api.server:main"
|
||||
8
requirements.txt
Normal file
@@ -0,0 +1,8 @@
|
||||
fastapi>=0.115
|
||||
uvicorn[standard]>=0.34
|
||||
httpx>=0.28
|
||||
pydantic>=2.10
|
||||
python-multipart>=0.0.19
|
||||
aiofiles>=24.0
|
||||
tqdm>=4.66
|
||||
PyMuPDF>=1.24
|
||||
43
start.sh
Executable file
@@ -0,0 +1,43 @@
|
||||
#!/bin/bash
# LLM Paper Library: startup script
set -e

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
cd "$SCRIPT_DIR"

# Load environment variables (sourcing avoids glob expansion of values such as CORS_ORIGINS=*)
if [ -f .env ]; then
    set -a
    . ./.env
    set +a
fi

# Accept API_KEY (the name used in .env.example) as a fallback
LLM_LIB_API_KEY="${LLM_LIB_API_KEY:-$API_KEY}"

# Generate an API key if none is set
if [ -z "$LLM_LIB_API_KEY" ]; then
    LLM_LIB_API_KEY=$(python3 -c "import secrets; print(secrets.token_urlsafe(32))")
    # Append instead of overwriting, so other settings in .env survive
    echo "API_KEY=$LLM_LIB_API_KEY" >> .env
    echo "⚠️  Auto-generated API key: $LLM_LIB_API_KEY"
fi
export LLM_LIB_API_KEY

echo "═══ LLM Paper Library ═══"
echo "  API Key: ${LLM_LIB_API_KEY:0:8}..."
echo "  Port:    ${PORT:-8000}"
echo "  PDF Dir: papers/"
echo

# First run: install dependencies
if ! python3 -c "import fastapi" 2>/dev/null; then
    echo "📦 Installing dependencies..."
    pip install -r requirements.txt -q
fi

# If papers.json is missing, re-extract it from the HTML
if [ ! -f data/papers.json ]; then
    echo "📊 Extracting paper data..."
    python3 api/extract_data.py || echo "⚠️  extract_data.py failed, please run it manually"
fi

# Start the service
echo "🚀 Starting service..."
exec python3 -m uvicorn api.server:app \
    --host 0.0.0.0 \
    --port "${PORT:-8000}" \
    --log-level "${LOG_LEVEL:-info}"
|
||||
252
static/app.js
Normal file
@@ -0,0 +1,252 @@
|
||||
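/**
 * LLM Paper Library: frontend JS
 * On page load, probe arXiv/HF connectivity and show the result in the bottom status bar.
 * On paper click: if arXiv is reachable, load it directly in an iframe; after a 5 s timeout, fall back to the HK server.
 * If IDM intercepts the download, let it; no extra countermeasures.
 */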
|
||||
|
||||
const API = '/api';
|
||||
|
||||
let modules = {};
|
||||
let moduleData = {};
|
||||
let pdfTimeout = null;
|
||||
let networkStatus = {};
|
||||
|
||||
const $ = (sel) => document.querySelector(sel);
|
||||
const $$ = (sel) => document.querySelectorAll(sel);
|
||||
const TAG_CLASS = { '起点':'tag-start','关键节点':'tag-milestone','前沿':'tag-frontier','前瞻':'tag-forward','支线':'tag-branch' };
|
||||
|
||||
// ══════════════════ INIT ═══════════════════════════════
|
||||
async function init() {
|
||||
buildStatusBar();
|
||||
try {
|
||||
const resp = await fetch(`${API}/modules`);
|
||||
const mods = await resp.json();
|
||||
for (const m of mods) modules[m.id] = m;
|
||||
renderCards(mods);
|
||||
attachGlowTracking();
|
||||
} catch (e) { console.error(e); }
|
||||
checkSources();
|
||||
document.addEventListener('keydown', e => {
|
||||
if (e.key === 'Escape') {
|
||||
if ($('#pdfOverlay')?.classList.contains('open')) closePdf();
|
||||
else if ($('#overlay').classList.contains('open')) closeModal();
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
// ══════════════════ STATUS BAR ════════════════════════
|
||||
function buildStatusBar() {
|
||||
const bar = document.createElement('div');
|
||||
bar.id = 'statusBar'; bar.className = 'status-bar';
|
||||
bar.innerHTML = `<span class="status-label">连通性检测</span>
|
||||
<span class="status-item" id="status-arxiv"><span class="status-dot"></span> arXiv <span class="status-ms" id="ms-arxiv">—</span></span>
|
||||
<span class="status-item" id="status-hf"><span class="status-dot"></span> HuggingFace <span class="status-ms" id="ms-hf">—</span></span>`;
|
||||
document.body.appendChild(bar);
|
||||
}
|
||||
|
||||
function setStatus(id, ok, ms, aborted) {
|
||||
networkStatus[id.replace('status-','')] = ok;
|
||||
const el = document.getElementById(id); if (!el) return;
|
||||
el.querySelector('.status-dot').className = 'status-dot ' + (ok ? 'status-ok' : 'status-fail');
|
||||
const msEl = document.getElementById('ms-'+id.replace('status-',''));
|
||||
if (msEl) {
|
||||
if (aborted) msEl.textContent = '超时';
|
||||
else if (ok) msEl.textContent = ms+'ms';
|
||||
else msEl.textContent = '—';
|
||||
}
|
||||
}
|
||||
|
||||
async function checkSource(name, url, statusId) {
|
||||
const start = performance.now();
|
||||
let aborted = false;
|
||||
const probeUrl = url + '?_=' + Date.now();
|
||||
try {
|
||||
const ctrl = new AbortController();
|
||||
setTimeout(() => { aborted = true; ctrl.abort(); }, 4000);
|
||||
await fetch(probeUrl, { mode: 'no-cors', signal: ctrl.signal, cache: 'no-store' });
|
||||
const ms = Math.round(performance.now() - start);
|
||||
setStatus(statusId, true, ms, false);
|
||||
} catch {
|
||||
const ms = Math.round(performance.now() - start);
|
||||
setStatus(statusId, false, ms, aborted);
|
||||
}
|
||||
}
|
||||
|
||||
function checkSources() {
|
||||
checkSource('arxiv', 'https://arxiv.org/favicon.ico', 'status-arxiv');
|
||||
checkSource('hf', 'https://huggingface.co/favicon.ico', 'status-hf');
|
||||
}
|
||||
|
||||
// ══════════════════ GLOW ══════════════════════════════
|
||||
function attachGlowTracking() {
|
||||
$$('.card').forEach(card => {
|
||||
card.addEventListener('mousemove', e => {
|
||||
const r = card.getBoundingClientRect();
|
||||
card.style.setProperty('--mx', (e.clientX-r.left)/r.width*100+'%');
|
||||
card.style.setProperty('--my', (e.clientY-r.top)/r.height*100+'%');
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
// ══════════════════ CARDS → MODAL → PAPERS ═══════════
|
||||
function renderCards(mods) {
|
||||
$('#moduleGrid').innerHTML = mods.map(m =>
|
||||
`<div class="card card-${m.id}" onclick="openModule('${m.id}')">
|
||||
<div class="card-header"><span class="card-icon">${m.icon}</span> ${m.name}</div>
|
||||
<div class="card-desc">${m.desc}</div>
|
||||
<div class="card-badge">${m.area_count} 子领域 · ${m.paper_count} 篇</div>
|
||||
</div>`).join('');
|
||||
}
|
||||
|
||||
let currentModId = null;  // module whose modal is currently open

async function openModule(modId) {
  let data = moduleData[modId];
  if (!data) {
    try { data = await (await fetch(`${API}/modules/${modId}`)).json(); moduleData[modId]=data; }
    catch { return; }
  }
  currentModId = modId;
  const mod = modules[modId];
  $('#modalTitle').innerHTML = `<span class="card-icon">${mod?.icon||''}</span> ${data.name}`;
  $('#modalSubtitle').textContent = data.desc||'';
  const areas = data.areas||[];
  $('#modalTabs').innerHTML = areas.map((a,i)=>`<button class="tab ${i?'':'active'}" onclick="switchArea(${i})">${a.name}</button>`).join('');
  if (areas.length) { $('#modalTabs').dataset.areaIdx='0'; renderPapers(areas[0]); }
  $('#overlay').classList.add('open');
}

function switchArea(idx) {
  // Look up the area in the open module, not in whichever cached module happens to have this index
  const data = moduleData[currentModId];
  if (!data?.areas?.[idx]) return;
  $$('#modalTabs .tab').forEach((t,i)=>t.classList.toggle('active',i===idx));
  renderPapers(data.areas[idx]);
}
|
||||
|
||||
function closeModal() { $('#overlay').classList.remove('open'); }
|
||||
|
||||
function renderPapers(area) {
|
||||
const s = (l, ps, c) => ps.length ? `<div class="section-label ${c}">${l}</div>`+ps.map(renderPaper).join('') : '';
|
||||
$('#modalContent').innerHTML =
|
||||
s('📌 主线论文', area.mainline||[],'mainline') +
|
||||
s('🌿 支线论文', area.branches||[],'branch') +
|
||||
s('🔮 前瞻探索', area.forward||[],'forward') ||
|
||||
'<p style="color:var(--text-dim);padding:20px;">暂无论文数据</p>';
|
||||
}
|
||||
|
||||
function renderPaper(p) {
|
||||
const pdfUrl = getPdfLink(p);
|
||||
const tags = (p.tags||[]).map(t=>`<span class="paper-tag ${TAG_CLASS[t]||'tag-branch'}">${t}</span>`).join(' ');
|
||||
const links = [];
|
||||
if (pdfUrl) links.push(`<button class="paper-link" data-pdf="${encodeURIComponent(pdfUrl)}" data-title="${encodeURIComponent(p.title)}" onclick="openPdfBtn(this)">📄 阅读</button>`);
|
||||
else if (p.arxiv) links.push(`<a class="paper-link" href="https://arxiv.org/abs/${p.arxiv}" target="_blank">📋 arXiv</a>`);
|
||||
return `<div class="paper-item"><div class="paper-year">${p.year||'—'}</div><div class="paper-body">
|
||||
<div class="paper-title">${p.title}</div>
|
||||
<div class="paper-meta"><span>${p.authors||''}</span>${p.venue?`<span class="paper-venue">${p.venue}</span>`:''}${tags}</div>
|
||||
<div class="paper-links">${links.join('')}</div>
|
||||
</div></div>`;
|
||||
}
|
||||
|
||||
function getPdfLink(p) {
|
||||
// 有 pdf 字段 → 返回外部源 URL(arxiv 直连 / HF 直连)
|
||||
if (p.pdf) return p.pdf;
|
||||
// 只有 arxiv id → arXiv 直连
|
||||
if (p.arxiv) return `https://arxiv.org/pdf/${p.arxiv}.pdf`;
|
||||
return null;
|
||||
}
|
||||
|
||||
function openPdfBtn(btn) { openPdf(btn.dataset.pdf, btn.dataset.title); }
|
||||
|
||||
// ══════════════════ PDF VIEWER ════════════════════════
|
||||
function getLocalUrl(extUrl) {
|
||||
// arXiv
|
||||
const am = extUrl.match(/arxiv\.org\/pdf\/(\d+\.\d+)/);
|
||||
if (am) return `/papers/arxiv/${am[1]}.pdf`;
|
||||
// HuggingFace
|
||||
if (extUrl.includes('huggingface.co')) {
|
||||
const name = decodeURIComponent(extUrl).split('/').pop().replace('.pdf','');
|
||||
return `/papers/hf/${name}.pdf`;
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
function openPdf(url, title) {
|
||||
const decodedUrl = decodeURIComponent(url);
|
||||
const decodedTitle = decodeURIComponent(title);
|
||||
|
||||
// 构建 overlay
|
||||
if (!$('#pdfOverlay')) {
|
||||
const div = document.createElement('div'); div.id='pdfOverlay'; div.className='pdf-overlay';
|
||||
div.innerHTML = `<div class="pdf-container" id="pdfContainer">
|
||||
<div class="pdf-toolbar"><span id="pdfTitle">PDF</span><span id="pdfStatus" style="color:var(--orange);font-size:0.8em;margin-left:8px;"></span>
|
||||
        <button onclick="window.open($('#pdfFrame').src,'_blank')">🔗 新窗口</button>
|
||||
<button class="pdf-close" onclick="closePdf()">×</button></div>
|
||||
<iframe class="pdf-frame" id="pdfFrame" src=""></iframe></div>`;
|
||||
document.body.appendChild(div);
|
||||
div.addEventListener('click', e => { if (e.target===div) closePdf(); });
|
||||
}
|
||||
|
||||
if (pdfTimeout) { clearTimeout(pdfTimeout); pdfTimeout=null; }
|
||||
$('#pdfTitle').textContent = decodedTitle;
|
||||
$('#pdfStatus').textContent = '';
|
||||
const frame = $('#pdfFrame');
|
||||
let loaded = false;
|
||||
frame.src = 'about:blank';
|
||||
frame.onload = ()=>{ loaded=true; if(pdfTimeout){clearTimeout(pdfTimeout);pdfTimeout=null;} $('#pdfStatus').textContent=''; };
|
||||
|
||||
const hkLocalUrl = getLocalUrl(decodedUrl);
|
||||
const isArxiv = decodedUrl.includes('arxiv.org');
|
||||
const isHF = decodedUrl.includes('huggingface.co');
|
||||
const isRemote = isArxiv || isHF;
|
||||
const sourceOk = isArxiv ? networkStatus['arxiv'] !== false
|
||||
: isHF ? networkStatus['hf'] !== false
|
||||
: false;
|
||||
|
||||
// 弹框先出
|
||||
$('#pdfOverlay').classList.add('open');
|
||||
|
||||
if (isRemote && sourceOk) {
|
||||
// 远程源 iframe 直连
|
||||
loaded = false;
|
||||
$('#pdfStatus').textContent = isArxiv ? '🌐 arXiv 加载中...' : '🤗 HuggingFace 加载中...';
|
||||
frame.src = decodedUrl;
|
||||
|
||||
pdfTimeout = setTimeout(() => {
|
||||
if (loaded) return;
|
||||
if (!hkLocalUrl) { $('#pdfStatus').textContent='⚠️ 超时'; return; }
|
||||
$('#pdfStatus').textContent = '⏳ 超时,走 HK 服务器...';
|
||||
frame.src = hkLocalUrl;
|
||||
}, 5000);
|
||||
} else {
|
||||
// 直接走 HK
|
||||
$('#pdfStatus').textContent = hkLocalUrl ? '📂 HK 服务器加载中...' : '⏳ 加载中...';
|
||||
frame.src = hkLocalUrl || decodedUrl;
|
||||
}
|
||||
}
|
||||
|
||||
function closePdf() {
|
||||
if (pdfTimeout) { clearTimeout(pdfTimeout); pdfTimeout=null; }
|
||||
$('#pdfOverlay').classList.remove('open');
|
||||
$('#pdfFrame').src = '';
|
||||
}
|
||||
|
||||
// ══════════════════ SEARCH ════════════════════════════
|
||||
function searchPapers(q) {
|
||||
q=(q||'').toLowerCase().trim();
|
||||
if (q.length<2) { $$('.card').forEach(c=>c.style.outline=''); return; }
|
||||
fetch(`${API}/papers?q=${encodeURIComponent(q)}&limit=10`).then(r=>r.json()).then(ps=>{
|
||||
const matched = new Set(ps.map(p=>p.module_id));
|
||||
$$('.card').forEach(c=>{
|
||||
const m=[...c.classList].find(x=>x.startsWith('card-'))?.replace('card-','');
|
||||
c.style.outline=matched.has(m)?'2px solid var(--green)':'';
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
// ══════════════════ EVENTS ════════════════════════════
|
||||
if (typeof document !== 'undefined') {
|
||||
document.addEventListener('DOMContentLoaded', () => {
|
||||
init();
|
||||
$('.search-input').addEventListener('input', e => searchPapers(e.target.value));
|
||||
$('#modalClose').addEventListener('click', closeModal);
|
||||
$('#overlay').addEventListener('click', e => { if (e.target===$('#overlay')) closeModal(); });
|
||||
});
|
||||
}
|
||||
34
static/index.html
Normal file
@@ -0,0 +1,34 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="zh-CN">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>LLM 论文图书馆</title>
|
||||
<link rel="stylesheet" href="/style.css">
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<div class="header">
|
||||
<h1>🧠 LLM 论文图书馆</h1>
|
||||
</div>
|
||||
|
||||
<div class="search-wrap">
|
||||
<span class="search-icon">🔍</span>
|
||||
<input class="search-input" placeholder="搜索论文标题、作者、关键词..." autocomplete="off">
|
||||
</div>
|
||||
|
||||
<div class="grid" id="moduleGrid"></div>
|
||||
|
||||
<div class="overlay" id="overlay">
|
||||
<div class="modal" id="modal">
|
||||
<button class="modal-close" id="modalClose">×</button>
|
||||
<div class="modal-title" id="modalTitle"></div>
|
||||
<div class="modal-subtitle" id="modalSubtitle"></div>
|
||||
<div class="tabs" id="modalTabs"></div>
|
||||
<div id="modalContent"></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<script src="/app.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
7
static/pdf.min.js
vendored
Normal file
@@ -0,0 +1,7 @@
|
||||
<html>
|
||||
<head><title>404 Not Found</title></head>
|
||||
<body>
|
||||
<center><h1>404 Not Found</h1></center>
|
||||
<hr><center>nginx</center>
|
||||
</body>
|
||||
</html>
|
||||
313
static/style.css
Normal file
@@ -0,0 +1,313 @@
|
||||
/* LLM Paper Library: frontend styles (extracted from llm_library.html) */
|
||||
:root {
|
||||
--bg: #0a0e14;
|
||||
--bg-card: #12171f;
|
||||
--bg-modal: #0d1117;
|
||||
--border: #1e293b;
|
||||
--text: #c9d1d9;
|
||||
--text-dim: #8b949e;
|
||||
--text-bright: #e6edf3;
|
||||
--blue: #58a6ff; --blue-bg: #0d1b2a; --blue-border: #1a3a5c;
|
||||
--green: #3fb950; --green-bg: #0d1f14; --green-border: #1a3d1a;
|
||||
--red: #f85149; --red-bg: #1f0d0d; --red-border: #3d1a1a;
|
||||
--purple: #bc8cff; --purple-bg: #190d2a; --purple-border: #2d1a3d;
|
||||
--orange: #d2991d; --orange-bg: #1f160d; --orange-border: #3d2d1a;
|
||||
--cyan: #39d2c0; --cyan-bg: #0d1f1c; --cyan-border: #1a3d3a;
|
||||
--pink: #f778ba; --pink-bg: #1f0d1a; --pink-border: #3d1a2d;
|
||||
--yellow: #e3b341; --yellow-bg: #1f1a0d; --yellow-border: #3d3d1a;
|
||||
--teal: #79c0ff; --teal-bg: #0d1d2d; --teal-border: #1a2d3d;
|
||||
--radius: 10px; --radius-sm: 6px; --transition: 0.2s ease;
|
||||
}
|
||||
|
||||
*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
body {
|
||||
background: var(--bg); color: var(--text);
|
||||
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Noto Sans SC', sans-serif;
|
||||
padding: 32px 24px 60px; min-height: 100vh;
|
||||
}
|
||||
|
||||
.header { text-align: center; margin-bottom: 32px; }
|
||||
.header h1 {
|
||||
font-size: 1.5em;
|
||||
background: linear-gradient(135deg, var(--blue), var(--purple), var(--pink));
|
||||
  -webkit-background-clip: text; background-clip: text; -webkit-text-fill-color: transparent;
|
||||
}
|
||||
.header p { color: var(--text-dim); font-size: 0.85em; margin-top: 4px; }
|
||||
|
||||
.grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fill, minmax(260px, 1fr));
|
||||
gap: 16px; max-width: 1400px; margin: 0 auto;
|
||||
}
|
||||
|
||||
.card {
|
||||
background: var(--bg-card); border: 1px solid var(--border);
|
||||
border-radius: var(--radius); padding: 18px 16px;
|
||||
cursor: pointer; position: relative; overflow: hidden;
|
||||
transition: border-color 0.3s ease, box-shadow 0.3s ease;
|
||||
}
|
||||
.card::before {
|
||||
content: ''; position: absolute; inset: 0;
|
||||
background: radial-gradient(circle at var(--mx, 50%) var(--my, 50%),
|
||||
rgba(88,166,255,0.12) 0%, transparent 60%);
|
||||
opacity: 0; transition: opacity 0.3s ease;
|
||||
pointer-events: none; z-index: 0;
|
||||
}
|
||||
.card:hover::before { opacity: 1; }
|
||||
.card:hover {
|
||||
border-color: rgba(88,166,255,0.4);
|
||||
box-shadow: 0 0 20px rgba(88,166,255,0.08), inset 0 0 20px rgba(88,166,255,0.03);
|
||||
}
|
||||
.card-header, .card-desc, .card-badge { position: relative; z-index: 1; }
|
||||
.card-icon { font-size: 1.3em; }
|
||||
.card-desc { font-size: 0.8em; color: var(--text-dim); line-height: 1.5; }
|
||||
.card-badge {
|
||||
position: absolute; top: 12px; right: 12px;
|
||||
font-size: 0.7em; background: rgba(255,255,255,.06);
|
||||
padding: 2px 8px; border-radius: 10px; color: var(--text-dim);
|
||||
}
|
||||
|
||||
.card-arch { border-left: 3px solid var(--blue); }
|
||||
.card-multi { border-left: 3px solid var(--cyan); }
|
||||
.card-data { border-left: 3px solid var(--green); }
|
||||
.card-pretrain { border-left: 3px solid var(--red); }
|
||||
.card-post { border-left: 3px solid var(--purple);}
|
||||
.card-compress { border-left: 3px solid var(--orange);}
|
||||
.card-deploy { border-left: 3px solid var(--teal); }
|
||||
.card-agent { border-left: 3px solid var(--pink); }
|
||||
.card-eval { border-left: 3px solid var(--yellow);}
|
||||
|
||||
.card-arch .card-header { color: var(--blue); }
|
||||
.card-multi .card-header { color: var(--cyan); }
|
||||
.card-data .card-header { color: var(--green); }
|
||||
.card-pretrain .card-header { color: var(--red); }
|
||||
.card-post .card-header { color: var(--purple);}
|
||||
.card-compress .card-header { color: var(--orange);}
|
||||
.card-deploy .card-header { color: var(--teal); }
|
||||
.card-agent .card-header { color: var(--pink); }
|
||||
.card-eval .card-header { color: var(--yellow);}
|
||||
|
||||
/* Modal */
|
||||
.overlay {
|
||||
display: none; position: fixed; inset: 0;
|
||||
background: rgba(0,0,0,.7); z-index: 100;
|
||||
justify-content: center; align-items: flex-start;
|
||||
padding: 40px 16px; overflow-y: auto;
|
||||
}
|
||||
.overlay.open { display: flex; }
|
||||
|
||||
.modal {
|
||||
background: var(--bg-modal); border: 1px solid var(--border);
|
||||
border-radius: var(--radius); width: 100%; max-width: 960px;
|
||||
padding: 28px 24px; position: relative;
|
||||
animation: fadeIn 0.2s ease;
|
||||
}
|
||||
@keyframes fadeIn { from { opacity: 0; transform: translateY(8px); } to { opacity: 1; transform: translateY(0); } }
|
||||
.modal-close {
|
||||
position: absolute; top: 16px; right: 16px;
|
||||
background: none; border: none; color: var(--text-dim);
|
||||
font-size: 1.4em; cursor: pointer;
|
||||
width: 36px; height: 36px; border-radius: 50%;
|
||||
display: flex; align-items: center; justify-content: center;
|
||||
}
|
||||
.modal-close:hover { background: rgba(255,255,255,.06); color: var(--text); }
|
||||
.modal-title { font-size: 1.2em; font-weight: 700; margin-bottom: 6px; display: flex; align-items: center; gap: 8px; }
|
||||
.modal-subtitle { color: var(--text-dim); font-size: 0.85em; margin-bottom: 24px; }
|
||||
|
||||
/* Tabs */
|
||||
.tabs { display: flex; gap: 4px; flex-wrap: wrap; border-bottom: 1px solid var(--border); margin-bottom: 20px; }
|
||||
.tab {
|
||||
padding: 8px 16px; font-size: 0.85em; color: var(--text-dim);
|
||||
cursor: pointer; border: none; background: none;
|
||||
border-bottom: 2px solid transparent;
|
||||
transition: color var(--transition), border-color var(--transition);
|
||||
}
|
||||
.tab:hover { color: var(--text); }
|
||||
.tab.active { color: var(--text-bright); border-bottom-color: var(--blue); font-weight: 600; }
|
||||
|
||||
/* Section labels */
|
||||
.section-label {
|
||||
font-size: 0.78em; font-weight: 700; text-transform: uppercase;
|
||||
letter-spacing: 1px; margin: 16px 0 8px; padding: 4px 0;
|
||||
}
|
||||
.section-label.mainline { color: var(--blue); border-bottom: 1px solid var(--blue-border); }
|
||||
.section-label.branch { color: var(--orange); border-bottom: 1px solid var(--orange-border); }
|
||||
.section-label.forward { color: var(--purple); border-bottom: 1px solid var(--purple-border); }
|
||||
|
||||
/* Paper item */
|
||||
.paper-item {
|
||||
display: flex; align-items: flex-start; gap: 12px;
|
||||
padding: 10px 12px; border-radius: var(--radius-sm);
|
||||
margin-bottom: 4px; transition: background var(--transition);
|
||||
}
|
||||
.paper-item:hover { background: rgba(255,255,255,.03); }
|
||||
.paper-year { flex-shrink: 0; width: 42px; font-size: 0.75em; color: var(--text-dim); font-variant-numeric: tabular-nums; text-align: right; padding-top: 1px; }
|
||||
.paper-body { flex: 1; min-width: 0; }
|
||||
.paper-title { font-size: 0.9em; font-weight: 600; color: var(--text-bright); line-height: 1.4; margin-bottom: 2px; }
|
||||
.paper-meta { font-size: 0.75em; color: var(--text-dim); display: flex; gap: 8px; flex-wrap: wrap; align-items: center; }
|
||||
.paper-venue { color: var(--green); font-weight: 500; }
|
||||
|
||||
/* Tags */
|
||||
.paper-tag { display: inline-block; font-size: 0.7em; padding: 1px 6px; border-radius: 4px; font-weight: 600; }
|
||||
.tag-start { background: #1a3a5c; color: var(--blue); }
|
||||
.tag-milestone{ background: #3d1a1a; color: var(--red); }
|
||||
.tag-frontier { background: #1a3d1a; color: var(--green);}
|
||||
.tag-forward { background: #2d1a3d; color: var(--purple);}
|
||||
.tag-branch { background: #3d2d1a; color: var(--orange);}
|
||||
|
||||
/* Links */
.paper-links { display: flex; gap: 6px; margin-top: 4px; flex-wrap: wrap; }
.paper-link {
  font-size: 0.72em; color: var(--blue); text-decoration: none;
  padding: 2px 8px; border: 1px solid var(--blue-border);
  border-radius: 4px; transition: background var(--transition); cursor: pointer;
}
.paper-link:hover { background: var(--blue-bg); }

/* Search */
.search-wrap { max-width: 600px; margin: 0 auto 28px; position: relative; }
.search-input {
  width: 100%; padding: 10px 16px 10px 38px;
  background: var(--bg-card); border: 1px solid var(--border);
  border-radius: var(--radius); color: var(--text); font-size: 0.9em; outline: none;
}
.search-input:focus { border-color: var(--blue); }
.search-icon { position: absolute; left: 12px; top: 50%; transform: translateY(-50%); color: var(--text-dim); }

@media (max-width: 640px) { .grid { grid-template-columns: 1fr; } .modal { padding: 20px 16px; } }

/* PDF viewer */
.pdf-overlay {
  display: none; position: fixed; inset: 0; background: rgba(0,0,0,.85);
  z-index: 200; justify-content: center; align-items: center;
  transition: justify-content 0.3s ease;
}
.pdf-overlay.open { display: flex; }
.pdf-container {
  width: 90vw; height: 90vh; background: #fff; border-radius: var(--radius);
  display: flex; flex-direction: column; overflow: hidden;
}
.pdf-toolbar {
  display: flex; align-items: center; gap: 12px; padding: 10px 16px;
  background: #1a1a2e; color: #eee; font-size: 0.85em;
}
.pdf-toolbar button {
  background: rgba(255,255,255,.1); border: none; color: #eee;
  padding: 4px 12px; border-radius: 4px; cursor: pointer; font-size: 0.85em;
}
.pdf-toolbar button:hover { background: rgba(255,255,255,.2); }
.pdf-toolbar .pdf-close { margin-left: auto; font-size: 1.2em; }
.pdf-frame { flex: 1; border: none; width: 100%; }

/* Translation */
.trans-btn {
  font-size: 0.7em; color: var(--cyan); cursor: pointer;
  padding: 2px 6px; border: 1px solid var(--cyan-border);
  border-radius: 4px; background: transparent;
  margin-left: 4px; transition: background var(--transition);
}
.trans-btn:hover { background: var(--cyan-bg); }
.trans-btn.loading { opacity: 0.5; pointer-events: none; }
.paper-title-zh {
  font-size: 0.8em; color: var(--cyan); margin-top: 3px; line-height: 1.4;
}

/* Status bar */
.status-bar {
  position: fixed; bottom: 0; left: 0; right: 0;
  background: var(--bg-card); border-top: 1px solid var(--border);
  display: flex; gap: 20px; align-items: center; padding: 8px 20px;
  z-index: 50; font-size: 0.78em; color: var(--text-dim);
}
.status-label {
  color: var(--text-dim); font-weight: 600; margin-right: 4px;
}
.status-item {
  display: flex; align-items: center; gap: 6px;
}
.status-ms {
  color: var(--text-dim); font-variant-numeric: tabular-nums;
  margin-left: 2px;
}
.status-dot {
  width: 8px; height: 8px; border-radius: 50%;
  background: var(--text-dim); transition: background 0.3s;
}
.status-dot.status-ok { background: var(--green); }
.status-dot.status-fail { background: var(--red); }

.pdf-container {
  width: 90vw; height: 90vh; background: #fff; border-radius: var(--radius);
  display: flex; flex-direction: column; overflow: hidden;
  transition: width 0.3s ease;
}
.pdf-container.with-trans { width: 55vw; }

/* Translation panel (right sidebar) */
.trans-panel {
  position: fixed; right: -50vw; top: 5vh; bottom: 5vh;
  width: 45vw; min-width: 380px; max-width: 650px;
  background: var(--bg-modal); border: 1px solid var(--border);
  border-radius: var(--radius);
  z-index: 201; display: flex; flex-direction: column;
  transition: right 0.35s cubic-bezier(0.4,0,0.2,1);
  box-shadow: -4px 0 24px rgba(0,0,0,.5);
  overflow: hidden;
}
.trans-panel.open { right: 2vw; }
.trans-panel-header {
  display: flex; align-items: center; gap: 12px;
  padding: 12px 16px; border-bottom: 1px solid var(--border);
  font-weight: 700; font-size: 1em; flex-shrink: 0;
}
.trans-panel-close {
  margin-left: auto; background: none; border: none;
  color: var(--text-dim); font-size: 1.3em; cursor: pointer;
}
.trans-panel-body {
  flex: 1; overflow-y: auto; padding: 16px;
  scroll-behavior: smooth;
}
.trans-panel-footer {
  padding: 10px 16px; border-top: 1px solid var(--border); flex-shrink: 0;
}
.trans-panel-footer button {
  background: var(--blue-bg); color: var(--blue);
  border: 1px solid var(--blue-border); padding: 6px 16px;
  border-radius: 4px; cursor: pointer; font-size: 0.85em;
}
.trans-panel-footer button:hover { background: #1a3a5c; }
.trans-panel-footer button:disabled { opacity: 0.5; cursor: not-allowed; }

.trans-para-group {
  margin-bottom: 18px; padding-bottom: 14px;
  border-bottom: 1px solid rgba(255,255,255,.04);
  transition: background 0.3s ease;
}
.trans-para-group.active {
  background: rgba(88,166,255,.06);
  border-left: 2px solid var(--blue);
  padding-left: 10px;
  margin-left: -12px;
  padding-right: 4px;
  border-radius: 0 4px 4px 0;
}
.trans-en, .trans-zh {
  font-size: 0.82em; line-height: 1.65;
}
.trans-page-badge {
  font-size: 0.65em; color: var(--text-dim); margin-bottom: 2px;
  font-variant-numeric: tabular-nums;
}
.trans-en { color: var(--text); margin-bottom: 4px; }
.trans-zh { color: var(--cyan); padding-left: 8px; border-left: 2px solid var(--cyan-border); }

/* Scroll sync highlight */
.trans-para-group.highlight {
  background: rgba(88,166,255,.12);
  border-left: 3px solid var(--blue);
  padding-left: 9px;
  margin-left: -12px;
  padding-right: 4px;
  border-radius: 0 4px 4px 0;
}