Architecture (5 papers):
- MiniMax-01: Lightning Attention + MoE (2501.08313)
- Byte Latent Transformer: tokenization-free (2412.09871)
- Large Concept Models: sentence-level reasoning (2412.08821)
- xLSTM: LSTM revival, matrix memory (2405.04517)
- LLaDA: diffusion LLM, non-autoregressive (2502.09992)
Post-training (2 papers):
- MiniMax-M1: CISPO RL, emergent reasoning (2506.13585)
- Coconut: continuous reasoning in latent space (2412.06769)
Agents (1 paper):
- Generative Agents: Stanford AI Town (2304.03442)
Multimodal (1 paper):
- GPT-4V/4o/5.4: natively multimodal vision and speech