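# Full-Lifecycle Management Platform for Large Models

# PilotDeck

OpenBMB's open-source, one-stop platform for large-model development and deployment, with an end-to-end toolchain that helps large models reach production quickly.

## (1) Project Overview

### Core Positioning

This project is a production-grade, full-lifecycle management platform for large models from OpenBMB. It gives developers and enterprises a one-stop solution spanning model training, fine-tuning, inference deployment, performance monitoring, and resource scheduling. It targets the core pain points of large-model work (a high barrier to development, complex deployment, and difficult operations) and helps enterprises build and deploy their own large-model applications with zero barrier to entry.

### Pain Points Addressed

- Large-model development requires deep AI expertise, so ordinary developers and small and medium-sized enterprises struggle to get started quickly
- Multi-model management is chaotic, with no unified mechanism for scheduling, monitoring, and versioning
- Deployment is tedious, resource utilization is low, inference performance is hard to optimize, and operations costs are high
- Compatibility between frameworks and models is poor, which makes model migration and reuse difficult

### Core Advantages

- **End-to-end integration**: covers model training, parameter-efficient fine-tuning, inference deployment, performance monitoring, and resource scheduling, so one platform handles the whole large-model application workflow
- **Low-code visual operation**: provides an intuitive Web management console; models can be deployed and services published through drag-and-drop and configuration, without complex coding
- **Compatibility with all mainstream models**: natively supports dozens of mainstream open-source large models, including Llama 2/3, Qwen 1.5/2, ChatGLM, and Baichuan, and adapts automatically to different model formats
- **High-performance inference optimization**: built-in quantization, pruning, distributed inference, and batching raise inference speed by 3-10x and substantially reduce deployment cost
- **Enterprise-grade production capabilities**: supports multi-tenant isolation, fine-grained permission management, elastic scaling, log auditing, and automatic failure recovery to meet enterprise production requirements

## (2) Prerequisites

- **Operating system**: Ubuntu 20.04+ / CentOS 8+ / Debian 11+ (Linux recommended)
- **Python version**: Python 3.9 - 3.11
- **Software dependencies**: Git, Docker 20.10+, Docker Compose 2.0+, NVIDIA Container Toolkit (required for GPU environments)
- **Hardware requirements**:
  - Recommended: 8-core CPU + 32 GB RAM + NVIDIA GPU (≥16 GB VRAM, CUDA 11.8+)
  - Minimum: 4-core CPU + 16 GB RAM (CPU inference and lightweight model deployment only)

## (3) Quick Start / Installation and Deployment

### Option 1: One-command Docker Compose deployment (recommended for production)

```bash
# Clone the repository
git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck
# Copy the environment configuration file, then edit it
cp .env.example .env
# Edit .env to configure the database, GPU resources, and model storage path
# Start all services
docker compose up -d
```

Once the services are up, open `http://YOUR_SERVER_IP:8000` to reach the admin console. The default username and password are `admin/admin123`.

### Option 2: Local deployment from source (development and testing)

```bash
# Clone the repository
git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Initialize the database
python init_db.py
# Build the frontend and start the backend service
npm install && npm run build
python main.py
```

Open `http://localhost:8000` locally to use the platform.

## (4) Basic Usage Examples

### 1. Import a model

1. Log in to the admin console and open the "Model Management" page
2. Click "Import Model" and choose the model source (local file, Hugging Face, or ModelScope)
3. Enter the model name and version, select the model type, and click "Start Import"
4. When the import finishes, the model appears in the model list

### 2. Create a fine-tuning task

1. Open the "Fine-tuning Management" page and click "New Fine-tuning Task"
2. Select a base model and upload a training dataset (JSON and CSV are supported)
3. Configure the fine-tuning parameters (learning rate, batch size, number of epochs, and so on)
4. Click "Start Training"; the system runs the fine-tuning task automatically, and you can watch training progress and the loss curve in real time

### 3. Deploy an inference service

1. Open the "Service Deployment" page and click "New Service"
2. Select the model and version to deploy, then configure the resource quota and concurrency
3. Choose a deployment mode (single instance or distributed) and click "Deploy"
4. When the service has started, the system automatically generates an API endpoint and key

### 4. Call the inference API

```bash
curl -X POST http://YOUR_SERVER_IP:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen-7b-chat",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself"}],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```

## (5) Open-Source License

This project is released under the **Apache License 2.0**. For the full terms, see the LICENSE file in the project root.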

PilotDeck

Task-oriented AI Agent productivity platform — redefining operational boundaries and memory evolution, one WorkSpace at a time.


---

**News** 🔥

- **[2026.05.28]** PilotDeck is now open source! Visit our official website at [pilotdeck.openbmb.cn](https://pilotdeck.openbmb.cn). We welcome contributions, feedback, and stars from the community.

---

## 💡 About PilotDeck

**PilotDeck** is an open-source agent operating system designed around the concept of "WorkSpace". It is jointly developed and open-sourced by Tsinghua University [THUNLP](https://nlp.csai.tsinghua.edu.cn/), [ModelBest](https://modelbest.cn/), [OpenBMB](https://www.openbmb.cn/), and [AI9Stars](https://github.com/AI9Stars). Targeting general-purpose, multi-task scenarios, PilotDeck is built to be a true *productivity tool* for the Agent era.

A wave of excellent AI Agent harnesses has emerged in recent years, each with its own focus: **Claude Code / Cursor / Trae Solo** brought model reasoning deep into the programming IDE; **Claude Cowork** introduced the notion of project-level isolation to desktop-side knowledge work; **WorkBuddy** connected agents to IM ecosystems such as WeCom and Feishu so AI is one message away.

When we shift the lens from "one-shot programming" or "immediate Q&A" to **long-running, multi-project productivity work**, however, several questions remain open:

- When many projects run in parallel, can memory be **white-box and traceable**? When the AI gets something wrong, can you pinpoint which memory entry caused it and edit it directly, without starting a new chat from scratch?
- Can token cost be **tracked per task**, so that running agents in the background actually becomes economically viable?
- Can tasks of different difficulty **automatically be matched to different models**, instead of burning the flagship model on trivial calls?
- When you step away from the keyboard, can the work keep moving? Can the agent **proactively discover what's worth doing, report progress, and land results as files on disk**?

PilotDeck is an incremental exploration around exactly these questions.
It uses the WorkSpace as the fundamental unit, completely isolating files, memory and skills per project, and pairs it with three pillar capabilities: **White-box Memory**, **Smart Routing** and **Always-on**. The entire system natively supports the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) and behaves consistently across front-ends (Web / CLI / IM).

### ✨ Key Highlights

**WorkSpace-Level Isolation & Accretion** Every project gets its own file system, memory store and skill set. Parallel work no longer interferes with itself, retrieval has a bounded scope, and skills accrete naturally as each task grows — no more global context pollution.

**Traceable White-box Memory** Memory generation, extraction, storage and retrieval are visible end-to-end. When the AI mis-remembers, you can pinpoint and fix the offending entry. Built-in **Dream Mode** consolidates memory in idle windows, and supports one-click rollback.

**Smart Routing & Cost Optimization** Task difficulty is auto-detected; complex calls go to flagship models (e.g. Claude 3.5 Sonnet / GPT-4o), simple ones drop to lighter models. Through on-device / cloud co-orchestration and precise matching, token spend shrinks dramatically without sacrificing quality.
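
The routing idea above can be illustrated with a minimal sketch. It assumes a toy keyword-and-length heuristic for difficulty and uses placeholder model names; the source does not describe how PilotDeck actually detects difficulty, so `estimate_difficulty`, `route`, `HARD_HINTS`, and the threshold are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical model tiers; placeholders, not PilotDeck configuration keys.
FLAGSHIP_MODEL = "flagship-model"
LIGHT_MODEL = "light-model"

# Assumed keywords that hint at multi-step, harder work.
HARD_HINTS = {"plan", "architecture", "analyze", "review", "debug"}


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    difficulty: float


def estimate_difficulty(prompt: str) -> float:
    """Toy score in [0, 1]: longer prompts and 'hard' keywords score higher."""
    words = re.findall(r"[a-z]+", prompt.lower())
    length_score = min(len(words) / 200, 1.0)
    hint_score = 0.3 * len(HARD_HINTS.intersection(words))
    return min(1.0, length_score + hint_score)


def route(prompt: str, threshold: float = 0.3) -> RoutingDecision:
    """Send prompts at or above the threshold to the flagship tier."""
    difficulty = estimate_difficulty(prompt)
    model = FLAGSHIP_MODEL if difficulty >= threshold else LIGHT_MODEL
    return RoutingDecision(model, difficulty)


print(route("Fix the typo in this sentence").model)  # light-model
print(route("Plan and review the architecture of this codebase").model)  # flagship-model
```

A production router would likely replace the keyword heuristic with a learned or model-based classifier; the point here is only the shape of the decision: score the task, then pick a tier.
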

**Always-on Background Execution** PilotDeck breaks the "you ask, it answers" loop: after you sign off, the agent keeps discovering candidate tasks, running long-horizon monitors, and finally lands deliverables as local files with a summary report waiting for you.

### 📊 Real-world Numbers

The three pillar capabilities have shown clear advantages in production-grade workflows:

#### 1. Smart Routing: ~70% cost savings on social-media workloads

In Xiaohongshu-style social-media operations, enabling Smart Routing automatically demotes simple polishing / layout tasks to a sub-agent (e.g. Sonnet 4.5) and only invokes Opus 4.5 at planning checkpoints:

| Setup | Model configuration | Cost | Multiplier |
| --- | --- | --- | --- |
| Smart Routing ON | Opus 4.5 (main) + Sonnet 4.5 (sub) | $2.83 | 1.1× |
| Smart Routing OFF | All Opus 4.5 (main + sub) | $12.58 | 5.0× |
| Monolithic | Single Opus 4.5 long-react (estimated) | $12.20 | 4.8× |
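
As a quick check of the headline figure, the savings can be recomputed from the dollar costs in the table. This is a small sketch using only those three numbers; the baseline behind the Multiplier column is not stated in the source, so it is not recomputed here.

```python
# Costs in USD, copied from the table above.
costs = {
    "Smart Routing ON": 2.83,
    "Smart Routing OFF": 12.58,
    "Monolithic": 12.20,
}


def savings(routed: float, baseline: float) -> float:
    """Fractional cost reduction of `routed` relative to `baseline`."""
    return 1.0 - routed / baseline


vs_off = savings(costs["Smart Routing ON"], costs["Smart Routing OFF"])
vs_mono = savings(costs["Smart Routing ON"], costs["Monolithic"])

print(f"vs. routing off: {vs_off:.1%}")  # vs. routing off: 77.5%
print(f"vs. monolithic:  {vs_mono:.1%}")  # vs. monolithic:  76.8%
```

Both reductions come out somewhat above the "~70%" in the heading, so the headline reads as a conservative rounding of these figures.
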
#### 2. Smart Routing: 1/6 the cost while beating frontier models on hard tasks

The research team benchmarked 7 complex tasks (multilingual podcast push, multi-source data reports, domain-specific literature review, codebase architecture docs, etc.). The "strong main + light sub" routing setup matches or beats the frontier single-model setup at a fraction of the cost:

| Setting | Score | Cost |
| --- | --- | --- |
| MiniMax-M2.7 single-agent | 37.1 | $1.90 |
| Claude Sonnet 4.6 single-agent | 69.1 | $18.36 |
| Sonnet 4.6 (main) + MiniMax-M2.7 (sub) | 70.6 | $3.15 |
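
The "1/6 the cost" claim can be read straight off these rows. The sketch below derives the cost ratio and a score-per-dollar figure; score per dollar is an illustrative metric added here, not one the source reports.

```python
from typing import NamedTuple


class Run(NamedTuple):
    setting: str
    score: float
    cost_usd: float


# Rows copied from the benchmark table above.
runs = [
    Run("MiniMax-M2.7 single-agent", 37.1, 1.90),
    Run("Claude Sonnet 4.6 single-agent", 69.1, 18.36),
    Run("Sonnet 4.6 (main) + MiniMax-M2.7 (sub)", 70.6, 3.15),
]

frontier, routed = runs[1], runs[2]
cost_ratio = frontier.cost_usd / routed.cost_usd  # how many times cheaper routing is
score_gain = routed.score - frontier.score

print(f"routed setup is {cost_ratio:.1f}x cheaper, score {score_gain:+.1f}")
for run in runs:
    print(f"{run.setting}: {run.score / run.cost_usd:.1f} points per dollar")
```

The ratio works out to roughly 5.8, which the heading rounds to "1/6 the cost", while the routed setup scores 1.5 points higher than the single frontier model.
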
#### 3. White-box Memory: layout & tone never bleed across projects

In black-box agents, mixing tasks in a shared context pool inevitably pollutes memory. PilotDeck's WorkSpace-scoped white-box memory addresses this end-to-end:

| Dimension | Current AI Agents (black-box) | PilotDeck (white-box) |
| --- | --- | --- |
| Visibility | You can't see what the AI remembers, only what it outputs | View every memory entry: what was stored, when, and which WorkSpace |
| Control | Once written, memory can't be edited or removed | Edit / delete entries, pin critical decisions so they don't drift |
| Traceability | When it goes wrong, you can't find the root cause | Generation → extraction → storage → retrieval, all auditable |
| Isolation | One shared pool; projects bleed into each other | Scoped per WorkSpace; A's memory never reaches B |
| Reversible | After compression, the original is gone | Dream-mode supports one-click rollback to the prior state |
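
The properties in the table can be made concrete with a small in-memory sketch. Everything here is hypothetical: `MemoryEntry`, `WorkSpaceMemory`, and the deduplicating `consolidate` step are illustrative stand-ins, not PilotDeck's memory implementation or its Dream Mode algorithm.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MemoryEntry:
    entry_id: int
    text: str
    source: str  # where the entry came from, kept for traceability
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    pinned: bool = False


class WorkSpaceMemory:
    """Memory scoped to one WorkSpace; instances share no state."""

    def __init__(self, workspace: str) -> None:
        self.workspace = workspace
        self.entries: dict = {}
        self._next_id = 1
        self._snapshot: Optional[dict] = None

    def add(self, text: str, source: str) -> MemoryEntry:
        entry = MemoryEntry(self._next_id, text, source)
        self.entries[entry.entry_id] = entry
        self._next_id += 1
        return entry

    def edit(self, entry_id: int, text: str) -> None:
        self.entries[entry_id].text = text

    def delete(self, entry_id: int) -> None:
        del self.entries[entry_id]

    def pin(self, entry_id: int) -> None:
        self.entries[entry_id].pinned = True

    def consolidate(self) -> None:
        """Toy consolidation: snapshot first, then drop unpinned exact duplicates."""
        self._snapshot = copy.deepcopy(self.entries)
        seen, kept = set(), {}
        for entry_id, entry in self.entries.items():
            if entry.pinned or entry.text not in seen:
                kept[entry_id] = entry
            seen.add(entry.text)
        self.entries = kept

    def rollback(self) -> None:
        """Restore the state captured before the last consolidation."""
        if self._snapshot is None:
            raise RuntimeError("nothing to roll back to")
        self.entries, self._snapshot = self._snapshot, None


blog = WorkSpaceMemory("blog")
paper = WorkSpaceMemory("paper")
blog.add("Use a casual tone", source="chat")
blog.add("Use a casual tone", source="chat")
paper.add("Use a formal tone", source="style-guide.md")

blog.consolidate()
count_after_consolidate = len(blog.entries)  # duplicate removed
blog.rollback()
count_after_rollback = len(blog.entries)     # original entries restored
```

Because each `WorkSpaceMemory` owns its own dictionary, nothing written to `blog` is visible from `paper`, and the snapshot taken before consolidation is what makes the rollback possible.
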

---

## 🖥️ UI & Demo

PilotDeck ships an out-of-the-box Web UI with full WorkSpace management, white-box memory editing, and visualization of multi-agent collaboration.

### Use Cases

> All demos below are generated entirely by edge-side models via PilotDeck's Smart Routing, with no cloud-side frontier model required.

#### Work Document Generation

> *"Survey the Chinese LLM application market and turn it into a formal HTML white paper."*

#### Mini-Game Development

> *"Walk me through building an iOS AR mini-game Ball Finder in Vibe Coding mode."*

#### AI Engineering Platform Development

> *"Build a low-code embedding fine-tuning platform from scratch."*

#### Audio-Video Editing & Social Media Operations

> *"Push this English podcast to a global audience in Chinese / Japanese / French / Korean / Spanish / Arabic."*

Result (with audio):

https://github.com/user-attachments/assets/a7245467-ee3c-4939-a055-c56576ac56d1

---

## 📦 Installation & Quick Start

We provide a one-line installer for macOS / Linux, plus a source-based workflow for developers.

### Option A: One-line install (recommended, macOS / Linux)

```bash
curl -fsSL https://raw.githubusercontent.com/OpenBMB/PilotDeck/main/install.sh | bash
```

The script auto-installs Node.js 22, clones the repo, installs dependencies, and builds the frontend. Once it finishes:

```bash
pilotdeck          # starts the server at http://localhost:3001
pilotdeck status   # check runtime status
```

### Option B: From source (for developers)

**1. Clone and install dependencies**

> This repo uses [Git LFS](https://git-lfs.com/) for large media assets. Make sure `git lfs` is installed before cloning.
> If you don't need the demo videos/GIFs, add `GIT_LFS_SKIP_SMUDGE=1` before `git clone` to skip downloading them.

```bash
git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck
npm install            # root deps (Gateway runtime)
cd ui && npm install   # UI deps
cd ..
```

**2. Configure a model provider**

PilotDeck reads `~/.pilotdeck/pilotdeck.yaml`. You can create it manually, let the bootstrap script generate one, **or just open the Web UI and configure providers visually in the settings panel.** Supported protocols include OpenAI, Anthropic, DeepSeek, Qwen, Kimi, MiniMax and other OpenAI-compatible endpoints.

```yaml
schemaVersion: 1
agent:
  model: deepseek/deepseek-v4-pro
model:
  providers:
    deepseek:
      protocol: openai
      url: https://api.deepseek.com/v1
      apiKey: sk-your-api-key
```

**3. Start the services**

```bash
cd ui && npm run dev     # dev mode (HMR), visit http://localhost:5173
# or
cd ui && npm run start   # production mode, visit http://localhost:3001
```

### Option C: Docker Compose

If Docker is installed, you can start PilotDeck with:

```bash
docker compose up -d
```

---

## 🛠️ Extension Protocol

PilotDeck has an open plugin architecture with a strict boundary between the open-source core and plugin customization.
Extending the system is a `plugin.json` away:

- **MCP Servers**: first-class integration with any Model Context Protocol server.
- **Tools & Skills**: register custom tools, or pull community skills via [ClawHub](https://www.npmjs.com/package/clawhub).
- **Lifecycle Hooks**: intercept `PreToolUse`, `UserPromptSubmit`, and other critical lifecycle events.
- **Custom Memory**: plug in your own memory store provider.

---

## 🤝 Contributing

Thanks to everyone who has contributed code, feedback, and ideas. New contributors are warmly welcome. Let's build the next-gen agent OS together.

Workflow: **Fork → feature branch → PR**.

---

## 💬 Community

- For bugs and feature requests, please open a [GitHub Issue](https://github.com/OpenBMB/PilotDeck/issues).
- Join our community channels:
  - WeChat Community
  - Feishu Community
  - Discord Community

---

## 🙏 Acknowledgements

We thank Agent OS pioneers such as OpenClaw, Claude Code, Codex, Cursor, and Hermes for their explorations that helped shape this field.

PilotDeck builds upon the following outstanding open-source projects:

- [ClawXRouter](https://github.com/OpenBMB/ClawXRouter): Intelligent model routing
- [ClawXMemory](https://github.com/OpenBMB/ClawXMemory): Agent memory system
- [Claude Code UI](https://github.com/siteboon/claudecodeui): Web UI reference
- [Claude Code Router](https://github.com/musistudio/claude-code-router): Model routing reference
- [UltraRAG](https://github.com/OpenBMB/UltraRAG): RAG framework
- [Anthropic Skills](https://github.com/anthropics/skills): Agent skill framework and built-in skills (skill-creator)
- [Vercel Labs Skills](https://github.com/vercel-labs/skills): find-skills skill
- [MiniMax-AI Skills](https://github.com/MiniMax-AI/skills): minimax-pdf skill
- [frontend-slides](https://github.com/zarazhangrui/frontend-slides): Create beautiful slides on the web using a coding agent's frontend skills
- [Karpathy Guidelines](https://x.com/karpathy/status/2015883857489522876): LLM coding behavioral guidelines
- [Vite](https://github.com/vitejs/vite): Frontend build tool
- [React](https://github.com/facebook/react): UI framework
- [Tailwind CSS](https://github.com/tailwindlabs/tailwindcss): Utility-first CSS framework
- [shadcn/ui](https://github.com/shadcn-ui/ui): Accessible component primitives for React

---

## 🏢 Joint Development

PilotDeck is jointly developed by Tsinghua University [THUNLP](https://nlp.csai.tsinghua.edu.cn/), [ModelBest](https://modelbest.cn/), [OpenBMB](https://www.openbmb.cn/) and [AI9Stars](https://github.com/AI9Stars).

---

## ⭐ Support Us

If PilotDeck has been helpful in your work or research, please consider giving us a Star on GitHub!

---

## 📝 Citation

```bibtex
@misc{pilotdeck2026,
  author       = {PilotDeck Team},
  title        = {PilotDeck: A WorkSpace-Centric Open-Source Agent Operating System},
  howpublished = {\url{https://github.com/OpenBMB/PilotDeck}},
  year         = {2026},
  note         = {Accessed: 2026-05-29}
}
```

## 📄 License

This project is licensed under the [GNU Affero General Public License v3.0](LICENSE).