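# Full-Lifecycle Large Model Management Platform
# PilotDeck
An open-source, one-stop platform from OpenBMB for developing and deploying large models, with an end-to-end toolchain that helps teams bring large models into production quickly.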
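## (1) Project Overview
### Core Positioning
This project is a production-grade, full-lifecycle large model management platform from OpenBMB. It gives developers and enterprises a one-stop solution that spans model training, fine-tuning, and inference deployment through performance monitoring and resource scheduling. It targets the core pain points of large model work (a high barrier to development, complex deployment, and difficult operations) so that enterprises can build and deploy their own large model applications with a minimal barrier to entry.
### Pain Points Addressed
- Large model development demands deep AI expertise, so individual developers and small or mid-sized businesses struggle to get started quickly
- Managing multiple models becomes chaotic without unified scheduling, monitoring, and version management
- Deployment is tedious, resource utilization is low, inference performance is hard to optimize, and operating costs are high
- Frameworks and models are poorly compatible with one another, which makes model migration and reuse difficult
### Core Advantages
- **End-to-end integration**: covers model training, parameter-efficient fine-tuning, inference deployment, performance monitoring, and resource scheduling, so a single platform handles the whole large model application workflow
- **Low-code visual operation**: an intuitive web management console lets you deploy models and publish services through drag-and-drop and configuration, without complex coding
- **Broad model compatibility**: native support for dozens of mainstream open-source large models, including Llama 2/3, Qwen 1.5/2, ChatGLM, and Baichuan, with automatic adaptation to different model formats
- **High-performance inference optimization**: built-in quantization, pruning, distributed inference, and batching deliver a 3-10x inference speedup and substantially lower deployment costs
- **Enterprise-grade production features**: multi-tenant isolation, fine-grained access control, elastic scaling, audit logging, and automatic failure recovery meet the needs of enterprise production environments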
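## (2) Prerequisites
- **Operating system**: Ubuntu 20.04+ / CentOS 8+ / Debian 11+ (Linux recommended)
- **Python version**: Python 3.9 - 3.11
- **Software dependencies**: Git, Docker 20.10+, Docker Compose 2.0+, NVIDIA Container Toolkit (required for GPU environments)
- **Hardware requirements**:
  - Recommended: 8-core CPU + 32 GB RAM + NVIDIA GPU (at least 16 GB VRAM, CUDA 11.8+ support)
  - Minimum: 4-core CPU + 16 GB RAM (CPU inference and lightweight model deployment only)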
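## (3) Quick Start / Installation and Deployment
### Option 1: One-step Docker Compose deployment (recommended for production)
```bash
# Clone the repository
git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck
# Copy the environment template, then edit it
cp .env.example .env
# Edit .env to configure the database, GPU resources, and model storage path
# Start all services
docker compose up -d
```
Once the services are up, open `http://YOUR_SERVER_IP:8000` to reach the admin console. The default credentials are `admin/admin123`.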
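### Option 2: Local deployment from source (development and testing)
```bash
# Clone the repository
git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Initialize the database
python init_db.py
# Build the frontend and start the backend service
npm install && npm run build
python main.py
```
Open `http://localhost:8000` locally to start using it.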
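## (4) Basic Usage Examples
### 1. Import a model
1. Log in to the admin console and open the "Model Management" page
2. Click "Import Model" and choose the model source (local file, Hugging Face, or ModelScope)
3. Enter the model name and version, select the model type, and click "Start Import"
4. When the import finishes, the model appears in the model list
### 2. Create a fine-tuning job
1. Open the "Fine-tuning Management" page and click "New Fine-tuning Job"
2. Select the base model and upload a training dataset (JSON and CSV are supported)
3. Configure the fine-tuning parameters (learning rate, batch size, number of epochs, and so on)
4. Click "Start Training"; the system runs the job automatically, and you can watch training progress and the loss curve in real time
### 3. Deploy an inference service
1. Open the "Service Deployment" page and click "New Service"
2. Select the model and version to deploy, then configure the resource quota and concurrency
3. Choose a deployment mode (single instance or distributed) and click "Deploy"
4. Once the service has started, the system generates the API endpoint and key automatically
### 4. Call the inference API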
```bash
# Replace YOUR_SERVER_IP and YOUR_API_KEY with the values generated at deployment
curl -X POST http://YOUR_SERVER_IP:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen-7b-chat",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself"}],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```
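The same request can be issued from Python. The following is a minimal sketch using only the standard library; it mirrors the endpoint path, headers, and body of the curl example. The function names `build_chat_request` and `send` are illustrative, and because the response schema is not documented here, `send` simply returns whatever JSON the server sends back.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, prompt: str,
                       model: str = "qwen-7b-chat") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 512,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and parse the JSON body (schema is an assumption)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any network traffic occurs, for example `send(build_chat_request("http://YOUR_SERVER_IP:8000", "YOUR_API_KEY", "Hello"))`.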
## (5) Open-Source License
This project is released under the **Apache License 2.0**. See the LICENSE file in the project root for the full terms.
Task-oriented AI Agent productivity platform — redefining operational boundaries and memory evolution, one WorkSpace at a time.
---
**News** 🔥
- **[2026.05.28]** PilotDeck is now open source! Visit our official website at [pilotdeck.openbmb.cn](https://pilotdeck.openbmb.cn). We welcome contributions, feedback, and stars from the community.
---
## 💡 About PilotDeck
**PilotDeck** is an open-source agent operating system designed around the concept of "WorkSpace". It is jointly developed and open-sourced by Tsinghua University [THUNLP](https://nlp.csai.tsinghua.edu.cn/), [ModelBest](https://modelbest.cn/), [OpenBMB](https://www.openbmb.cn/), and [AI9Stars](https://github.com/AI9Stars). Targeting general-purpose, multi-task scenarios, PilotDeck is built to be a true *productivity tool* for the Agent era.
A wave of excellent AI Agent harnesses has emerged in recent years, each with its own focus: **Claude Code / Cursor / Trae Solo** brought model reasoning deep into the programming IDE; **Claude Cowork** introduced the notion of project-level isolation to desktop-side knowledge work; **WorkBuddy** connected agents to IM ecosystems such as WeCom and Feishu so AI is one message away.
When we shift the lens from "one-shot programming" or "immediate Q&A" to **long-running, multi-project productivity work**, however, several questions remain open:
- When many projects run in parallel, can memory be **white-box and traceable**? When the AI gets something wrong, can you pinpoint which memory entry caused it and edit it directly — without starting a new chat from scratch?
- Can token cost be **tracked per task**, so that running agents in the background actually becomes economically viable?
- Can tasks of different difficulty **automatically be matched to different models**, instead of burning the flagship model on trivial calls?
- When you step away from the keyboard, can the work keep moving? Can the agent **proactively discover what's worth doing, report progress, and land results as files on disk**?
PilotDeck is an incremental exploration around exactly these questions. It uses the WorkSpace as the fundamental unit — completely isolating files, memory and skills per project — and pairs it with three pillar capabilities: **White-box Memory**, **Smart Routing** and **Always-on**. The entire system natively supports the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) and behaves consistently across front-ends (Web / CLI / IM).
### ✨ Key Highlights
- **WorkSpace-Level Isolation & Accretion**: Every project gets its own file system, memory store and skill set. Parallel work no longer interferes with itself, retrieval has a bounded scope, and skills accrete naturally as each task grows, with no more global context pollution.
- **Traceable White-box Memory**: Memory generation, extraction, storage and retrieval are visible end-to-end. When the AI mis-remembers, you can pinpoint and fix the offending entry. Built-in **Dream Mode** consolidates memory during idle windows and supports one-click rollback.
- **Smart Routing & Cost Optimization**: Task difficulty is auto-detected; complex calls go to flagship models (e.g. Claude 3.5 Sonnet / GPT-4o), simple ones drop to lighter models. Through on-device / cloud co-orchestration and precise matching, token spend shrinks dramatically without sacrificing quality.
- **Always-on Background Execution**: PilotDeck breaks the "you ask, it answers" loop: after you sign off, the agent keeps discovering candidate tasks, running long-horizon monitors, and finally lands deliverables as local files with a summary report waiting for you.
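To make the Smart Routing idea concrete, here is a toy sketch of difficulty-based model selection. It is not PilotDeck's router: the `RoutingPolicy` class, the keyword heuristic in `estimate_difficulty`, and the 0.5 threshold are all hypothetical, chosen only to show the shape of "complex calls go to the flagship, simple calls go to a lighter model".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingPolicy:
    flagship_model: str      # handles complex calls
    light_model: str         # handles simple calls
    threshold: float = 0.5   # hypothetical cut-off on a 0..1 difficulty score


def estimate_difficulty(task: str) -> float:
    """Toy heuristic: longer prompts and planning keywords score higher."""
    keywords = ("plan", "architecture", "analyze", "review", "design")
    lowered = task.lower()
    keyword_hits = sum(1 for word in keywords if word in lowered)
    length_score = min(len(task) / 400.0, 1.0)
    return min(0.6 * length_score + 0.25 * keyword_hits, 1.0)


def route(task: str, policy: RoutingPolicy) -> str:
    """Return the name of the model that should handle this task."""
    if estimate_difficulty(task) >= policy.threshold:
        return policy.flagship_model
    return policy.light_model


policy = RoutingPolicy(flagship_model="flagship-model", light_model="light-model")
simple_choice = route("Fix the typo in this title", policy)
complex_choice = route("Plan and design the architecture for a data pipeline", policy)
```

A real router would likely replace the keyword heuristic with a learned classifier or a cheap model call; the point of the sketch is only that the routing decision is a small, inspectable function sitting in front of the model call.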
### 📊 Real-world Numbers
The three pillar capabilities have shown clear advantages in production-grade workflows:
#### 1. Smart Routing — ~70% cost savings on social-media workloads
In Xiaohongshu-style social-media operations, enabling Smart Routing automatically demotes simple polishing / layout tasks to a sub-agent (e.g. Sonnet 4.5) and only invokes Opus 4.5 at planning checkpoints:
| Setup | Model configuration | Cost | Multiplier |
| --- | --- | --- | --- |
| Smart Routing ON | Opus 4.5 (main) + Sonnet 4.5 (sub) | $2.83 | 1.1× |
| Smart Routing OFF | All Opus 4.5 (main + sub) | $12.58 | 5.0× |
| Monolithic | Single Opus 4.5 long-react (estimated) | $12.20 | 4.8× |
#### 2. Smart Routing — 1/6 the cost while beating frontier models on hard tasks
The research team benchmarked 7 complex tasks (multilingual podcast push, multi-source data reports, domain-specific literature review, codebase architecture docs, etc.). The "strong main + light sub" routing setup matches or beats the frontier single-model setup at a fraction of the cost:
| Setting | Score | Cost |
| --- | --- | --- |
| MiniMax-M2.7 single-agent | 37.1 | $1.90 |
| Claude Sonnet 4.6 single-agent | 69.1 | $18.36 |
| Sonnet 4.6 (main) + MiniMax-M2.7 (sub) | 70.6 | $3.15 |
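The "1/6 the cost" claim can be checked directly against the benchmark figures. The short sketch below copies the scores and costs from the table and computes the cost ratio and score difference between two settings; the helper name `compare` is illustrative.

```python
# (score, cost in USD) copied from the benchmark table
results = {
    "MiniMax-M2.7 single-agent": (37.1, 1.90),
    "Claude Sonnet 4.6 single-agent": (69.1, 18.36),
    "Sonnet 4.6 (main) + MiniMax-M2.7 (sub)": (70.6, 3.15),
}


def compare(candidate: str, baseline: str) -> dict:
    """Return the cost ratio and score difference of candidate vs. baseline."""
    candidate_score, candidate_cost = results[candidate]
    baseline_score, baseline_cost = results[baseline]
    return {
        "cost_ratio": candidate_cost / baseline_cost,
        "score_delta": candidate_score - baseline_score,
    }


routed_vs_frontier = compare(
    "Sonnet 4.6 (main) + MiniMax-M2.7 (sub)",
    "Claude Sonnet 4.6 single-agent",
)
```

With these numbers the routed setup costs roughly 0.17 of the single-model Sonnet 4.6 run, which is close to one sixth, while scoring about 1.5 points higher.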
#### 3. White-box Memory — layout & tone never bleed across projects
In black-box agents, mixing tasks in a shared context pool inevitably pollutes memory. PilotDeck's WorkSpace-scoped white-box memory addresses this end-to-end:
| Dimension | Current AI Agents (black-box) | PilotDeck (white-box) |
| --- | --- | --- |
| Visibility | You can't see what the AI remembers, only what it outputs | View every memory entry: what was stored, when, and which WorkSpace |
| Control | Once written, memory can't be edited or removed | Edit / delete entries, pin critical decisions so they don't drift |
| Traceability | When it goes wrong, you can't find the root cause | Generation → extraction → storage → retrieval, all auditable |
| Isolation | One shared pool, so projects bleed into each other | Scoped per WorkSpace; A's memory never reaches B |
| Reversible | After compression, the original is gone | Dream Mode supports one-click rollback to the prior state |
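The properties in the comparison (visibility, control, isolation, reversibility) can be illustrated with a small data structure. This is a hypothetical sketch, not PilotDeck's memory implementation: `WorkSpaceMemory`, `MemoryEntry`, and the "keep only pinned entries" consolidation rule are stand-ins chosen to show how per-WorkSpace scoping and snapshot-based rollback might fit together.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class MemoryEntry:
    text: str
    created_at: datetime
    pinned: bool = False


@dataclass
class WorkSpaceMemory:
    """Memory scoped to a single WorkSpace, with a fully inspectable entry map."""
    name: str
    entries: Dict[int, MemoryEntry] = field(default_factory=dict)
    _next_id: int = 1
    _snapshot: Optional[Dict[int, MemoryEntry]] = None

    def add(self, text: str, pinned: bool = False) -> int:
        entry_id = self._next_id
        self.entries[entry_id] = MemoryEntry(text, datetime.now(timezone.utc), pinned)
        self._next_id += 1
        return entry_id

    def edit(self, entry_id: int, text: str) -> None:
        self.entries[entry_id].text = text

    def delete(self, entry_id: int) -> None:
        del self.entries[entry_id]

    def consolidate(self) -> None:
        """Toy consolidation: snapshot first, then keep only pinned entries."""
        self._snapshot = copy.deepcopy(self.entries)
        self.entries = {i: e for i, e in self.entries.items() if e.pinned}

    def rollback(self) -> None:
        """Restore the state saved by the last consolidate() call, if any."""
        if self._snapshot is not None:
            self.entries = self._snapshot
            self._snapshot = None
```

Because each `WorkSpaceMemory` instance owns its own `entries` map, nothing written in one WorkSpace is reachable from another, and the snapshot taken before consolidation is what makes the rollback possible.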
---
## 🖥️ UI & Demo
PilotDeck ships an out-of-the-box Web UI with full WorkSpace management, white-box memory editing, and visualization of multi-agent collaboration.
### Use Cases
> All demos below are generated entirely by edge-side models via PilotDeck's Smart Routing — no cloud-side frontier model required.
#### Work Document Generation
> *"Survey the Chinese LLM application market and turn it into a formal HTML white paper."*
#### Mini-Game Development
> *"Walk me through building an iOS AR mini-game Ball Finder in Vibe Coding mode."*
#### AI Engineering Platform Development
> *"Build a low-code embedding fine-tuning platform from scratch."*
#### Audio-Video Editing & Social Media Operations
> *"Push this English podcast to a global audience in Chinese / Japanese / French / Korean / Spanish / Arabic."*
Result (with audio): https://github.com/user-attachments/assets/a7245467-ee3c-4939-a055-c56576ac56d1
---
## 📦 Installation & Quick Start
We provide a one-line installer for macOS / Linux, plus a source-based workflow for developers.
### Option A: One-line install (recommended, macOS / Linux)
```bash
curl -fsSL https://raw.githubusercontent.com/OpenBMB/PilotDeck/main/install.sh | bash
```
The script auto-installs Node.js 22, clones the repo, installs dependencies, and builds the frontend. Once it finishes:
```bash
pilotdeck # starts the server at http://localhost:3001
pilotdeck status # check runtime status
```
### Option B: From source (for developers)
**1. Clone and install dependencies**
> This repo uses [Git LFS](https://git-lfs.com/) for large media assets. Make sure `git lfs` is installed before cloning.
> If you don't need the demo videos/GIFs, add `GIT_LFS_SKIP_SMUDGE=1` before `git clone` to skip downloading them.
```bash
git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck
npm install # root deps (Gateway runtime)
cd ui && npm install # UI deps
cd ..
```
**2. Configure a model provider**
PilotDeck reads `~/.pilotdeck/pilotdeck.yaml`. You can create it manually, let the bootstrap script generate one, **or just open the Web UI and configure providers visually in the settings panel.**
Supported protocols include OpenAI, Anthropic, DeepSeek, Qwen, Kimi, MiniMax and other OpenAI-compatible endpoints.
```yaml
schemaVersion: 1
agent:
  model: deepseek/deepseek-v4-pro
model:
  providers:
    deepseek:
      protocol: openai
      url: https://api.deepseek.com/v1
      apiKey: sk-your-api-key
```
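The sample suggests that `agent.model` uses a `<provider>/<model>` form whose first segment names an entry under `model.providers`. That convention is inferred from the example, not confirmed by the text, so treat the following as a sketch of how such a reference could be resolved. The dictionary mirrors the YAML above (a YAML parser would produce the same structure), and `resolve_agent_model` is a hypothetical helper.

```python
# Same structure as the YAML sample, written as a Python dict
config = {
    "schemaVersion": 1,
    "agent": {"model": "deepseek/deepseek-v4-pro"},
    "model": {
        "providers": {
            "deepseek": {
                "protocol": "openai",
                "url": "https://api.deepseek.com/v1",
                "apiKey": "sk-your-api-key",
            }
        }
    },
}


def resolve_agent_model(config: dict) -> dict:
    """Split 'provider/model' and look up the matching provider block."""
    reference = config["agent"]["model"]
    provider_name, _, model_name = reference.partition("/")
    if not model_name:
        raise ValueError(f"expected '<provider>/<model>', got {reference!r}")
    providers = config.get("model", {}).get("providers", {})
    if provider_name not in providers:
        raise KeyError(f"provider {provider_name!r} is not configured")
    provider = providers[provider_name]
    return {
        "provider": provider_name,
        "model": model_name,
        "protocol": provider["protocol"],
        "url": provider["url"],
    }


resolved = resolve_agent_model(config)
```

A check like this catches the most common misconfiguration early: an `agent.model` value that points at a provider with no matching block under `model.providers`.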
**3. Start the services**
```bash
cd ui && npm run dev # dev mode (HMR), visit http://localhost:5173
# or
cd ui && npm run start # production mode, visit http://localhost:3001
```
### Option C: Docker Compose
If Docker is installed, you can start PilotDeck with:
```bash
docker compose up -d
```
---
## 🛠️ Extension Protocol
PilotDeck has an open plugin architecture with a strict boundary between the open-source core and plugin customization. Extending the system is a `plugin.json` away:
- **MCP Servers** — first-class integration with any Model Context Protocol server.
- **Tools & Skills** — register custom tools, or pull community skills via [ClawHub](https://www.npmjs.com/package/clawhub).
- **Lifecycle Hooks** — intercept `PreToolUse`, `UserPromptSubmit`, and other critical lifecycle events.
- **Custom Memory** — plug in your own memory store provider.
---
## 🤝 Contributing
Thanks to everyone who has contributed code, feedback, and ideas. New contributors are warmly welcome — let's build the next-gen agent OS together.
Workflow: **Fork → feature branch → PR**.
---
## 💬 Community
- For bugs and feature requests, please open a [GitHub Issue](https://github.com/OpenBMB/PilotDeck/issues).
- Join our community channels: WeChat, Feishu, and Discord.
---
## 🙏 Acknowledgements
We thank Agent OS pioneers such as OpenClaw, Claude Code, Codex, Cursor, and Hermes for their explorations that helped shape this field.
PilotDeck builds upon the following outstanding open-source projects:
- [ClawXRouter](https://github.com/OpenBMB/ClawXRouter) — Intelligent model routing
- [ClawXMemory](https://github.com/OpenBMB/ClawXMemory) — Agent memory system
- [Claude Code UI](https://github.com/siteboon/claudecodeui) — Web UI reference
- [Claude Code Router](https://github.com/musistudio/claude-code-router) — Model routing reference
- [UltraRAG](https://github.com/OpenBMB/UltraRAG) — RAG framework
- [Anthropic Skills](https://github.com/anthropics/skills) — Agent skill framework and built-in skills (skill-creator)
- [Vercel Labs Skills](https://github.com/vercel-labs/skills) — find-skills skill
- [MiniMax-AI Skills](https://github.com/MiniMax-AI/skills) — minimax-pdf skill
- [frontend-slides](https://github.com/zarazhangrui/frontend-slides) — Create beautiful slides on the web using a coding agent's frontend skills
- [Karpathy Guidelines](https://x.com/karpathy/status/2015883857489522876) — LLM coding behavioral guidelines
- [Vite](https://github.com/vitejs/vite) — Frontend build tool
- [React](https://github.com/facebook/react) — UI framework
- [Tailwind CSS](https://github.com/tailwindlabs/tailwindcss) — Utility-first CSS framework
- [shadcn/ui](https://github.com/shadcn-ui/ui) — Accessible component primitives for React
---
## 🏢 Joint Development
PilotDeck is jointly developed by Tsinghua University [THUNLP](https://nlp.csai.tsinghua.edu.cn/), [ModelBest](https://modelbest.cn/), [OpenBMB](https://www.openbmb.cn/) and [AI9Stars](https://github.com/AI9Stars).
---
## ⭐ Support Us
If PilotDeck has been helpful in your work or research, please consider giving us a Star on GitHub!
---
## 📝 Citation
```bibtex
@misc{pilotdeck2026,
author = {PilotDeck Team},
title = {PilotDeck: A WorkSpace-Centric Open-Source Agent Operating System},
howpublished = {\url{https://github.com/OpenBMB/PilotDeck}},
year = {2026},
note = {Accessed: 2026-05-29}
}
```
## 📄 License
This project is licensed under the [GNU Affero General Public License v3.0](LICENSE).