Full-Lifecycle Management Platform for Large Models

PilotDeck

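OpenBMB's open-source, one-stop platform for developing and deploying large models, with an end-to-end toolchain for getting models into production quickly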

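(1) Project Overview

Core Positioning

This project is a production-grade, full-lifecycle management platform for large models from OpenBMB. It gives developers and enterprises a one-stop solution covering model training, fine-tuning, inference deployment, performance monitoring, and resource scheduling. It targets the core pain points of large-model work (a high barrier to entry, complex deployment, and difficult operations) so that enterprises can build and deploy their own large-model applications without a steep entry barrier.

Pain Points Addressed

  • Large-model development demands deep AI expertise, so ordinary developers and small-to-medium businesses struggle to get started quickly
  • Managing many models is chaotic without unified scheduling, monitoring, and version management
  • Deployment is tedious, resource utilization is low, inference performance is hard to optimize, and operations are costly
  • Frameworks and models are poorly compatible with one another, which makes models hard to migrate and reuse

Core Advantages

  • End-to-end integration: covers model training, parameter-efficient fine-tuning, inference deployment, performance monitoring, and resource scheduling, so one platform handles the whole application development workflow
  • Low-code visual operation: an intuitive Web management interface lets you deploy models and publish services through drag-and-drop and configuration, without complex coding
  • Compatibility with mainstream models: native support for dozens of mainstream open-source models, including Llama 2/3, Qwen 1.5/2, ChatGLM, and Baichuan, with automatic adaptation to different model formats
  • High-performance inference optimization: built-in quantization, pruning, distributed inference, and batching, with a stated 3-10x inference speedup and substantially lower deployment cost
  • Enterprise-grade production capabilities: multi-tenant isolation, fine-grained permission management, elastic scaling, audit logging, and automatic failure recovery for production environments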

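(2) Prerequisites

  • Operating system: Ubuntu 20.04+ / CentOS 8+ / Debian 11+ (Linux recommended)
  • Python version: Python 3.9 - 3.11
  • Software dependencies: Git, Docker 20.10+, Docker Compose 2.0+, NVIDIA Container Toolkit (required for GPU environments)
  • Hardware requirements
    • Recommended: 8 CPU cores + 32 GB RAM + NVIDIA GPU (≥16 GB VRAM, CUDA 11.8+ support)
    • Minimum: 4 CPU cores + 16 GB RAM (CPU inference and lightweight model deployment only)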
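
The software prerequisites above can be checked before installing. The following is a minimal sketch, not part of the project: the function names are illustrative, and it only checks the Python version range and whether `git` and `docker` are on the PATH. It does not verify tool versions, Docker Compose, or GPU drivers.

```python
import shutil
import sys
from typing import Iterable, List, Tuple

# Supported interpreter range taken from the prerequisites list.
MIN_PYTHON: Tuple[int, int] = (3, 9)
MAX_PYTHON: Tuple[int, int] = (3, 11)


def python_supported(version: Tuple[int, int] = tuple(sys.version_info[:2])) -> bool:
    """Return True if (major, minor) falls inside the supported range."""
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


def missing_tools(tools: Iterable[str] = ("git", "docker")) -> List[str]:
    """Return the command-line tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_supported())
    print("Missing tools:", missing_tools())
```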

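(3) Quick Start / Installation and Deployment

Option 1: One-step Docker Compose deployment (recommended for production)

# Clone the repository
git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck

# Copy and edit the environment configuration file
cp .env.example .env
# Edit .env to configure the database, GPU resources, and model storage path

# Start all services
docker compose up -d

Once the services are up, open http://YOUR_SERVER_IP:8000 to reach the admin console. The default credentials are admin/admin123.

Option 2: Local deployment from source (development and testing)

# Clone the repository
git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Initialize the database
python init_db.py

# Build the frontend and start the backend service
npm install && npm run build
python main.py

Open http://localhost:8000 locally to start using it.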

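(4) Basic Usage Examples

1. Import a model

  1. Log in to the admin console and open the "Model Management" page
  2. Click "Import Model" and choose the model source (local file, Hugging Face, or ModelScope)
  3. Enter the model name and version, choose the model type, and click "Start Import"
  4. When the import finishes, the model appears in the model list

2. Create a fine-tuning job

  1. Open the "Fine-tuning Management" page and click "New Fine-tuning Job"
  2. Choose the base model and upload a training dataset (JSON and CSV are supported)
  3. Configure the fine-tuning parameters (learning rate, batch size, number of epochs, and so on)
  4. Click "Start Training"; the system runs the job automatically, and you can watch training progress and the loss curve in real time

3. Deploy an inference service

  1. Open the "Service Deployment" page and click "New Service"
  2. Choose the model and version to deploy, then configure the resource quota and concurrency
  3. Choose the deployment mode (single instance or distributed) and click "Deploy"
  4. Once the service has started, the system generates the API endpoint and key automatically

4. Call the inference API

curl -X POST http://YOUR_SERVER_IP:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen-7b-chat",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself"}],
    "temperature": 0.7,
    "max_tokens": 512
  }'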
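
The same call can be made from Python using only the standard library. This is a sketch under the assumptions of the curl example (endpoint path, header names, and payload fields are copied from it); the function name `build_chat_request` is illustrative, and the response format is not documented here, so the sketch builds the request and leaves sending it commented out.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, prompt: str,
                       model: str = "qwen-7b-chat") -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 512,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("http://localhost:8000", "YOUR_API_KEY",
                             "Hello, please introduce yourself")
    print(req.get_method(), req.full_url)
    # Sending requires a running deployment and a valid key:
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     print(json.load(resp))
```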

(5) Open-Source License

This project is released under the Apache License 2.0. See the LICENSE file in the repository root for the full terms.

PilotDeck

Task-oriented AI Agent productivity platform — redefining operational boundaries and memory evolution, one WorkSpace at a time.


News 🔥

  • [2026.05.28] PilotDeck is now open source! Visit our official website at pilotdeck.openbmb.cn. We welcome contributions, feedback, and stars from the community.

💡 About PilotDeck

PilotDeck is an open-source agent operating system designed around the concept of "WorkSpace". It is jointly developed and open-sourced by Tsinghua University THUNLP, ModelBest, OpenBMB, and AI9Stars. Targeting general-purpose, multi-task scenarios, PilotDeck is built to be a true productivity tool for the Agent era.

A wave of excellent AI Agent harnesses has emerged in recent years, each with its own focus: Claude Code / Cursor / Trae Solo brought model reasoning deep into the programming IDE; Claude Cowork introduced the notion of project-level isolation to desktop-side knowledge work; WorkBuddy connected agents to IM ecosystems such as WeCom and Feishu so AI is one message away.

When we shift the lens from "one-shot programming" or "immediate Q&A" to long-running, multi-project productivity work, however, several questions remain open:

  • When many projects run in parallel, can memory be white-box and traceable? When the AI gets something wrong, can you pinpoint which memory entry caused it and edit it directly — without starting a new chat from scratch?
  • Can token cost be tracked per task, so that running agents in the background actually becomes economically viable?
  • Can tasks of different difficulty automatically be matched to different models, instead of burning the flagship model on trivial calls?
  • When you step away from the keyboard, can the work keep moving? Can the agent proactively discover what's worth doing, report progress, and land results as files on disk?

PilotDeck is an incremental exploration around exactly these questions. It uses the WorkSpace as the fundamental unit — completely isolating files, memory and skills per project — and pairs it with three pillar capabilities: White-box Memory, Smart Routing and Always-on. The entire system natively supports the Model Context Protocol (MCP) and behaves consistently across front-ends (Web / CLI / IM).

✨ Key Highlights

**WorkSpace-Level Isolation & Accretion** Every project gets its own file system, memory store and skill set. Parallel work no longer interferes with itself, retrieval has a bounded scope, and skills accrete naturally as each task grows — no more global context pollution.
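
To make the isolation idea concrete, here is a minimal sketch in which each WorkSpace owns its directory, memory list, and skill set, and retrieval only scans the WorkSpace it is called on. The `WorkSpace` class and `create_workspace` helper are hypothetical illustrations, not PilotDeck's actual data model.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Set


@dataclass
class WorkSpace:
    """Hypothetical per-project container: its own directory, memory, and skills."""
    name: str
    root: Path
    memory: List[str] = field(default_factory=list)
    skills: Set[str] = field(default_factory=set)

    def remember(self, note: str) -> None:
        self.memory.append(note)

    def search(self, keyword: str) -> List[str]:
        # Retrieval is bounded: only this WorkSpace's memory is scanned.
        return [m for m in self.memory if keyword.lower() in m.lower()]


def create_workspace(base: Path, name: str) -> WorkSpace:
    """Create the WorkSpace directory under `base` and return its container."""
    root = base / name
    root.mkdir(parents=True, exist_ok=True)
    return WorkSpace(name=name, root=root)


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    blog = create_workspace(base, "blog")
    game = create_workspace(base, "game")
    blog.remember("Tone: casual, short paragraphs")
    game.remember("Engine: ARKit")
    print(blog.search("tone"))  # ['Tone: casual, short paragraphs']
    print(game.search("tone"))  # []
```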

**Traceable White-box Memory** Memory generation, extraction, storage and retrieval are visible end-to-end. When the AI mis-remembers, you can pinpoint and fix the offending entry. Built-in **Dream Mode** consolidates memory in idle windows, and supports one-click rollback.

**Smart Routing & Cost Optimization** Task difficulty is auto-detected; complex calls go to flagship models (e.g. Claude 3.5 Sonnet / GPT-4o), simple ones drop to lighter models. Through on-device / cloud co-orchestration and precise matching, token spend shrinks dramatically without sacrificing quality.
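
The routing idea can be illustrated with a toy difficulty score. The README does not describe how PilotDeck detects difficulty, so everything below is an assumption made for illustration: the keyword list, the 50-word length cap, the 0.5 threshold, and the placeholder model names.

```python
import re
from dataclasses import dataclass

# Hypothetical keywords that suggest planning-heavy work.
HARD_KEYWORDS = frozenset({"plan", "architecture", "review", "debug", "design"})


@dataclass(frozen=True)
class Route:
    model: str
    score: float


def estimate_difficulty(task: str) -> float:
    """Toy heuristic in [0, 1]: half from task length, half from hard keywords."""
    words = re.findall(r"[a-z]+", task.lower())
    length_score = min(len(words) / 50, 1.0)
    keyword_score = 1.0 if HARD_KEYWORDS.intersection(words) else 0.0
    return 0.5 * length_score + 0.5 * keyword_score


def route(task: str, flagship: str = "flagship-model",
          light: str = "light-model", threshold: float = 0.5) -> Route:
    """Send tasks scoring at or above the threshold to the flagship model."""
    score = estimate_difficulty(task)
    return Route(flagship if score >= threshold else light, score)


if __name__ == "__main__":
    print(route("Fix the typo in this caption").model)
    print(route("Plan the architecture for a multi-service data pipeline").model)
```

A production router would more plausibly use a classifier or the model's own self-assessment than keywords; the sketch only shows where the decision sits in the call path.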

**Always-on Background Execution** PilotDeck breaks the "you ask, it answers" loop: after you sign off, the agent keeps discovering candidate tasks, running long-horizon monitors, and finally lands deliverables as local files with a summary report waiting for you.
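
The "deliverables as files plus a summary report" pattern can be sketched as follows. This is an illustration only: the `run_offline` helper, the file naming, and the report format are assumptions, and the real system's task discovery and monitoring are not modelled.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def run_offline(tasks: Dict[str, Callable[[], str]], out_dir: Path) -> Path:
    """Run each task, write its output to <name>.txt, and return a summary report path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = []
    for name, task in tasks.items():
        try:
            (out_dir / f"{name}.txt").write_text(task(), encoding="utf-8")
            lines.append(f"- {name}: done")
        except Exception as exc:  # one failing task should not stop the others
            lines.append(f"- {name}: failed ({exc})")
    report = out_dir / "summary.md"
    report.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return report


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    report = run_offline({"digest": lambda: "Three new papers found."}, out)
    print(report.read_text(encoding="utf-8"))
```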

📊 Real-world Numbers

The three pillar capabilities have shown clear advantages in production-grade workflows:

1. Smart Routing — ~70% cost savings on social-media workloads

In Xiaohongshu-style social-media operations, enabling Smart Routing automatically demotes simple polishing / layout tasks to a sub-agent (e.g. Sonnet 4.5) and only invokes Opus 4.5 at planning checkpoints:

| Setup | Model configuration | Cost | Multiplier |
|---|---|---|---|
| Smart Routing ON | Opus 4.5 (main) + Sonnet 4.5 (sub) | $2.83 | 1.1× |
| Smart Routing OFF | All Opus 4.5 (main + sub) | $12.58 | 5.0× |
| Monolithic | Single Opus 4.5 long-react (estimated) | $12.20 | 4.8× |
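
The savings implied by these figures can be checked directly. The calculation below uses only the three dollar amounts from the table; the `savings` helper is illustrative.

```python
# Dollar costs copied from the table above.
costs = {
    "Smart Routing ON": 2.83,
    "Smart Routing OFF": 12.58,
    "Monolithic": 12.20,
}


def savings(baseline: float, actual: float) -> float:
    """Fraction of the baseline cost that is saved."""
    return 1 - actual / baseline


for name in ("Smart Routing OFF", "Monolithic"):
    pct = savings(costs[name], costs["Smart Routing ON"]) * 100
    print(f"Routing ON vs {name}: {pct:.1f}% cheaper")
# Routing ON vs Smart Routing OFF: 77.5% cheaper
# Routing ON vs Monolithic: 76.8% cheaper
```

By these figures, routing costs about 77% less than either all-Opus setup, somewhat better than the rounded ~70% in the heading.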

2. Smart Routing — 1/6 the cost while beating frontier models on hard tasks

The research team benchmarked 7 complex tasks (multilingual podcast push, multi-source data reports, domain-specific literature review, codebase architecture docs, etc.). The "strong main + light sub" routing setup matches or beats the frontier single-model setup at a fraction of the cost:

| Setting | Score | Cost |
|---|---|---|
| MiniMax-M2.7 single-agent | 37.1 | $1.90 |
| Claude Sonnet 4.6 single-agent | 69.1 | $18.36 |
| Sonnet 4.6 (main) + MiniMax-M2.7 (sub) | 70.6 | $3.15 |
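
A quick way to read this table is score per dollar and the cost ratio between the two strongest setups. The calculation uses only the numbers above; "points per dollar" is an illustrative metric, not one the benchmark reports.

```python
# (setting, score, cost in USD) copied from the table above.
results = [
    ("MiniMax-M2.7 single-agent", 37.1, 1.90),
    ("Claude Sonnet 4.6 single-agent", 69.1, 18.36),
    ("Sonnet 4.6 (main) + MiniMax-M2.7 (sub)", 70.6, 3.15),
]

for name, score, cost in results:
    print(f"{name}: {score / cost:.1f} points per dollar")

# How many times more the single-agent Sonnet run costs than the routed run.
cost_ratio = results[1][2] / results[2][2]
print(f"Cost ratio: {cost_ratio:.2f}x")
# MiniMax-M2.7 single-agent: 19.5 points per dollar
# Claude Sonnet 4.6 single-agent: 3.8 points per dollar
# Sonnet 4.6 (main) + MiniMax-M2.7 (sub): 22.4 points per dollar
# Cost ratio: 5.83x
```

The routed setup costs roughly 1/5.8 of the Sonnet 4.6 single-agent run, which the heading rounds to 1/6, while scoring 1.5 points higher.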

3. White-box Memory — layout & tone never bleed across projects

In black-box agents, mixing tasks in a shared context pool inevitably pollutes memory. PilotDeck's WorkSpace-scoped white-box memory addresses this end-to-end:

| Dimension | Current AI Agents (black-box) | PilotDeck (white-box) |
|---|---|---|
| Visibility | You can't see what the AI remembers, only what it outputs | View every memory entry: what was stored, when, and which WorkSpace |
| Control | Once written, memory can't be edited or removed | Edit / delete entries, pin critical decisions so they don't drift |
| Traceability | When it goes wrong, you can't find the root cause | Generation → extraction → storage → retrieval, all auditable |
| Isolation | One shared pool, so projects bleed into each other | Scoped per WorkSpace; A's memory never reaches B |
| Reversible | After compression, the original is gone | Dream-mode supports one-click rollback to the prior state |
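
The white-box column can be read as a small set of operations on a memory store. The sketch below is an illustration under stated assumptions, not PilotDeck's implementation: the `WhiteBoxMemory` class and its method names are hypothetical, and `consolidate` is a stand-in for Dream Mode that merely drops unpinned duplicates after taking a snapshot.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class MemoryEntry:
    entry_id: int
    workspace: str
    text: str
    created_at: datetime
    pinned: bool = False


class WhiteBoxMemory:
    """Hypothetical store where every entry is visible, editable, and scoped."""

    def __init__(self) -> None:
        self._entries: Dict[int, MemoryEntry] = {}
        self._snapshots: List[Dict[int, MemoryEntry]] = []
        self._next_id = 1

    def add(self, workspace: str, text: str) -> int:
        entry_id = self._next_id
        self._next_id += 1
        self._entries[entry_id] = MemoryEntry(
            entry_id, workspace, text, datetime.now(timezone.utc))
        return entry_id

    def entries_for(self, workspace: str) -> List[MemoryEntry]:
        # Isolation: only entries belonging to this WorkSpace are returned.
        return [e for e in self._entries.values() if e.workspace == workspace]

    def edit(self, entry_id: int, text: str) -> None:
        self._entries[entry_id].text = text

    def delete(self, entry_id: int) -> None:
        del self._entries[entry_id]

    def pin(self, entry_id: int) -> None:
        self._entries[entry_id].pinned = True

    def consolidate(self, workspace: str) -> None:
        """Snapshot first, then drop unpinned duplicate entries in one WorkSpace."""
        self._snapshots.append(copy.deepcopy(self._entries))
        seen = set()
        for entry in self.entries_for(workspace):
            key = entry.text.strip().lower()
            if key in seen and not entry.pinned:
                del self._entries[entry.entry_id]
            else:
                seen.add(key)

    def rollback(self) -> None:
        """Restore the state saved by the most recent consolidation."""
        if not self._snapshots:
            raise RuntimeError("no snapshot to roll back to")
        self._entries = self._snapshots.pop()
```

Taking the snapshot before consolidation is what makes the operation reversible: `rollback` swaps the saved copy back in rather than trying to undo individual deletions.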

🖥️ UI & Demo

PilotDeck ships an out-of-the-box Web UI with full WorkSpace management, white-box memory editing, and visualization of multi-agent collaboration.

Use Cases

All demos below are generated entirely by edge-side models via PilotDeck's Smart Routing — no cloud-side frontier model required.

Work Document Generation

"Survey the Chinese LLM application market and turn it into a formal HTML white paper."

Mini-Game Development

"Walk me through building an iOS AR mini-game Ball Finder in Vibe Coding mode."

AI Engineering Platform Development

"Build a low-code embedding fine-tuning platform from scratch."

Audio-Video Editing & Social Media Operations

"Push this English podcast to a global audience in Chinese / Japanese / French / Korean / Spanish / Arabic."

Process Result (with audio)
https://github.com/user-attachments/assets/a7245467-ee3c-4939-a055-c56576ac56

📦 Installation & Quick Start

We provide a one-line installer for macOS / Linux, plus a source-based workflow for developers.

Option A: One-line install (recommended, macOS / Linux)

curl -fsSL https://raw.githubusercontent.com/OpenBMB/PilotDeck/main/install.sh | bash

The script auto-installs Node.js 22, clones the repo, installs dependencies, and builds the frontend. Once it finishes:

pilotdeck            # starts the server at http://localhost:3001
pilotdeck status     # check runtime status

Option B: From source (for developers)

1. Clone and install dependencies

This repo uses Git LFS for large media assets. Make sure git lfs is installed before cloning. If you don't need the demo videos/GIFs, add GIT_LFS_SKIP_SMUDGE=1 before git clone to skip downloading them.

git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck

npm install              # root deps (Gateway runtime)
cd ui && npm install     # UI deps
cd ..

2. Configure a model provider

PilotDeck reads ~/.pilotdeck/pilotdeck.yaml. You can create it manually, let the bootstrap script generate one, or just open the Web UI and configure providers visually in the settings panel. Supported protocols include OpenAI, Anthropic, DeepSeek, Qwen, Kimi, MiniMax and other OpenAI-compatible endpoints.

schemaVersion: 1
agent:
  model: deepseek/deepseek-v4-pro
model:
  providers:
    deepseek:
      protocol: openai
      url: https://api.deepseek.com/v1
      apiKey: sk-your-api-key
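
One plausible reading of this file is that the prefix of `agent.model` (here `deepseek`) selects an entry under `model.providers`. The README does not state this rule, so the sketch below is an assumption; it mirrors the YAML as a plain Python dict to stay dependency-free, and `resolve_model` is a hypothetical helper.

```python
from typing import Any, Dict

# The YAML example above, written as an equivalent Python dict.
config: Dict[str, Any] = {
    "schemaVersion": 1,
    "agent": {"model": "deepseek/deepseek-v4-pro"},
    "model": {
        "providers": {
            "deepseek": {
                "protocol": "openai",
                "url": "https://api.deepseek.com/v1",
                "apiKey": "sk-your-api-key",
            }
        }
    },
}


def resolve_model(cfg: Dict[str, Any]) -> Dict[str, str]:
    """Split 'provider/model' and look the provider up (assumed resolution rule)."""
    provider_name, _, model_name = cfg["agent"]["model"].partition("/")
    providers = cfg["model"]["providers"]
    if provider_name not in providers:
        raise KeyError(f"provider {provider_name!r} is not configured")
    provider = providers[provider_name]
    return {
        "provider": provider_name,
        "model": model_name,
        "protocol": provider["protocol"],
        "url": provider["url"],
    }


if __name__ == "__main__":
    print(resolve_model(config))
```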

3. Start the services

cd ui && npm run dev     # dev mode (HMR), visit http://localhost:5173
# or
cd ui && npm run start   # production mode, visit http://localhost:3001

Option C: Docker Compose

If Docker is installed, you can start PilotDeck with:

docker compose up -d

🛠️ Extension Protocol

PilotDeck has an open plugin architecture with a strict boundary between the open-source core and plugin customization. Extending the system is a plugin.json away:

  • MCP Servers — first-class integration with any Model Context Protocol server.
  • Tools & Skills — register custom tools, or pull community skills via ClawHub.
  • Lifecycle Hooks — intercept PreToolUse, UserPromptSubmit, and other critical lifecycle events.
  • Custom Memory — plug in your own memory store provider.
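
Lifecycle hooks of this kind are commonly implemented as an ordered chain of callbacks per event. The sketch below shows that general pattern only: the `HookRegistry` class, the payload shape, and the convention that returning `None` blocks the action are all assumptions, not PilotDeck's plugin API. Only the event names `PreToolUse` and `UserPromptSubmit` come from the list above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Optional

# A hook receives the event payload and returns a (possibly modified) payload,
# or None to block the action. This contract is an assumption for the sketch.
Hook = Callable[[dict], Optional[dict]]


class HookRegistry:
    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Hook]] = defaultdict(list)

    def on(self, event: str, hook: Hook) -> None:
        """Register a hook; hooks run in registration order."""
        self._hooks[event].append(hook)

    def emit(self, event: str, payload: dict) -> Optional[dict]:
        """Pass the payload through each hook; stop early if one blocks it."""
        for hook in self._hooks[event]:
            result = hook(payload)
            if result is None:
                return None
            payload = result
        return payload


def block_recursive_delete(payload: dict) -> Optional[dict]:
    """Example PreToolUse hook: refuse shell commands containing 'rm -rf'."""
    if "rm -rf" in payload.get("command", ""):
        return None
    return payload


if __name__ == "__main__":
    registry = HookRegistry()
    registry.on("PreToolUse", block_recursive_delete)
    print(registry.emit("PreToolUse", {"command": "ls -la"}))
    print(registry.emit("PreToolUse", {"command": "rm -rf /tmp/x"}))
```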

🤝 Contributing

Thanks to everyone who has contributed code, feedback, and ideas. New contributors are warmly welcome — let's build the next-gen agent OS together.

Workflow: Fork → feature branch → PR.


💬 Community

  • For bugs and feature requests, please open a GitHub Issue.
  • Join our community channels: WeChat, Feishu, and Discord.

🙏 Acknowledgements

We thank Agent OS pioneers such as OpenClaw, Claude Code, Codex, Cursor, and Hermes for their explorations that helped shape this field.

PilotDeck builds upon the following outstanding open-source projects:


🏢 Joint Development

PilotDeck is jointly developed by Tsinghua University THUNLP, ModelBest, OpenBMB and AI9Stars.


⭐ Support Us

If PilotDeck has been helpful in your work or research, please consider giving us a Star on GitHub!


📝 Citation

@misc{pilotdeck2026,
  author       = {PilotDeck Team},
  title        = {PilotDeck: A WorkSpace-Centric Open-Source Agent Operating System},
  howpublished = {\url{https://github.com/OpenBMB/PilotDeck}},
  year         = {2026},
  note         = {Accessed: 2026-05-29}
}

📄 License

This project is licensed under the GNU Affero General Public License v3.0.