Designing an Enterprise Knowledge Base MCP Server
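Enterprise Knowledge Base MCP Server: Design Proposal
I. Requirements Analysis and Architecture Design
Core requirements
- Document management: upload, search, retrieve, and update enterprise documents
- Access control: role-based access control (RBAC)
- Multi-format support: Markdown, PDF, Word, HTML, and plain text
- Intelligent features: document summarization, translation, and similar-document recommendations
- Auditing and monitoring: operation logs and usage statistics
- Integration: connect with existing systems (Confluence, Git, SharePoint)
System architecture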
┌─────────────────────────────────────┐
│         MCP Client (Claude)         │
└───────────────┬─────────────────────┘
                │ SSE/HTTP
┌───────────────▼─────────────────────┐
│Enterprise Knowledge Base MCP Server │
├───────────────┬─────────────────────┤
│ API Gateway   │ Auth Middleware     │
│ Tool Router   │ Rate Limiter        │
│ Cache Layer   │ Audit Logger        │
└───────────────┴─────────────────────┘
                │
┌───────────────┬─────────────────────┐
│ Search Engine │ Vector Database     │
│(Elasticsearch)│ (Pinecone/Qdrant)   │
└───────────────┴─────────────────────┘
                │
┌───────────────┬─────────────────────┐
│ Document Store│ External Systems    │
│ (S3/MinIO)    │ (Confluence/Git)    │
└───────────────┴─────────────────────┘
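The Rate Limiter box in the diagram is not specified further, so here is a minimal sketch of one way it could work: a per-user sliding window. The class name, the `clock` parameter, and the default of 60 calls per 60 seconds are assumptions made for illustration, not part of the original design.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit` calls per `window_seconds` for each key (e.g. a user ID)."""

    def __init__(self, limit: int = 60, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be exercised without sleeping
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        calls = self._calls[key]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.limit:
            return False
        calls.append(now)
        return True
```

In the request path, the middleware would call `allow(user_id)` before routing a tool call and return an error result when it yields `False`. State is kept in process memory here; a multi-instance deployment would need a shared store instead.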
II. Tool Design
1. Document operation tools
// Tool definitions: document operations
{
"tools": [
{
"name": "knowledge_search",
"description": "Search documents in the enterprise knowledge base",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search keywords"
},
"filters": {
"type": "object",
"description": "Filter criteria",
"properties": {
"department": {
"type": "string",
"enum": ["engineering", "sales", "hr", "finance"]
},
"doc_type": {
"type": "string",
"enum": ["policy", "guide", "api", "meeting"]
},
"author": { "type": "string" },
"date_range": {
"type": "object",
"properties": {
"from": { "type": "string", "format": "date" },
"to": { "type": "string", "format": "date" }
}
},
"security_level": {
"type": "string",
"enum": ["public", "internal", "confidential"]
}
}
},
"limit": {
"type": "integer",
"minimum": 1,
"maximum": 50,
"default": 10
},
"page": {
"type": "integer",
"minimum": 1,
"default": 1
},
"sort_by": {
"type": "string",
"enum": ["relevance", "date_desc", "date_asc", "views"],
"default": "relevance"
}
},
"required": ["query"]
}
},
{
"name": "knowledge_upload",
"description": "Upload a document to the knowledge base",
"inputSchema": {
"type": "object",
"properties": {
"title": { "type": "string" },
"content": { "type": "string" },
"file_content": {
"type": "string",
"description": "Base64-encoded file content"
},
"file_type": {
"type": "string",
"enum": ["text", "markdown", "pdf", "docx", "html"]
},
"metadata": {
"type": "object",
"properties": {
"department": { "type": "string" },
"tags": { "type": "array", "items": { "type": "string" } },
"security_level": { "type": "string" },
"expires_at": { "type": "string", "format": "date" }
}
}
},
"required": ["title"]
}
},
{
"name": "get_document",
"description": "Retrieve the content of a specific document",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"include_metadata": { "type": "boolean", "default": true },
"include_embeddings": { "type": "boolean", "default": false }
},
"required": ["doc_id"]
}
},
{
"name": "update_document",
"description": "Update a document",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"content": { "type": "string" },
"metadata": { "type": "object" },
"update_reason": { "type": "string" }
},
"required": ["doc_id"]
}
},
{
"name": "delete_document",
"description": "Delete a document (requires confirmation)",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"reason": { "type": "string" }
},
"required": ["doc_id", "reason"]
}
}
]
}
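A server that publishes these schemas still has to enforce them when a call arrives. The following is a minimal sketch, assuming a hypothetical helper `validate_arguments` that covers only the keywords used above (`required`, `type`, `enum`, `minimum`, `maximum`, `default`, nested `properties`). It is not a full JSON Schema validator, and `SEARCH_SCHEMA` is an abbreviated copy of the `knowledge_search` schema.

```python
from typing import Any, Dict

_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}


def validate_arguments(schema: Dict[str, Any], args: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `args` with defaults applied, or raise ValueError."""
    result = dict(args)
    for name in schema.get("required", []):
        if name not in result:
            raise ValueError(f"missing required argument: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in result:
            if "default" in spec:
                result[name] = spec["default"]
            continue
        value = result[name]
        expected = _TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        if expected and (not isinstance(value, expected)
                         or (isinstance(value, bool) and spec["type"] in ("integer", "number"))):
            raise ValueError(f"{name}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name}: must be one of {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            raise ValueError(f"{name}: must be >= {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            raise ValueError(f"{name}: must be <= {spec['maximum']}")
        if spec.get("type") == "object" and "properties" in spec:
            result[name] = validate_arguments(spec, value)
    return result


# Abbreviated copy of the knowledge_search input schema.
SEARCH_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "filters": {"type": "object", "properties": {
            "department": {"type": "string",
                           "enum": ["engineering", "sales", "hr", "finance"]}}},
        "limit": {"type": "integer", "minimum": 1, "maximum": 50, "default": 10},
        "page": {"type": "integer", "minimum": 1, "default": 1},
        "sort_by": {"type": "string", "default": "relevance",
                    "enum": ["relevance", "date_desc", "date_asc", "views"]},
    },
    "required": ["query"],
}
```

Calling `validate_arguments(SEARCH_SCHEMA, {"query": "vpn"})` fills in `limit`, `page`, and `sort_by` from the declared defaults, so handlers do not need to repeat them.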
2. Intelligent processing tools
{
"tools": [
{
"name": "summarize_document",
"description": "Generate a document summary",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"length": {
"type": "string",
"enum": ["short", "medium", "detailed"],
"default": "medium"
},
"language": { "type": "string", "default": "zh" }
},
"required": ["doc_id"]
}
},
{
"name": "translate_document",
"description": "Translate a document",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"target_language": { "type": "string" },
"include_original": { "type": "boolean", "default": false }
},
"required": ["doc_id", "target_language"]
}
},
{
"name": "find_similar",
"description": "Find similar documents",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"query": { "type": "string" },
"top_k": { "type": "integer", "default": 5 },
"similarity_threshold": {
"type": "number",
"minimum": 0,
"maximum": 1,
"default": 0.7
}
}
}
},
{
"name": "ask_question",
"description": "Answer questions based on document content",
"inputSchema": {
"type": "object",
"properties": {
"question": { "type": "string" },
"doc_ids": {
"type": "array",
"items": { "type": "string" },
"description": "Restrict the search to these documents"
},
"scope": {
"type": "string",
"enum": ["all", "department", "personal"],
"default": "all"
},
"include_sources": { "type": "boolean", "default": true }
},
"required": ["question"]
}
}
]
}
3. Management and integration tools
{
"tools": [
{
"name": "sync_external",
"description": "Sync documents from an external system",
"inputSchema": {
"type": "object",
"properties": {
"source": {
"type": "string",
"enum": ["confluence", "github", "sharepoint", "notion"]
},
"config": { "type": "object" },
"full_sync": { "type": "boolean", "default": false }
},
"required": ["source"]
}
},
{
"name": "generate_report",
"description": "Generate a knowledge base usage report",
"inputSchema": {
"type": "object",
"properties": {
"report_type": {
"type": "string",
"enum": ["usage", "coverage", "freshness", "popularity"]
},
"time_range": {
"type": "object",
"properties": {
"start": { "type": "string", "format": "date" },
"end": { "type": "string", "format": "date" }
}
},
"format": {
"type": "string",
"enum": ["markdown", "json", "html"],
"default": "markdown"
}
},
"required": ["report_type"]
}
},
{
"name": "manage_permissions",
"description": "Manage document permissions",
"inputSchema": {
"type": "object",
"properties": {
"action": {
"type": "string",
"enum": ["grant", "revoke", "list"]
},
"doc_id": { "type": "string" },
"user_or_group": { "type": "string" },
"permission": {
"type": "string",
"enum": ["read", "write", "admin"]
}
},
"required": ["action"]
}
}
]
}
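The `manage_permissions` schema marks only `action` as required, yet `grant` and `revoke` are meaningless without a principal and a permission. The sketch below shows one way a handler might enforce that with an in-memory store. `DocumentACL` and its `manage` method are hypothetical names, and `doc_id` is treated as mandatory here for simplicity.

```python
from typing import Dict, List, Optional, Set

PERMISSIONS = ("read", "write", "admin")


class DocumentACL:
    """In-memory per-document grants: doc_id -> user or group -> set of permissions."""

    def __init__(self) -> None:
        self._grants: Dict[str, Dict[str, Set[str]]] = {}

    def manage(self, action: str, doc_id: str,
               user_or_group: Optional[str] = None,
               permission: Optional[str] = None) -> Dict[str, List[str]]:
        """Apply one manage_permissions call and return the document's resulting grants."""
        doc_grants = self._grants.setdefault(doc_id, {})
        if action in ("grant", "revoke"):
            if not user_or_group or permission not in PERMISSIONS:
                raise ValueError("grant/revoke need user_or_group and a valid permission")
            perms = doc_grants.setdefault(user_or_group, set())
            if action == "grant":
                perms.add(permission)
            else:
                perms.discard(permission)
            if not perms:
                # Remove principals that no longer hold any permission.
                del doc_grants[user_or_group]
        elif action != "list":
            raise ValueError(f"unknown action: {action}")
        return {who: sorted(p) for who, p in doc_grants.items()}
```

Returning the resulting grants after every action lets the caller confirm the change without issuing a separate `list` call.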
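III. Resource Design
Fixed URIs are listed as resources. Parameterized URIs are exposed as resource templates, whose uriTemplate values follow RFC 6570; the {/limit} form makes the limit segment optional.
{
  "resources": [
    {
      "uri": "knowledge://stats/overview",
      "name": "Knowledge base statistics overview",
      "description": "Usage statistics for the knowledge base",
      "mimeType": "application/json"
    }
  ],
  "resourceTemplates": [
    {
      "uriTemplate": "knowledge://recent{/limit}",
      "name": "Recently updated documents",
      "description": "Knowledge base documents that were updated most recently",
      "mimeType": "application/json"
    },
    {
      "uriTemplate": "knowledge://popular{/limit}",
      "name": "Popular documents",
      "description": "Documents with the highest view counts",
      "mimeType": "application/json"
    },
    {
      "uriTemplate": "knowledge://category/{category}",
      "name": "Documents by category",
      "description": "Browse documents by category",
      "mimeType": "application/json"
    },
    {
      "uriTemplate": "knowledge://user/{user_id}/recent",
      "name": "Recently viewed by user",
      "description": "Documents the user accessed most recently",
      "mimeType": "application/json"
    }
  ]
}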
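However the URIs are declared, the server has to map an incoming `knowledge://` URI onto a handler and its parameters. A minimal routing sketch using only the standard library follows; `parse_resource_uri` and the route names it returns are assumptions made for this example.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit


def parse_resource_uri(uri: str, default_limit: int = 10) -> Tuple[str, Dict[str, object]]:
    """Map a knowledge:// URI to a (route name, parameters) pair."""
    parts = urlsplit(uri)
    if parts.scheme != "knowledge":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    kind = parts.netloc  # e.g. "recent" in knowledge://recent/5
    segments = [s for s in parts.path.split("/") if s]
    if kind in ("recent", "popular") and len(segments) <= 1:
        # The limit segment is optional; fall back to the default when absent or non-numeric.
        limit = int(segments[0]) if segments and segments[0].isdigit() else default_limit
        return kind, {"limit": limit}
    if kind == "stats" and segments == ["overview"]:
        return "stats_overview", {}
    if kind == "category" and len(segments) == 1:
        return "category", {"category": segments[0]}
    if kind == "user" and len(segments) == 2 and segments[1] == "recent":
        return "user_recent", {"user_id": segments[0]}
    raise ValueError(f"unknown resource: {uri}")
```

Parsing the URI once and dispatching on the route name avoids a growing chain of `startswith` checks in the read handler.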
IV. Implementation Example (Python)
import asyncio
import hashlib
import json
import uuid
from datetime import datetime
from typing import Any, Dict, List, Optional

from elasticsearch import AsyncElasticsearch
from pydantic import BaseModel, Field
from qdrant_client import QdrantClient
# ========== Data models ==========
class DocumentMetadata(BaseModel):
    department: str
    doc_type: str = Field(default="document")
    security_level: str = Field(default="internal")
    tags: List[str] = Field(default_factory=list)
    author: str
    created_at: datetime = Field(default_factory=datetime.now)
    updated_at: datetime = Field(default_factory=datetime.now)
    expires_at: Optional[datetime] = None
    views: int = 0
    # Per-document grants: action name -> list of user IDs
    permissions: Dict[str, List[str]] = Field(default_factory=dict)


class Document(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    title: str
    content: str
    summary: Optional[str] = None
    metadata: DocumentMetadata
    embeddings: Optional[List[float]] = None
    file_hash: str  # Used for deduplication


class SearchFilters(BaseModel):
    department: Optional[str] = None
    doc_type: Optional[str] = None
    security_level: Optional[str] = None
    author: Optional[str] = None
    tags: Optional[List[str]] = None
    date_range: Optional[Dict[str, datetime]] = None


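# ========== Access control ==========
class PermissionManager:
    ROLES = {
        "admin": ["read", "write", "delete", "manage"],
        "editor": ["read", "write"],
        "viewer": ["read"],
        "guest": ["read_public"]
    }

    def __init__(self):
        self.user_roles: Dict[str, str] = {}

    def check_permission(self, user: str, doc: Optional[Document], action: str) -> bool:
        """Check whether the user may perform the action on the document."""
        role = self.user_roles.get(user, "guest")
        allowed = self.ROLES.get(role, [])
        # No document yet (e.g. before creation): only the role decides
        if doc is None:
            return action in allowed
        # Per-document grants take effect regardless of role
        if user in doc.metadata.permissions.get(action, []):
            return True
        # Guests may only read public documents
        if role == "guest":
            return action == "read" and doc.metadata.security_level == "public"
        # Role-level check
        if action not in allowed:
            return False
        # Confidential documents are limited to admins and editors
        if doc.metadata.security_level == "confidential":
            return role in ["admin", "editor"]
        return True

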
# ========== Knowledge base MCP server ==========
class KnowledgeBaseMCPServer:
    def __init__(self):
        # In-memory document store
        self.documents: Dict[str, Document] = {}
        self.permission_mgr = PermissionManager()
        # Search engine and vector database clients
        self.es = AsyncElasticsearch(["http://localhost:9200"])
        self.qdrant = QdrantClient("localhost", port=6333)
        # Cache
        self.cache = {}
        # Audit log
        self.audit_log = []

    # ========== Tool implementations ==========
    async def handle_knowledge_search(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Search documents."""
        try:
            query = params.get("query", "")
            filters = params.get("filters") or {}
            limit = params.get("limit", 10)
            page = params.get("page", 1)
            sort_by = params.get("sort_by", "relevance")

            # Translate the supported filters into Elasticsearch filter clauses
            filter_clauses = []
            for field in ("department", "doc_type", "security_level", "author"):
                if value := filters.get(field):
                    filter_clauses.append({"term": {field: value}})
            if date_range := filters.get("date_range"):
                # Only include the bounds that were actually supplied
                bounds = {}
                if date_range.get("from"):
                    bounds["gte"] = date_range["from"]
                if date_range.get("to"):
                    bounds["lte"] = date_range["to"]
                if bounds:
                    filter_clauses.append({"range": {"created_at": bounds}})

            # Build the Elasticsearch query
            es_query = {
                "bool": {
                    "must": [{
                        "multi_match": {
                            "query": query,
                            "fields": ["title^3", "content", "summary^2"],
                            "type": "best_fields"
                        }
                    }],
                    "filter": filter_clauses
                }
            }
            # Map sort_by onto a sort clause; "relevance" keeps the default score order
            sort_options = {
                "date_desc": [{"created_at": "desc"}],
                "date_asc": [{"created_at": "asc"}],
                "views": [{"views": "desc"}]
            }

            # Run the search
            response = await self.es.search(
                index="knowledge_docs",
                query=es_query,
                from_=(page - 1) * limit,
                size=limit,
                sort=sort_options.get(sort_by)
            )

            # Collect the hits
            results = []
            for hit in response["hits"]["hits"]:
                doc = hit["_source"]
                results.append({
                    "id": doc["id"],
                    "title": doc["title"],
                    "summary": doc.get("summary", ""),
                    "score": hit["_score"],
                    "metadata": {
                        "department": doc["department"],
                        "doc_type": doc["doc_type"],
                        "created_at": doc["created_at"],
                        "author": doc["author"]
                    }
                })

            total = response["hits"]["total"]["value"]
            return {
                "content": [{
                    "type": "text",
                    "text": f"Found {total} results:"
                }, {
                    "type": "text",
                    "text": self._format_search_results(results)
                }],
                # Pagination info; "metadata" is not a standard MCP result field
                "metadata": {
                    "total": total,
                    "page": page,
                    "page_size": limit
                }
            }
        except Exception as e:
            return {
                "content": [{
                    "type": "text",
                    "text": f"Search failed: {str(e)}"
                }],
                "isError": True
            }

    async def handle_knowledge_upload(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Upload a document."""
        try:
            title = params["title"]
            content = params.get("content", "")
            metadata = params.get("metadata", {})
            # NOTE: a real server should take the user from the authenticated session,
            # not from client-supplied metadata
            user = metadata.get("author", "unknown")

            # Build the document first so the permission check has a document to inspect
            doc = Document(
                title=title,
                content=content,
                metadata=DocumentMetadata(
                    department=metadata.get("department", "general"),
                    security_level=metadata.get("security_level", "internal"),
                    tags=metadata.get("tags", []),
                    author=user
                ),
                file_hash=self._calculate_hash(content)
            )

            # Check permission
            if not self.permission_mgr.check_permission(user, doc, "write"):
                return self._error_response("Permission denied")

            # Generate the summary
            doc.summary = await self._generate_summary(content)
            # Generate the embedding vector
            doc.embeddings = await self._generate_embeddings(content)
            # Store the document
            self.documents[doc.id] = doc
            # Index it in the search engine
            await self._index_document(doc)
            # Store the vector in the vector database
            await self._store_embeddings(doc)
            # Write the audit log entry
            self._log_audit("upload", user, doc.id)

            return {
                "content": [{
                    "type": "text",
                    "text": f"Document uploaded.\nID: {doc.id}\nTitle: {title}\nSecurity level: {doc.metadata.security_level}"
                }],
                # NOTE: "suggestedToolCalls" is not defined by the MCP specification;
                # clients are likely to ignore it
                "suggestedToolCalls": [{
                    "toolName": "summarize_document",
                    "arguments": {"doc_id": doc.id}
                }]
            }
        except Exception as e:
            return self._error_response(f"Upload failed: {str(e)}")

    async def handle_ask_question(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Answer a question from document content."""
        try:
            question = params["question"]
            scope = params.get("scope", "all")  # not applied in this simplified version
            include_sources = params.get("include_sources", True)

            # Vector search for related documents; returns (document, score) pairs
            query_embedding = await self._generate_embeddings(question)
            similar_docs = await self._vector_search(query_embedding, top_k=5)
            if not similar_docs:
                return {
                    "content": [{
                        "type": "text",
                        "text": "No relevant documents were found to answer this question."
                    }]
                }

            # Build the context from the first 1000 characters of each document
            context = "\n\n".join([
                f"Document: {doc.title}\nContent: {doc.content[:1000]}"
                for doc, _score in similar_docs
            ])

            # Call the LLM to generate the answer
            answer = await self._call_llm_for_qa(question, context)

            # Build the response
            response_content = [{
                "type": "text",
                "text": f"**Question**: {question}\n\n**Answer**: {answer}"
            }]
            if include_sources:
                sources_text = "\n".join([
                    f"- {doc.title} (relevance: {score:.2f})"
                    for doc, score in similar_docs
                ])
                response_content.append({
                    "type": "text",
                    "text": f"\n**Sources**:\n{sources_text}"
                })
            return {"content": response_content}
        except Exception as e:
            return self._error_response(f"Question answering failed: {str(e)}")

    # ========== Resource handling ==========
    async def handle_resource_read(self, uri: str) -> Dict[str, Any]:
        """Handle a resource read request."""
        if uri == "knowledge://recent" or uri.startswith("knowledge://recent/"):
            # The trailing segment is an optional limit; default to 10
            last_segment = uri.split("/")[-1]
            limit = int(last_segment) if last_segment.isdigit() else 10
            recent_docs = sorted(
                self.documents.values(),
                key=lambda x: x.metadata.updated_at,
                reverse=True
            )[:limit]
            return {
                "contents": [{
                    "uri": uri,
                    "mimeType": "application/json",
                    "text": json.dumps([
                        {
                            "id": doc.id,
                            "title": doc.title,
                            "updated_at": doc.metadata.updated_at.isoformat(),
                            "author": doc.metadata.author
                        }
                        for doc in recent_docs
                    ], ensure_ascii=False, indent=2)
                }]
            }
        elif uri.startswith("knowledge://stats/overview"):
            stats = self._generate_statistics()
            return {
                "contents": [{
                    "uri": uri,
                    "mimeType": "application/json",
                    "text": json.dumps(stats, ensure_ascii=False, indent=2)
                }]
            }
        return {"error": f"Unknown resource: {uri}"}

    # ========== Helper methods ==========
    def _format_search_results(self, results: List[Dict]) -> str:
        """Format search results as a numbered list."""
        formatted = []
        for i, result in enumerate(results, 1):
            formatted.append(
                f"{i}. **{result['title']}**\n"
                f"   Summary: {(result['summary'] or '')[:100]}...\n"
                f"   Department: {result['metadata']['department']} | "
                f"Type: {result['metadata']['doc_type']} | "
                f"Author: {result['metadata']['author']}\n"
            )
        return "\n".join(formatted)

    def _generate_statistics(self) -> Dict:
        """Compute knowledge base statistics."""
        total_docs = len(self.documents)
        departments = {}
        doc_types = {}
        for doc in self.documents.values():
            dept = doc.metadata.department
            doc_type = doc.metadata.doc_type
            departments[dept] = departments.get(dept, 0) + 1
            doc_types[doc_type] = doc_types.get(doc_type, 0) + 1
        return {
            "total_documents": total_docs,
            "by_department": departments,
            "by_type": doc_types,
            "last_updated": max(
                [doc.metadata.updated_at for doc in self.documents.values()],
                default=datetime.now()
            ).isoformat()
        }

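    async def _generate_summary(self, content: str, length: str = "medium") -> str:
        """Generate a document summary (simplified)."""
        # A real implementation would call an LLM API
        import re
        # Split after sentence-ending punctuation, Western or Chinese
        sentences = [s for s in re.split(r"(?<=[.!?。!?])", content) if s.strip()]
        if len(sentences) <= 3:
            return content
        count = {"short": 2, "detailed": 10}.get(length, 5)  # medium: 5 sentences
        return "".join(sentences[:count]).strip()

    async def _generate_embeddings(self, text: str) -> List[float]:
        """Generate a text vector (simplified)."""
        # A real implementation would call an embedding model API
        import numpy as np
        # Mock embedding: seed a local random generator from a hash of the text
        seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
        rng = np.random.default_rng(seed)
        return rng.standard_normal(384).tolist()  # 384-dimensional vector

    async def _vector_search(self, query_embedding: List[float], top_k: int = 5):
        """Vector search over the in-memory documents; returns (document, score) pairs."""
        # Simplified implementation. With the mock embeddings above, different texts
        # will rarely clear the 0.7 threshold.
        results = []
        for doc in self.documents.values():
            if doc.embeddings:
                # Compute cosine similarity
                similarity = self._cosine_similarity(query_embedding, doc.embeddings)
                if similarity > 0.7:  # threshold
                    results.append((doc, similarity))
        # Sort by similarity, highest first
        results.sort(key=lambda x: x[1], reverse=True)
        return results[:top_k]

    def _cosine_similarity(self, a: List[float], b: List[float]) -> float:
        """Compute cosine similarity."""
        import numpy as np
        a_np = np.array(a)
        b_np = np.array(b)
        denom = np.linalg.norm(a_np) * np.linalg.norm(b_np)
        # Avoid dividing by zero for all-zero vectors
        return float(np.dot(a_np, b_np) / denom) if denom else 0.0

    async def _index_document(self, doc: Document):
        """Index the document in Elasticsearch (minimal sketch, not in the original listing)."""
        await self.es.index(
            index="knowledge_docs",
            id=doc.id,
            document={
                "id": doc.id,
                "title": doc.title,
                "content": doc.content,
                "summary": doc.summary,
                "department": doc.metadata.department,
                "doc_type": doc.metadata.doc_type,
                "security_level": doc.metadata.security_level,
                "author": doc.metadata.author,
                "created_at": doc.metadata.created_at.isoformat(),
                "views": doc.metadata.views
            }
        )

    async def _store_embeddings(self, doc: Document):
        """Store the vector in Qdrant (minimal sketch; assumes the collection already exists)."""
        from qdrant_client.models import PointStruct
        self.qdrant.upsert(
            collection_name="doc_embeddings",
            points=[PointStruct(id=doc.id, vector=doc.embeddings, payload={"title": doc.title})]
        )

    def _calculate_hash(self, content: str) -> str:
        """Compute a content hash."""
        return hashlib.md5(content.encode()).hexdigest()

    def _log_audit(self, action: str, user: str, doc_id: str):
        """Append an audit log entry."""
        self.audit_log.append({
            "timestamp": datetime.now().isoformat(),
            "action": action,
            "user": user,
            "doc_id": doc_id,
            "ip": "127.0.0.1"  # should come from the request in a real deployment
        })

    def _error_response(self, message: str) -> Dict[str, Any]:
        """Build an error response."""
        return {
            "content": [{"type": "text", "text": f"Error: {message}"}],
            "isError": True
        }

    async def _call_llm_for_qa(self, question: str, context: str) -> str:
        """Call an LLM for question answering (simplified)."""
        prompt = f"""
Answer the question based on the following document content:
Document content:
{context}
Question: {question}
Answer:
"""
        # A real implementation would send `prompt` to an LLM API here
        return "This is an answer generated from the document content."

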
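# ========== MCP server entry point ==========
async def main():
    import sys
    server = KnowledgeBaseMCPServer()
    # The MCP protocol handling still has to be implemented here:
    # initialize, tools/list, tools/call, resources/list, resources/read, and so on.
    # Status messages go to stderr because a stdio MCP server reserves stdout
    # for protocol messages.
    print("Knowledge base MCP server started", file=sys.stderr)
    print("Available tools:", file=sys.stderr)
    print("- knowledge_search: search documents", file=sys.stderr)
    print("- knowledge_upload: upload a document", file=sys.stderr)
    print("- get_document: retrieve a document", file=sys.stderr)
    print("- ask_question: question answering over documents", file=sys.stderr)
    print("- summarize_document: document summary", file=sys.stderr)
    print("- find_similar: find similar documents", file=sys.stderr)


if __name__ == "__main__":
    asyncio.run(main())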
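The entry point leaves the protocol handling unimplemented. As a rough illustration of what the `tools/list` and `tools/call` part involves, here is a minimal JSON-RPC 2.0 dispatcher. `ToolDispatcher` and its methods are hypothetical names, and the sketch omits the `initialize` handshake, resources, and transport framing, so it is not a complete MCP server; in practice an MCP SDK would normally handle this layer.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict, Optional

ToolHandler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


class ToolDispatcher:
    """Route tools/list and tools/call JSON-RPC requests to registered handlers."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}
        self._handlers: Dict[str, ToolHandler] = {}

    def register(self, name: str, description: str,
                 input_schema: Dict[str, Any], handler: ToolHandler) -> None:
        self._tools[name] = {"name": name, "description": description,
                             "inputSchema": input_schema}
        self._handlers[name] = handler

    @staticmethod
    def _error(msg_id: Any, code: int, message: str) -> str:
        return json.dumps({"jsonrpc": "2.0", "id": msg_id,
                           "error": {"code": code, "message": message}})

    async def handle(self, raw: str) -> Optional[str]:
        msg = json.loads(raw)
        method, msg_id = msg.get("method"), msg.get("id")
        if msg_id is None:
            return None  # notifications get no response
        if method == "tools/list":
            result: Dict[str, Any] = {"tools": list(self._tools.values())}
        elif method == "tools/call":
            params = msg.get("params", {})
            handler = self._handlers.get(params.get("name"))
            if handler is None:
                # -32602 is the JSON-RPC "invalid params" code
                return self._error(msg_id, -32602, f"Unknown tool: {params.get('name')}")
            result = await handler(params.get("arguments", {}))
        else:
            # -32601 is the JSON-RPC "method not found" code
            return self._error(msg_id, -32601, f"Method not found: {method}")
        return json.dumps({"jsonrpc": "2.0", "id": msg_id, "result": result},
                          ensure_ascii=False)
```

A handler such as `handle_knowledge_search` fits the `ToolHandler` signature directly, so registering it is a single `register(...)` call per tool.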
V. Configuration File Example
# config.yaml
server:
  name: "enterprise-knowledge-base"
  version: "1.0.0"
  host: "0.0.0.0"
  port: 8000
  auth_required: true
database:
  elasticsearch:
    hosts: ["localhost:9200"]
    index: "knowledge_docs"
  qdrant:
    host: "localhost"
    port: 6333
    collection: "doc_embeddings"
  redis:
    host: "localhost"
    port: 6379
    db: 0
embeddings:
  model: "text-embedding-ada-002"
  api_key: "${OPENAI_API_KEY}"
  dimensions: 1536
llm:
  model: "gpt-4-turbo"
  api_key: "${OPENAI_API_KEY}"
  temperature: 0.1
security:
  jwt_secret: "${JWT_SECRET}"
  token_expiry: 86400
  rate_limit:
    requests_per_minute: 60
  allowed_origins:
    - "https://claude.ai"
    - "http://localhost:*"
storage:
  document_store: "s3"
  s3:
    endpoint: "s3.amazonaws.com"
    bucket: "knowledge-docs"
  local_backup: "/var/backups/knowledge"
integrations:
  confluence:
    enabled: true
    base_url: "https://your-company.atlassian.net/wiki"
  github:
    enabled: false
  sharepoint:
    enabled: false
logging:
  level: "INFO"
  file: "/var/log/knowledge-mcp.log"
  audit_log: "/var/log/knowledge-audit.log"
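YAML itself does not expand the `${OPENAI_API_KEY}` and `${JWT_SECRET}` placeholders, so the loader has to substitute them after parsing. The sketch below assumes the file has already been parsed into nested dicts and lists (for example with a YAML library) and shows only the substitution step; `expand_env` is a hypothetical helper name.

```python
import os
import re
from typing import Any, Mapping

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${NAME} placeholders in strings with values from `env`."""
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    if isinstance(value, str):
        def replace(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in env:
                # Fail loudly instead of silently keeping an empty secret.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(replace, value)
    return value  # numbers, booleans, None pass through unchanged
```

Raising on a missing variable is a deliberate choice in this sketch: a server that starts with an empty `jwt_secret` is harder to diagnose than one that refuses to start.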
VI. Deployment and Usage
1. Start the server
# Install dependencies
pip install mcp elasticsearch qdrant-client aiohttp pydantic numpy
# Start the server
python knowledge_mcp_server.py
2. Claude Desktop configuration
// ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"knowledge-base": {
"command": "python",
"args": [
"/path/to/knowledge_mcp_server.py"
],
"env": {
"OPENAI_API_KEY": "sk-...",
"ELASTICSEARCH_HOSTS": "localhost:9200"
}
}
}
}
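3. Usage example
# Client-side example
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def run_client():
    server_params = StdioServerParameters(
        command="python",
        args=["knowledge_mcp_server.py"]
    )
    # stdio_client yields the read/write streams that ClientSession needs
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the session
            await session.initialize()
            # List the tools
            tools = await session.list_tools()
            print("Available tools:", [t.name for t in tools.tools])
            # Search documents
            response = await session.call_tool(
                "knowledge_search",
                arguments={
                    "query": "Q4 sales report",
                    "filters": {
                        "department": "sales",
                        "security_level": "internal"
                    }
                }
            )
            # Ask a question; scope must be one of "all", "department", "personal"
            response = await session.call_tool(
                "ask_question",
                arguments={
                    "question": "What is the company's sales target for this year?",
                    "scope": "department"
                }
            )


asyncio.run(run_client())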
VII. Extended Feature Design
1. Real-time collaboration
- Collaborative document editing
- Comments and annotations
- Change history tracking
2. Advanced search
- Enhanced semantic search
- Hybrid search (keyword + vector)
- Natural-language query translation
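The design does not say how keyword and vector results would be combined for hybrid search. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the author's chosen method; the function name and the constant `k = 60` are illustrative defaults.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["a", "b", "c"]   # e.g. IDs from the Elasticsearch query
vector_hits = ["b", "d", "a"]    # e.g. IDs from the vector search
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

RRF needs only ranks, not raw scores, which sidesteps the fact that Elasticsearch relevance scores and cosine similarities are on different scales.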
3. Workflow integration
- Document approval workflows
- Automated document classification
- Cleanup of expired documents
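Expired-document cleanup can build on the `expires_at` metadata field. The sketch below uses a stand-in `DocRecord` dataclass and a plain dict as the store, both hypothetical, to show the core of a periodic purge job.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class DocRecord:
    doc_id: str
    expires_at: Optional[datetime] = None  # None means the document never expires


def purge_expired(store: Dict[str, DocRecord],
                  now: Optional[datetime] = None) -> List[str]:
    """Delete documents whose expires_at has passed and return their IDs."""
    now = now or datetime.now(timezone.utc)
    expired = sorted(
        d.doc_id for d in store.values()
        if d.expires_at is not None and d.expires_at <= now
    )
    for doc_id in expired:
        del store[doc_id]
    return expired
```

A production job would also remove the matching search index entries and vectors, and would likely archive rather than delete where retention rules apply.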
4. Analytics and insights
- Knowledge graph construction
- Document relationship analysis
- Knowledge gap identification
5. Mobile support
- Responsive design
- Offline access
- Mobile-optimized interface
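VIII. Security and Compliance
- Data encryption
  - Transport layer: TLS 1.3
  - Encryption at rest: AES-256
  - Key management: HSM/KMS
- Access control
  - Role-based access control (RBAC)
  - Attribute-based access control (ABAC)
  - Multi-factor authentication
- Compliance
  - GDPR data subject rights
  - SOX document retention policies
  - HIPAA protection for medical documents
  - Audit log retention for 7 years
- Monitoring and alerting
  - Anomalous access detection
  - Data loss prevention
  - Real-time alerting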
This design outlines an enterprise-grade knowledge base MCP Server and can be adjusted and extended to fit actual requirements.