Redact Documents Before Feeding to AI
Strip personal data locally, get AI analysis with aliases, then map answers back to real names
The Problem: AI Sees Everything You Paste
When you paste a contract into ChatGPT...
When you paste a contract, spreadsheet or case note into an AI assistant, the text is processed on that provider's infrastructure. Retention, training and admin controls vary by plan and settings, but the core fact is simple: once the original values leave your device, you are depending on someone else's policy.
The Solution: Redact → AI → Reverse-Map
DocMask's approach: the AI never sees the real values you redact
Before pasting, replace identifiers with aliases: email addresses, phone numbers, account numbers, or a client name you have selected or supplied as a keyword. The AI analyzes the document using those aliases. When it responds "Person_A should sign by Friday", you reverse-map locally: "John Smith should sign by Friday". The AI produced a useful answer without ever seeing the real value.
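The round trip above can be sketched in a few lines of Python. This is a minimal illustration, not DocMask's implementation: the function names `redact` and `reverse_map` and the hand-written alias table are hypothetical. Replacement happens in a single regex pass, longest value first, and only where the value is not glued to other letters or digits, so "John" does not clobber part of "Johnny" and an alias is never replaced twice.

```python
import re

def _swap(text: str, table: dict[str, str]) -> str:
    """Replace every key of `table` with its value in one pass."""
    if not table:
        return text
    # Longest keys first so "John Smith" wins over "John"; the lookarounds
    # skip matches that sit inside a longer word (e.g. "John" in "Johnny").
    alternatives = "|".join(re.escape(k) for k in sorted(table, key=len, reverse=True))
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: table[m.group(0)], text)

def redact(text: str, aliases: dict[str, str]) -> str:
    """Real value -> alias, done locally before the text is pasted."""
    return _swap(text, aliases)

def reverse_map(answer: str, aliases: dict[str, str]) -> str:
    """Alias -> real value, applied locally to the AI's answer."""
    return _swap(answer, {alias: real for real, alias in aliases.items()})

# Hypothetical alias table for one document.
aliases = {"John Smith": "Person_A", "john@example.com": "Email_A"}
print(redact("John Smith (john@example.com) must sign the lease.", aliases))
# Person_A (Email_A) must sign the lease.
print(reverse_map("Person_A should sign by Friday.", aliases))
# John Smith should sign by Friday.
```

Only the aliased text is pasted; the `aliases` table never leaves the machine.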
Why Consistent Aliases Matter
Simple find-and-replace breaks when the same identifier appears in different contexts. DocMask uses consistent pseudonymization: once "John Smith" is selected or supplied as a keyword, it stays "Person_A" throughout the document — in headers, body text, footers, and across multiple pages. The AI understands that Person_A in paragraph 1 is the same entity as Person_A in paragraph 47.
- Names or custom entities → Person_A, Person_B, Person_C... when detected or supplied as keywords
- Emails → Email_A, Email_B...
- Phone numbers → Phone_A, Phone_B...
- Locations and custom keywords → Loc_A / Item_A, or your own fixed alias
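Assigning aliases consistently amounts to keeping one table per document and handing out the next letter for each new value in a category. The sketch below illustrates that idea under assumptions and is not DocMask's detector: the deliberately naive `EMAIL` and `PHONE` regexes (the phone pattern will also catch things like dates), the `AliasTable` class and the A to Z, then AA, AB suffix scheme are all hypothetical, and only the Person, Email and Phone categories are shown.

```python
import re
from string import ascii_uppercase

# Naive patterns for illustration only; real detection needs much more care.
EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
PHONE = r"\+?\d[\d\s().-]{6,}\d"

def letter_suffix(n: int) -> str:
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA', like spreadsheet columns."""
    s, n = "", n + 1
    while n:
        n, r = divmod(n - 1, 26)
        s = ascii_uppercase[r] + s
    return s

class AliasTable:
    """One table per document: the same value always gets the same alias."""

    def __init__(self) -> None:
        self.forward: dict[str, str] = {}  # real value -> alias
        self.counts: dict[str, int] = {}   # prefix -> aliases issued so far

    def alias_for(self, prefix: str, value: str) -> str:
        if value not in self.forward:
            n = self.counts.get(prefix, 0)
            self.forward[value] = f"{prefix}_{letter_suffix(n)}"
            self.counts[prefix] = n + 1
        return self.forward[value]

def pseudonymize(text: str, table: AliasTable, keywords: list[str]) -> str:
    """Alias user-supplied keywords as Person_*, plus detected emails and phones."""
    branches = []
    if keywords:
        alt = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
        branches.append(r"(?P<Person>(?<!\w)(?:" + alt + r")(?!\w))")
    branches.append(r"(?P<Email>" + EMAIL + ")")
    branches.append(r"(?P<Phone>" + PHONE + ")")
    pattern = re.compile("|".join(branches))
    # m.lastgroup is the name of the branch that matched, reused as the prefix.
    return pattern.sub(lambda m: table.alias_for(m.lastgroup, m.group(0)), text)

table = AliasTable()
keywords = ["John Smith", "Mary Jones"]
print(pseudonymize("John Smith emailed jane@acme.io. John Smith will call +1 555 010 9999.", table, keywords))
# Person_A emailed Email_A. Person_A will call Phone_A.
print(pseudonymize("Mary Jones replied to John Smith.", table, keywords))
# Person_B replied to Person_A.
```

Because the table outlives a single call, the second page reuses Person_A for John Smith; the same table, inverted, is what reverse-mapping needs.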
The mapping is encrypted with AES-256-GCM and stored only on your device. No cloud, no sync, no data broker.
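DocMask states only that the mapping is encrypted with AES-256-GCM and kept on the device. To give a rough idea of what that involves, here is a sketch using the `AESGCM` class from the third-party Python `cryptography` package. The function names `seal_mapping` and `open_mapping`, the nonce-prefixed blob layout and the JSON serialization are assumptions, and how the key is stored or derived is not shown.

```python
import json
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_BYTES = 12  # 96-bit nonce, the standard size for GCM

def seal_mapping(mapping: dict[str, str], key: bytes) -> bytes:
    """Encrypt and authenticate the alias mapping; returns nonce + ciphertext/tag."""
    nonce = os.urandom(NONCE_BYTES)  # fresh random nonce; never reuse one with the same key
    plaintext = json.dumps(mapping).encode("utf-8")
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def open_mapping(blob: bytes, key: bytes) -> dict[str, str]:
    """Decrypt; raises cryptography.exceptions.InvalidTag if the blob was altered."""
    nonce, ciphertext = blob[:NONCE_BYTES], blob[NONCE_BYTES:]
    return json.loads(AESGCM(key).decrypt(nonce, ciphertext, None))

key = AESGCM.generate_key(bit_length=256)  # 32 random bytes = AES-256
blob = seal_mapping({"John Smith": "Person_A"}, key)
print(open_mapping(blob, key))
# {'John Smith': 'Person_A'}
```

GCM's authentication tag means a corrupted or tampered mapping file fails to decrypt instead of silently restoring the wrong names.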
Who Needs AI Document Redaction?
- Lawyers reviewing contracts with AI assistance — client names and case details must stay confidential.
- HR professionals analyzing employee records, which data-protection laws such as GDPR require them to safeguard.
- Financial analysts processing client reports — regulatory compliance demands data minimization.
- Researchers working with survey data or medical records, where IRB protocols often require de-identification.
- Anyone who pastes work documents into ChatGPT and wonders "should I be doing this?"
Frequently Asked Questions
Is it safe to paste documents into ChatGPT?
It depends on your product plan, workspace policy and settings. Even when training is disabled, the document is still processed outside your device. If the document contains personal, legal, health or financial data, redact sensitive values before pasting.
How do I protect personal data when using Claude?
Anthropic's Claude processes your input on Anthropic's servers. Whether conversations may be used for model training depends on your plan and privacy settings, but in every case the data leaves your device. Use DocMask to replace real names with aliases before pasting: Claude analyzes the redacted version, and you restore the real names locally.
What is the difference between redaction and anonymization?
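Redaction removes or masks sensitive information (traditionally with permanent black bars). Anonymization transforms data so individuals can't be re-identified. DocMask does reversible pseudonymization: it replaces real values with consistent aliases that can be mapped back locally. Because the mapping can undo it, pseudonymized data is not anonymized data, but that reversibility is what an AI workflow needs to turn answers about Person_A back into answers about the real person.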
Can AI still give useful answers with redacted documents?
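Yes. DocMask uses consistent aliases (Person_A stays Person_A throughout the document), so the AI can still follow relationships and context. Structural analysis, summarization and question answering generally work as usual; only tasks that depend on knowing who the real person or organization is lose out, because the AI sees just the alias. You then reverse-map the aliases in the AI's response back to the real values.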