version: "1.0.0"
name: nutrient-document-processing
description: Process, convert, OCR, extract, redact, sign, and fill documents using the Nutrient DWS API. Works with PDF, DOCX, XLSX, PPTX, HTML, and image formats.
origin: ECC
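Document Processing
Process documents with the Nutrient DWS Processor API: convert formats, extract text and tables, OCR scanned documents, redact PII, add watermarks, apply digital signatures, and fill PDF forms.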
Setup
Get a free API key at [nutrient.io](https://dashboard.nutrient.io/sign_up/?product=processor), then export it:

    export NUTRIENT_API_KEY="pdf_live_..."

All requests are sent as multipart POST requests to https://api.nutrient.io/build and carry an instructions JSON field.
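Because every call has the same shape (one uploaded file plus an instructions field), the request can be wrapped in a small helper. The following is only a sketch: the function names `build_instructions` and `nutrient_build` are hypothetical and not part of any Nutrient tooling, and the code assumes file names contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Nutrient API or CLI).
# Usage: build_instructions FILE_NAME [OUTPUT_TYPE]
# Prints the instructions JSON for a single-file request.
# Assumes FILE_NAME needs no JSON escaping.
build_instructions() {
  local file="$1" output_type="${2:-}"
  if [ -n "$output_type" ]; then
    printf '{"parts":[{"file":"%s"}],"output":{"type":"%s"}}' "$file" "$output_type"
  else
    printf '{"parts":[{"file":"%s"}]}' "$file"
  fi
}

# Hypothetical wrapper around the multipart POST described above.
# Usage: nutrient_build INPUT_PATH OUTPUT_PATH [OUTPUT_TYPE]
nutrient_build() {
  local input="$1" output="$2" output_type="${3:-}"
  local name
  name="$(basename "$input")"
  # --fail makes curl exit non-zero on an HTTP error status.
  curl --fail -sS -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$name=@$input" \
    -F "instructions=$(build_instructions "$name" "$output_type")" \
    -o "$output"
}
```

With these defined, `nutrient_build document.pdf output.docx docx` would send the same request as a hand-written PDF to DOCX curl call.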
Operations
Convert Documents

    # DOCX to PDF
    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "document.docx=@document.docx" \
      -F 'instructions={"parts":[{"file":"document.docx"}]}' \
      -o output.pdf

    # PDF to DOCX
    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "document.pdf=@document.pdf" \
      -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
      -o output.docx

    # HTML to PDF
    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "index.html=@index.html" \
      -F 'instructions={"parts":[{"html":"index.html"}]}' \
      -o output.pdf

Supported input formats: PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS.
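A script that processes many files may want to reject unsupported inputs before uploading them. The sketch below checks a file extension against the list above. The function name `is_supported_input` is hypothetical, and aliases such as .jpeg or .tif are deliberately left out because the list does not mention them.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the Nutrient API).
# Returns 0 if the file extension appears in the documented input list.
is_supported_input() {
  local ext
  # A name without a dot has no extension to check.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Take the text after the last dot and lowercase it.
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf|docx|xlsx|pptx|doc|xls|ppt|pps|ppsx|odt|rtf|html) return 0 ;;
    jpg|png|tiff|heic|gif|webp|svg|tga|eps) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_input report.DOCX && echo ok` prints `ok`, while a `.txt` file makes the function return a non-zero status.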
Extract Text and Data

    # Extract plain text
    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "document.pdf=@document.pdf" \
      -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
      -o output.txt

    # Extract tables as Excel
    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "document.pdf=@document.pdf" \
      -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
      -o tables.xlsx
OCR Scanned Documents

    # OCR to searchable PDF (supports 100+ languages)
    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "scanned.pdf=@scanned.pdf" \
      -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
      -o searchable.pdf

Languages: more than 100 languages are supported via ISO 639-2 codes (e.g., eng, deu, fra, spa, jpn, kor, chi_sim, chi_tra, ara, hin, rus). Full language names such as english or german also work. See the complete OCR language table for all supported codes.
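Since the language is the only part of the OCR request that changes between documents, it can be made a parameter. This is a minimal sketch: `ocr_instructions` is a hypothetical name, the default of english mirrors the example above, and the function does not validate the code against the language table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Nutrient API).
# Usage: ocr_instructions FILE_NAME [LANGUAGE]
# LANGUAGE is a code such as "deu" or a full name such as "german";
# it is passed through unchecked and defaults to "english".
ocr_instructions() {
  local file="$1" language="${2:-english}"
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"ocr","language":"%s"}]}' \
    "$file" "$language"
}

# Example use inside a request (left commented out, since it needs a key and a file):
# curl -X POST https://api.nutrient.io/build \
#   -H "Authorization: Bearer $NUTRIENT_API_KEY" \
#   -F "scanned.pdf=@scanned.pdf" \
#   -F "instructions=$(ocr_instructions scanned.pdf deu)" \
#   -o searchable.pdf
```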
Redact Sensitive Information

    # Pattern-based (SSN, email)
    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "document.pdf=@document.pdf" \
      -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \
      -o redacted.pdf

    # Regex-based
    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "document.pdf=@document.pdf" \
      -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \
      -o redacted.pdf

Presets: social-security-number, email-address, credit-card-number, international-phone-number, north-american-phone-number, date, time, url, ipv4, ipv6, mac-address, us-zip-code, vin.
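Each preset needs its own redaction action, so the instructions string grows quickly when several presets are combined. The sketch below assembles the actions array from a list of preset names. `redaction_instructions` is a hypothetical name, and the function does not check that the names it receives appear in the preset list.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Nutrient API).
# Usage: redaction_instructions FILE_NAME PRESET [PRESET...]
# Emits one preset-based redaction action per PRESET argument.
redaction_instructions() {
  local file="$1"
  shift
  local actions="" preset
  for preset in "$@"; do
    # Add a comma separator before every action except the first.
    actions="${actions:+${actions},}"
    actions="${actions}{\"type\":\"redaction\",\"strategy\":\"preset\",\"strategyOptions\":{\"preset\":\"${preset}\"}}"
  done
  printf '{"parts":[{"file":"%s"}],"actions":[%s]}' "$file" "$actions"
}
```

Calling `redaction_instructions document.pdf social-security-number email-address` prints the same JSON as the pattern-based example, ready to pass to curl as `-F "instructions=..."`.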
Add Watermarks

    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "document.pdf=@document.pdf" \
      -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \
      -o watermarked.pdf
Digital Signatures

    # Self-signed CMS signature
    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "document.pdf=@document.pdf" \
      -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \
      -o signed.pdf
Fill PDF Forms

    curl -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "form.pdf=@form.pdf" \
      -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"jane@example.com","date":"2026-02-06"}}]}' \
      -o filled.pdf
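Form values often come from user input, where a stray double quote or backslash would break a hand-written JSON string. The sketch below builds the formFields object from key=value arguments and escapes those two characters. Both function names are hypothetical, and the escaping is deliberately minimal: it does not handle newlines or other control characters, for which a real JSON tool would be the safer choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Does NOT handle newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper (not part of the Nutrient API).
# Usage: fill_form_instructions FILE_NAME KEY=VALUE [KEY=VALUE...]
fill_form_instructions() {
  local file
  file="$(json_escape "$1")"
  shift
  local fields="" pair key value
  for pair in "$@"; do
    # Split at the first "=" so values may themselves contain "=".
    key="$(json_escape "${pair%%=*}")"
    value="$(json_escape "${pair#*=}")"
    fields="${fields:+${fields},}\"${key}\":\"${value}\""
  done
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"fillForm","formFields":{%s}}]}' \
    "$file" "$fields"
}
```

The result can be passed to curl as `-F "instructions=$(fill_form_instructions form.pdf "name=Jane Smith" "email=jane@example.com")"`.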
MCP Server (Alternative)
For native tool integration, use the MCP server instead of curl:

    {
      "mcpServers": {
        "nutrient-dws": {
          "command": "npx",
          "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
          "env": {
            "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
            "SANDBOX_PATH": "/path/to/working/directory"
          }
        }
      }
    }
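When to Use
- Converting documents between formats (PDF, DOCX, XLSX, PPTX, HTML, images)
- Extracting text, tables, or key-value pairs from PDFs
- Running OCR on scanned documents or images
- Redacting PII before sharing documents
- Adding watermarks to drafts or confidential documents
- Digitally signing contracts or agreements
- Filling PDF forms programmatically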