# Word Document Processing Tool Development Guide

> **Project**: CN_Gather report generation tool
> **Author**: hongawen
> **Version**: 2.1 (unified pure-docx4j solution)
> **Date**: September 5, 2025
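
## 📋 Core Decision

**Technology selection principle: docx4j as the only solution**

Based on the development team's insistence on a clean technology stack and an analysis of the actual requirements, the report-generator module of the CN_Gather project adopts a **pure docx4j** solution and removes the Apache POI dependency entirely.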
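
### 🎯 Why pure docx4j?

1. **Unified technology stack**: one library covers every Word document requirement, which keeps the stack consistent
2. **Fewer dependencies**: reduced from 8 dependencies to 3 core ones
3. **Better performance**: docx4j is designed specifically for Office Open XML, which makes document processing faster
4. **Complete feature coverage**: docx4j can fully replace the Apache POI functionality this module used for Word (.docx) documents
5. **Simple maintenance**: there is only one API to learn, which lowers the learning cost

---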

## 🔧 Technology Stack Configuration

### Maven dependencies (cleaned up)

```xml
<!-- docx4j: the single solution for Word document processing -->
<dependency>
    <groupId>jakarta.xml.bind</groupId>
    <artifactId>jakarta.xml.bind-api</artifactId>
    <version>2.3.3</version>
</dependency>

<dependency>
    <groupId>org.glassfish.jaxb</groupId>
    <artifactId>jaxb-runtime</artifactId>
    <version>2.3.3</version>
</dependency>

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j</artifactId>
    <version>6.1.0</version>
</dependency>
```

**Note**: all Apache POI dependencies have been removed (poi, poi-ooxml, poi-ooxml-schemas, poi-scratchpad)

### Version compatibility

- **JDK**: 1.8 (project standard)
- **docx4j**: 6.1.0 (the version most compatible with JDK 8)
- **Spring Boot**: 2.3.12.RELEASE (project-wide version)

---

## 🛠️ Implemented Core Features

### 1. Placeholder replacement system

#### PlaceholderUtil.java (core utility class)
```java
// Replace placeholders in bulk - main entry point
public static void replaceAllPlaceholders(MainDocumentPart mainDocumentPart, Map<String, String> placeholderMap)

// Preprocess placeholder formats
public static Map<String, String> preprocessPlaceholderMap(Map<String, String> originalMap)

// Format a placeholder name (strip the ${})
public static String formatPlaceholder(String placeholder)
```
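The signatures above do not show how keys are normalized. As a rough illustration only, the sketch below shows one way the `${placeholder}` → `placeholder` step could work on JDK 8. `PlaceholderKeys`, `stripWrapper`, and `normalize` are hypothetical names, and turning `null` values into empty strings is an assumption rather than the project's documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; it only illustrates key normalization and is not the project's PlaceholderUtil. */
final class PlaceholderKeys {

    private PlaceholderKeys() {
    }

    /** Strips one surrounding ${...} wrapper if present; otherwise returns the trimmed input. */
    static String stripWrapper(String placeholder) {
        if (placeholder == null) {
            return null;
        }
        String trimmed = placeholder.trim();
        if (trimmed.length() >= 3 && trimmed.startsWith("${") && trimmed.endsWith("}")) {
            return trimmed.substring(2, trimmed.length() - 1).trim();
        }
        return trimmed;
    }

    /** Returns a new map with bare keys; empty keys are skipped and null values become "" (assumption). */
    static Map<String, String> normalize(Map<String, String> original) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : original.entrySet()) {
            String key = stripWrapper(entry.getKey());
            if (key == null || key.isEmpty()) {
                continue; // nothing usable to replace
            }
            String value = entry.getValue();
            result.put(key, value == null ? "" : value);
        }
        return result;
    }
}
```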

**Key features**:
- ✅ Handles docx4j's silent-failure problem (key technical breakthrough)
- ✅ Supports both bulk and single replacement
- ✅ Automatic format preprocessing (${placeholder} → placeholder)
- ✅ Verifies that replacement succeeded

### 2. Document analysis system

#### WordDocumentUtil.java (analysis utility class)
```java
// Extract all placeholders in the document
public static Set<String> extractPlaceholders(InputStream templateInputStream)

// Extract placeholders in full format (with ${})
public static Set<String> extractPlaceholdersWithFormat(InputStream templateInputStream)

// Check whether a placeholder exists
public static boolean containsPlaceholder(InputStream templateInputStream, String placeholder)
```
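To make the extraction step concrete, here is a minimal sketch of placeholder scanning with a regular expression. It assumes the document text has already been pulled out of the .docx as a plain string, with each paragraph's runs joined. `PlaceholderScanner` and `extractNames` are hypothetical names; this is not the actual `WordDocumentUtil` implementation. For input such as `"Report for ${companyName} on ${reportDate}"`, the method returns the two bare names in order of first appearance.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner operating on already-extracted text (assumption: runs are joined per paragraph). */
final class PlaceholderScanner {

    // Matches ${name}, where name contains no braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^{}]+)\\}");

    private PlaceholderScanner() {
    }

    /** Returns the distinct bare placeholder names in order of first appearance. */
    static Set<String> extractNames(String text) {
        Set<String> names = new LinkedHashSet<>();
        Matcher matcher = PLACEHOLDER.matcher(text);
        while (matcher.find()) {
            names.add(matcher.group(1).trim());
        }
        return names;
    }
}
```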

### 3. Service layer implementation

#### IWordReportService.java + WordReportServiceImpl.java
```java
// Core service interface
public interface IWordReportService {
    InputStream replacePlaceholders(InputStream templateInputStream, Map<String, String> placeholderMap) throws Exception;
}

// Implementation - uses PlaceholderUtil
@Service
public class WordReportServiceImpl implements IWordReportService {
    @Override
    public InputStream replacePlaceholders(InputStream templateInputStream, Map<String, String> placeholderMap) throws Exception {
        WordprocessingMLPackage wordPackage = WordprocessingMLPackage.load(templateInputStream);
        MainDocumentPart mainDocumentPart = wordPackage.getMainDocumentPart();

        PlaceholderUtil.replaceAllPlaceholders(mainDocumentPart, placeholderMap);

        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
            wordPackage.save(outputStream);
            return new ByteArrayInputStream(outputStream.toByteArray());
        }
    }
}
```

---

## 🚀 docx4j Full Capability Plan

Because docx4j can manipulate the document XML directly, the following features are all achievable:

### Utility classes to be developed

#### 1. DocxMergeUtil.java - document merging
```java
/**
 * Replaces the Apache POI-based WordUtil.appendDocument functionality.
 * Uses docx4j's XmlUtils.deepCopy to preserve formatting fully.
 */
public static void mergeDocuments(WordprocessingMLPackage target, List<WordprocessingMLPackage> sources)
```

#### 2. DocxTableUtil.java - dynamic tables
```java
/**
 * Creates tables with ObjectFactory.
 * Gives more precise table control than Apache POI.
 */
public static Tbl createDynamicTable(List<String> headers, List<List<String>> rows)
```

#### 3. DocxImageUtil.java - image handling
```java
/**
 * Handles images with BinaryPartAbstractImage.
 * Controls image size and position precisely.
 */
public static void insertImage(MainDocumentPart mainPart, byte[] imageBytes, String fileName, int widthEmu, int heightEmu)
```
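The `widthEmu` and `heightEmu` parameters are expressed in English Metric Units (EMU), the length unit used by Office Open XML drawings: 914,400 EMU per inch, or 360,000 EMU per centimeter. Since callers usually think in centimeters or pixels, a small conversion helper may be useful. The sketch below is illustrative only; `EmuUnits` and its methods are hypothetical names, and the pixel conversion assumes the caller knows the image DPI (96 is a common screen default). The methods return `long`; cast to `int` when calling the planned `insertImage` signature.

```java
/** Hypothetical unit helper for EMU sizes; not part of docx4j or the project. */
final class EmuUnits {

    static final long EMU_PER_INCH = 914_400L;
    static final long EMU_PER_CM = 360_000L;

    private EmuUnits() {
    }

    /** Converts centimeters to EMU, rounded to the nearest unit. */
    static long fromCentimeters(double centimeters) {
        return Math.round(centimeters * EMU_PER_CM);
    }

    /** Converts a pixel length to EMU for the given DPI (assumption: caller supplies the DPI). */
    static long fromPixels(int pixels, int dpi) {
        return Math.round(pixels * (double) EMU_PER_INCH / dpi);
    }

    /** Computes the height that keeps the aspect ratio of a pixelWidth x pixelHeight image at a target width. */
    static long scaledHeight(long targetWidthEmu, int pixelWidth, int pixelHeight) {
        return Math.round(targetWidthEmu * (double) pixelHeight / pixelWidth);
    }
}
```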

#### 4. DocxStyleUtil.java - style control
```java
/**
 * Manipulates the XML style elements directly.
 * Gives lower-level, more precise style control than Apache POI.
 */
public static void setParagraphStyle(P paragraph, String fontFamily, int fontSize, boolean bold, String alignment)
```

---

## 📖 Development Best Practices

### 1. JDK 8 compatibility requirements

```java
// ✅ Correct - JDK 8 compatible
Map<String, String> data = new HashMap<>();
data.put("companyName", "灿能公司");
data.put("reportDate", "2025-09-05");

// ❌ Wrong - JDK 9+ API
Map<String, String> data = Map.of("companyName", "灿能公司"); // not available on JDK 8
```
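If what made `Map.of()` attractive was an unmodifiable map of shared defaults, JDK 8 can express the same idea with `Collections.unmodifiableMap`. The following is only a sketch: `ReportDefaults`, `DEFAULTS`, and `newData` are hypothetical names, and the values are sample data. `newData()` hands each report a mutable copy, so per-report fields can be added without touching the shared defaults.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for shared placeholder defaults, written for JDK 8. */
final class ReportDefaults {

    /** Read-only defaults shared by all reports (sample values). */
    static final Map<String, String> DEFAULTS;

    static {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("companyName", "Example Co.");
        defaults.put("reportDate", "2025-09-05");
        DEFAULTS = Collections.unmodifiableMap(defaults);
    }

    private ReportDefaults() {
    }

    /** Returns a fresh mutable copy that a caller can extend for one report. */
    static Map<String, String> newData() {
        return new HashMap<>(DEFAULTS);
    }
}
```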

### 2. Handling docx4j silent failures

```java
// ✅ Use PlaceholderUtil (silent failures are handled)
PlaceholderUtil.replaceAllPlaceholders(mainDocumentPart, placeholderMap);

// ❌ Call docx4j directly (may fail silently)
mainDocumentPart.variableReplace(placeholderMap); // no error is raised when a replacement fails
```

### 3. Placeholder format convention

```java
// ✅ Correct - the Map key is the bare variable name
data.put("companyName", "灿能公司"); // in the Word document: ${companyName}

// ❌ Wrong - the Map key includes the format symbols
data.put("${companyName}", "灿能公司"); // docx4j does not recognize this format
```

### 4. Resource management pattern

```java
// ✅ Recommended - use try-with-resources
try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
    WordprocessingMLPackage wordPackage = WordprocessingMLPackage.load(templateInputStream);
    // Processing logic
    wordPackage.save(outputStream);
    return new ByteArrayInputStream(outputStream.toByteArray());
}
```
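
---

## ⚡ Core Technical Breakthrough

### Solving the docx4j silent-failure problem

This is the key technical breakthrough of the project. docx4j's `variableReplace()` method does not throw an exception when a replacement fails, so placeholders remain in the document without the developer knowing. A common cause is Word splitting a placeholder such as `${companyName}` across several runs, so the literal text is never matched.

**Solution** (already implemented in PlaceholderUtil):

1. **Verify after bulk replacement**: check whether any placeholders remain in the document
2. **Fallback strategy**: when bulk replacement fails, switch automatically to replacing placeholders one by one
3. **Try multiple formats**: try `${placeholder}`, `{{placeholder}}`, and other formats
4. **Detailed logging**: record the replacement process to make debugging easier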

```java
// Core verification logic
mainDocumentPart.variableReplace(processedMap);

// Verify that the replacement really succeeded
int remainingPlaceholders = 0;
for (String placeholder : processedMap.keySet()) {
    String checkFormat = "${" + placeholder + "}";
    if (containsPlaceholder(mainDocumentPart, checkFormat)) {
        remainingPlaceholders++;
    }
}

if (remainingPlaceholders > 0) {
    log.warn("{} placeholders remain after bulk replacement; falling back to one-by-one processing", remainingPlaceholders);
    // Run the fallback strategy
}
```
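The fallback step is left as a comment above. To show the verify-then-fall-back flow end to end, the sketch below models it on a plain string standing in for the document text. This is an assumption made to keep the example self-contained; the real utility works on a docx4j `MainDocumentPart`. `ReplacementCheck` and its methods are hypothetical names. The `bulkReplacer` argument plays the role of `variableReplace()`: passing an identity function simulates a silent failure, after which only the keys that are still present get replaced individually.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical text-based model of verify-then-fall-back; not the project's PlaceholderUtil. */
final class ReplacementCheck {

    private ReplacementCheck() {
    }

    /** Returns the keys whose ${key} form still appears in the text. */
    static List<String> findUnreplaced(String text, Collection<String> keys) {
        List<String> remaining = new ArrayList<>();
        for (String key : keys) {
            if (text.contains("${" + key + "}")) {
                remaining.add(key);
            }
        }
        return remaining;
    }

    /** Runs the bulk replacer, verifies the result, and replaces leftovers one by one. */
    static String replaceVerified(String text, Map<String, String> values,
                                  Function<String, String> bulkReplacer) {
        String result = bulkReplacer.apply(text);
        for (String key : findUnreplaced(result, values.keySet())) {
            String value = values.get(key);
            // Literal replacement (String.replace does not interpret regex syntax).
            result = result.replace("${" + key + "}", value == null ? "" : value);
        }
        return result;
    }
}
```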

---

## 🎯 Usage Guide

### Quick start - standard report generation

```java
@Service
public class ReportGenerator {

    @Autowired
    private IWordReportService wordReportService;

    public InputStream generateReport(TestRecord record) throws Exception {
        // 1. Load the template (closed automatically once the report has been built)
        try (InputStream template = loadTemplate("report-template.docx")) {

            // 2. Prepare the data
            Map<String, String> data = new HashMap<>();
            data.put("companyName", "灿能公司");
            data.put("deviceModel", record.getDeviceModel());
            data.put("testResult", record.getResult());
            data.put("reportDate", formatDate(new Date()));

            // 3. Generate the report (a single service call)
            return wordReportService.replacePlaceholders(template, data);
        }
    }
}
```

### Template validation

```java
// Analyze the placeholders in the template
InputStream templateStream = loadTemplate("report-template.docx");
Set<String> placeholders = WordDocumentUtil.extractPlaceholders(templateStream);
System.out.println("Data fields required by the template: " + placeholders);

// Check a specific field (an InputStream can be read only once, so open the template again)
InputStream secondStream = loadTemplate("report-template.docx");
boolean hasCompanyName = WordDocumentUtil.containsPlaceholder(secondStream, "companyName");
```
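Both `WordDocumentUtil` calls take an `InputStream`, and a stream can be consumed only once, so repeated analysis or batch generation needs a fresh stream each time. One possible approach, sketched here under assumptions, is to read the template into memory once and hand out a new `ByteArrayInputStream` per use. `TemplateCache` and `open()` are hypothetical names. This is plain byte caching, not docx4j's own package cloning, and it suits templates small enough to hold in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical in-memory template holder (JDK 8 compatible). */
final class TemplateCache {

    private final byte[] templateBytes;

    /** Reads the source fully; the caller remains responsible for closing it. */
    TemplateCache(InputStream source) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = source.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        this.templateBytes = buffer.toByteArray();
    }

    /** Returns an independent stream positioned at the start of the template. */
    InputStream open() {
        return new ByteArrayInputStream(templateBytes);
    }

    /** Size of the cached template in bytes. */
    int size() {
        return templateBytes.length;
    }
}
```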

---

## 🔮 Roadmap

### Short-term goals (current version)
- ✅ Placeholder replacement system (done)
- ✅ Document analysis tools (done)
- ✅ Service layer architecture (done)

### Mid-term goals (developed on demand)
- 📋 DocxMergeUtil - document merging
- 📋 DocxTableUtil - dynamic table generation
- 📋 DocxImageUtil - image insertion

### Long-term goals (extended features)
- 📋 DocxStyleUtil - precise style control
- 📋 Template management system
- 📋 Word-to-PDF conversion

---

## 📞 Technical Support

### Development references
- **Official docx4j documentation**: https://www.docx4java.org/
- **Implemented utility classes**: `com.njcn.gather.tools.report.util.*`
- **Service interfaces**: `com.njcn.gather.tools.report.service.*`

### FAQ
1. **Placeholders are not replaced**: check whether the Map keys contain the `${}` wrapper (it should be removed)
2. **JDK 8 compatibility**: avoid JDK 9+ APIs such as `Map.of()`
3. **Performance**: for large batches, clone the template instead of loading it repeatedly
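
### Maintenance principles
- **Unified technology stack**: stay with the pure docx4j approach and do not introduce Apache POI
- **Backward compatibility**: new features must not break the existing API
- **Performance first**: take advantage of docx4j's direct XML manipulation

---

**End of document**

> 💡 **Core idea**: the pure docx4j approach unifies the technology stack and satisfies the development team's insistence on a clean stack, while also providing better performance and more precise control.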