Hierarchical Automatic Patent Classification Driven by the Integration of Examination Logic and Knowledge Bases: A Case Study in the Artificial Intelligence Domain

doi:10.13998/j.cnki.issn1002-1248.25-0763

Journal of Library and Information Science in Agriculture, 2026, Vol. 38, Issue 6: 28-42.

XI Chongjun^1,2, ZHAO Yajuan^1,2, LV Lucheng^1,2, SU Ying^1,2

^1. National Science Library, Chinese Academy of Sciences, Beijing 100190
^2. Department of Information Resource Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190

Received: 2025-12-29 Online: 2026-06-05 Published: 2026-06-17
Corresponding author: ZHAO Yajuan
About the authors:
XI Chongjun (1996- ), male, PhD candidate; research interest: intellectual property information research
LV Lucheng (1989- ), male, PhD, associate research fellow, master's supervisor; research interest: intellectual property information research
SU Ying (1992- ), female, PhD, assistant research fellow; research interest: intellectual property information research
Funding:
National Natural Science Foundation of China, Young Scientists Fund project "Technology Convergence Patterns, Characteristics, and Prediction from the Perspective of Technological Distance" (72304268)

Abstract

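Abstract (Chinese-language version):

[Purpose/Significance] The surge in artificial intelligence patents has increased examination pressure, and existing automated classification methods are unstable and disconnected from examination logic. This study builds a hierarchical automatic patent classification framework that integrates examination logic with knowledge bases. By encoding examination logic explicitly, it aligns technical semantics precisely with the IPC standard. [Method/Process] An IPC classification standard library and a knowledge base were constructed, and a hierarchical semantic mapping was established. A three-stage mechanism of "technical content extraction → technical theme condensation → hierarchical matching and mapping" was adopted, with DeepSeek-V3 used for deep semantic deconstruction. Combining independent matching with a bottom-up hierarchical mapping strategy, the framework was "stress tested" and validated on patents in the artificial intelligence domain. [Results/Conclusions] The optimal fusion strategy achieved accuracies of 100%, 97.32%, 91.84%, 86.48%, and 71.25% at the IPC section, class, subclass, main group, and subgroup levels, respectively, all significantly higher than PatentBERT and direct LLM classification. Ablation experiments further revealed the "architecture stability" of the scheme: embedding the encoded examination logic confines the generative randomness of the large language model to a structured classification track, shifting the paradigm from "probabilistic semantic alignment" to "deterministic steady-state reasoning."

Keywords: patent classification, examination logic, classification knowledge base, hierarchical classification, large language model, artificial intelligence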
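
The per-level accuracies reported above depend on the fact that a full IPC symbol encodes its own ancestry, which is what makes bottom-up mapping possible. The following is a minimal Python sketch of that idea, not the authors' implementation. The helper names `ipc_chain` and `per_level_accuracy` are hypothetical, the regular expression only covers symbols written like `G06N 3/08`, and each patent is assumed to carry a single IPC symbol (the evaluation protocol of the paper is not stated in the abstract).

```python
import re

LEVELS = ("section", "class", "subclass", "main_group", "subgroup")

# Assumed layout: section letter, two-digit class, subclass letter,
# main-group number, "/", subgroup number (e.g. "G06N 3/08").
_IPC = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_chain(symbol):
    """Derive all five IPC levels bottom-up from one subgroup symbol."""
    m = _IPC.match(symbol.strip().upper())
    if m is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, cls, sub, group, subgroup = m.groups()
    subclass = f"{section}{cls}{sub}"
    return {
        "section": section,
        "class": f"{section}{cls}",
        "subclass": subclass,
        "main_group": f"{subclass} {group}/00",
        "subgroup": f"{subclass} {group}/{subgroup}",
    }


def per_level_accuracy(predicted, gold):
    """Share of patents whose predicted label matches the gold label, per level."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    hits = dict.fromkeys(LEVELS, 0)
    for p, g in zip(predicted, gold):
        p_chain, g_chain = ipc_chain(p), ipc_chain(g)
        for level in LEVELS:
            hits[level] += p_chain[level] == g_chain[level]
    return {level: hits[level] / len(gold) for level in LEVELS}


# Toy data, invented for illustration (not taken from the paper's dataset).
predicted = ["G06N 3/08", "G06N 20/00", "G06F 16/35", "G06V 10/82"]
gold = ["G06N 3/08", "G06N 3/04", "G06F 16/35", "G06T 7/00"]
print(per_level_accuracy(predicted, gold))
```

On this toy data the section and class levels score 1.0, the subclass level 0.75, and the main group and subgroup levels 0.5, which mirrors the pattern in the reported results: accuracy can only stay level or fall as the level becomes finer, because a wrong ancestor makes every descendant wrong.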

Abstract:

[Purpose/Significance] The rapid evolution of artificial intelligence (AI) has led to a surge in patent applications, placing heavy pressure on traditional patent examination and management systems. Although automated classification has attracted attention, existing methods often suffer from "semantic drift," hierarchical conflicts, and a lack of interpretability, primarily because they treat classification as a flat probabilistic task rather than as structured logical inference. This study develops a hierarchical automatic patent classification framework that is efficient, hierarchy-consistent, and aligned with the professional logic of patent examiners. By shifting from black-box probabilistic guessing to knowledge-driven steady-state inference, it offers a scalable and reliable pathway for intelligent patent classification in high-density technical domains.

[Method/Process] The framework is built on a three-stage mechanism (technical content extraction, technical theme condensation, and hierarchical mapping), with DeepSeek-V3 as the core semantic engine. First, an International Patent Classification (IPC) standard library and a patent classification knowledge base were constructed. A key innovation is the "Hierarchical Fusion Strategy," which explicitly encodes examination logic by embedding parent-level technical definitions into child-level descriptions, giving each category a complete semantic boundary. The model therefore perceives the nested structure of the IPC system instead of treating categories as independent labels. Second, the framework performs semantically anchored extraction of technical information. Unlike traditional methods that rely on raw text, this step uses the IPC standard library as a reference to filter and condense patent claims and descriptions into structured "technical themes." This intermediate representation mitigates the risk of semantic hallucination and handles data sparsity by compressing the semantic space into a more consistent and discriminative form. Third, a bottom-up hierarchical mapping strategy was implemented: the system first matches at the most granular level (the IPC subgroup) and then derives the higher-level categories through the established hierarchical chain. For robustness, a dual-path verification mechanism compares independent matching and hierarchical mapping in parallel. When the two results conflict, the system applies a confidence-priority rule to correct the error locally, so that the final output is both accurate at the fine-grained level and hierarchically consistent.

[Results/Conclusions] Experimental validation on a dataset of Chinese AI invention patents from 2021 to 2025 demonstrates the "architecture stability" of the framework. The optimal fusion strategy achieved accuracy rates of 100%, 97.32%, 91.84%, 86.48%, and 71.25% at the IPC section, class, subclass, main group, and subgroup levels, respectively, significantly outperforming the PatentBERT baseline and direct large language model (LLM) classification. Ablation studies confirmed that IPC knowledge guidance, technical theme condensation, and the bottom-up mapping strategy each contribute to the performance gains. The results indicate that encoding examination logic into the model effectively constrains the inherent randomness of LLMs within a structured logical track. The framework essentially functions as a "classification skill" for AI agents and can be integrated into intelligent examination systems via an API for continuous, automated category updates. Despite limited domain coverage, the model-agnostic architecture suggests strong potential for transfer to other complex technical fields, providing a foundational methodology for the unified representation and analysis of multi-source innovation data.

Key words: patent classification, examination logic, classification knowledge base, hierarchical classification, large language model, artificial intelligence
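
The hierarchical fusion strategy and the dual-path check described in the abstract can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration. The abstract does not give the exact confidence-priority rule, so `reconcile` uses one plausible reading: keep the bottom-up chain unless the independent match disagrees, is more confident, and still fits the parent already chosen. The helper names are hypothetical, and the definitions are abbreviated paraphrases of IPC titles rather than text from the authors' knowledge base.

```python
from dataclasses import dataclass

LEVELS = ("section", "class", "subclass", "main_group", "subgroup")

# Abbreviated paraphrases of IPC titles, for illustration only.
DEFINITIONS = {
    "G": "Physics",
    "G06": "Computing; calculating or counting",
    "G06N": "Computing arrangements based on specific computational models",
    "G06N 3/00": "Computing arrangements based on biological models",
    "G06N 3/08": "Learning methods",
}


def parent(label, level):
    """Return the IPC parent of `label`, where `level` is the level of `label`."""
    if level == "class":
        return label[:1]
    if level == "subclass":
        return label[:3]
    if level == "main_group":
        return label[:4]
    if level == "subgroup":
        return label.split("/")[0] + "/00"
    raise ValueError(f"no parent defined for level {level!r}")


def chain_from(subgroup):
    """Bottom-up path: derive every higher level from one matched subgroup."""
    chain = {"subgroup": subgroup}
    for i in range(len(LEVELS) - 1, 0, -1):
        chain[LEVELS[i - 1]] = parent(chain[LEVELS[i]], LEVELS[i])
    return chain


def fused_description(subgroup, definitions):
    """Hierarchical fusion: prefix a subgroup's definition with its ancestors'."""
    chain = chain_from(subgroup)
    return " > ".join(definitions[chain[level]] for level in LEVELS)


@dataclass
class Match:
    label: str
    confidence: float  # assumed to be comparable across both matching paths


def reconcile(independent, subgroup):
    """Merge per-level independent matches with the bottom-up chain."""
    bottom_up = chain_from(subgroup.label)
    final = {}
    diverged = False
    for i, level in enumerate(LEVELS):
        cand = independent[level]
        # A candidate fits if its parent is the label already chosen one level up.
        fits = i == 0 or parent(cand.label, level) == final[LEVELS[i - 1]]
        if diverged:
            # Below an override the bottom-up chain no longer applies;
            # None marks a level that would need re-matching.
            final[level] = cand.label if fits else None
        elif (cand.label != bottom_up[level] and fits
              and cand.confidence > subgroup.confidence):
            final[level] = cand.label
            diverged = True
        else:
            final[level] = bottom_up[level]
    return final


print(fused_description("G06N 3/08", DEFINITIONS))

# Invented confidences: the independent path is surer about the main group.
independent = {
    "section": Match("G", 0.99),
    "class": Match("G06", 0.95),
    "subclass": Match("G06N", 0.90),
    "main_group": Match("G06N 20/00", 0.80),
    "subgroup": Match("G06N 20/10", 0.70),
}
print(reconcile(independent, Match("G06N 3/08", 0.60)))
```

In this toy run the two paths agree down to subclass `G06N`, the independent match overrides the main group because 0.80 exceeds 0.60, and the subgroup `G06N 20/10` is accepted because it sits under the corrected main group. Checking each candidate against the parent already chosen is what keeps the output hierarchy-consistent, and returning `None` rather than a guess keeps the correction local.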

CLC number: G254.11

Cite this article:

XI Chongjun, ZHAO Yajuan, LV Lucheng, SU Ying. Hierarchical Automatic Patent Classification Driven by the Integration of Examination Logic and Knowledge Bases: A Case Study in the Artificial Intelligence Domain[J]. Journal of Library and Information Science in Agriculture, 2026, 38(6): 28-42.

Figures/Tables: 11 (Figures 1 to 11)
