
Journal of Library and Information Science in Agriculture

   

Hierarchical Automatic Patent Classification Driven by the Integration of Examination Logic and Knowledge Bases: A Case Study in the Artificial Intelligence Domain

XI Chongjun1,2, ZHAO Yajuan1,2, LV Lucheng1,2, SU Ying1,2

    1. National Science Library, Chinese Academy of Sciences, Beijing 100190
    2. Department of Information Resource Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190
  • Received: 2025-12-29; Online: 2026-04-27
  • Contact: ZHAO Yajuan

Abstract:

[Purpose/Significance] The rapid development of artificial intelligence (AI) has led to a surge in patent applications, placing heavy pressure on traditional patent examination and management systems. Automated classification has attracted growing attention. However, existing methods often suffer from "semantic drift", hierarchical conflicts and poor interpretability, chiefly because they treat classification as a flat probabilistic task rather than as structured logical inference. This study aims to develop a hierarchical automatic patent classification framework that meets three goals:
- it is efficient;
- its outputs are consistent across hierarchy levels;
- it is closely aligned with the professional reasoning of patent examiners.
The study shifts the paradigm from black-box probabilistic guessing to knowledge-driven steady-state inference. In doing so, it offers a scalable and reliable pathway for intelligent patent classification in technology-dense domains.

[Method/Process] The framework was built on a three-stage mechanism (technical content extraction, technical theme condensation and hierarchical mapping), with DeepSeek-V3 as the core semantic engine.

First, an IPC classification standard library and a patent classification knowledge base were constructed. A key innovation is the "hierarchical fusion strategy", which encodes examination logic explicitly. It embeds parent-level technical definitions into child-level descriptions, so that each category carries its complete semantic boundary. As a result, the model perceives the nested structure of the IPC system instead of treating categories as independent labels.

Second, the framework performed semantically anchored extraction of technical information. Rather than working on raw text, it used the IPC standard library as a reference to filter and condense patent claims and descriptions into structured "technical themes". This intermediate representation compresses the semantic space into a more consistent and discriminative form. That compression reduces the risk of semantic hallucination and mitigates data sparsity.

Third, a bottom-up hierarchical mapping strategy was implemented. The system first matches at the finest level (the IPC subgroup) and then derives the higher-level categories through the established hierarchical chain. For robustness, a dual-path verification mechanism compares independent matching with hierarchical mapping in parallel. When the two paths conflict, the system applies a confidence-priority rule to correct local errors. The final output is therefore both accurate at the finest granularity and consistent across the hierarchy.

[Results/Conclusions] Experiments on a dataset of Chinese AI invention patents from 2021 to 2025 demonstrate the framework's "architectural stability". The optimal fusion strategy achieved the following accuracies:
- 100% at the IPC section level;
- 97.32% at the class level;
- 91.84% at the subclass level;
- 86.48% at the main group level;
- 71.25% at the subgroup level.
It outperformed both the PatentBERT baseline and direct classification with a large language model (LLM) at every level. At the subclass level, for example, it reached 91.84%, compared with 48.23% for PatentBERT and 76.25% for direct DeepSeek matching. Ablation studies confirmed that IPC knowledge guidance, technical theme condensation and the bottom-up mapping strategy each contribute substantially to the performance gains.

The results show that encoding examination logic into the model can effectively constrain the inherent randomness of LLMs within a structured logical track. The framework can serve as a "classification skill" for AI agents. It can also be integrated into intelligent examination systems through an API for continuous, automated category updates. Domain coverage is currently limited. Even so, the model-agnostic architecture suggests strong potential for transfer to other complex technical fields, providing a foundational methodology for the unified representation and analysis of multi-source innovation data.

Key words: patent classification, examination logic, classification knowledge base, hierarchical classification, large language model, artificial intelligence

CLC Number: G254.11
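The hierarchical fusion strategy summarized in the abstract embeds parent-level definitions into child-level descriptions. A minimal Python sketch can illustrate the idea; it is not the authors' implementation. It rests on these assumptions:
- The entries in `IPC_TITLES` are abbreviated, illustrative titles for a single IPC path.
- The helper names `ipc_chain` and `fused_entry` are hypothetical.
- Only the five levels used in the study are modeled (section, class, subclass, main group, subgroup). Intermediate dot-level subgroups are therefore skipped.

The sketch shows how a code such as G06N 3/08 is expanded into its ancestor chain. It then shows how each ancestor's definition is carried into the child's knowledge-base text.

```python
import re
from typing import Dict, List

# Abbreviated, illustrative titles for one path through the IPC scheme
# (assumption); a real standard library would hold full definitions.
IPC_TITLES: Dict[str, str] = {
    "G": "Physics",
    "G06": "Computing or calculating; counting",
    "G06N": "Computing arrangements based on specific computational models",
    "G06N 3/00": "Computing arrangements based on biological models",
    "G06N 3/08": "Learning methods",
}

# Section letter, two-digit class, subclass letter, then main group / subgroup.
IPC_RE = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_chain(code: str) -> List[str]:
    """Split an IPC code into section, class, subclass, main group, subgroup."""
    m = IPC_RE.match(code.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized IPC code: {code!r}")
    sec, cls, sub, main, grp = m.groups()
    subclass = f"{sec}{cls}{sub}"
    chain = [sec, sec + cls, subclass, f"{subclass} {main}/00"]
    if grp != "00":  # a main-group code has no separate subgroup level
        chain.append(f"{subclass} {main}/{grp}")
    return chain


def fused_entry(code: str, titles: Dict[str, str]) -> str:
    """Knowledge-base text for `code` that carries every ancestor's definition."""
    chain = ipc_chain(code)
    missing = [c for c in chain if c not in titles]
    if missing:
        raise KeyError(f"no definition for {missing}")
    return f"{chain[-1]}: " + " > ".join(titles[c] for c in chain)


if __name__ == "__main__":
    print(ipc_chain("G06N3/08"))  # ['G', 'G06', 'G06N', 'G06N 3/00', 'G06N 3/08']
    print(fused_entry("g06n 3/08", IPC_TITLES))
```

Every fused entry repeats its ancestors' definitions. A matcher comparing patent text against `fused_entry` output therefore sees the full semantic boundary of a subgroup, not just its short title.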

Fig. 1 Hierarchical automatic patent classification process
Fig. 2 Experimental design flowchart
Fig. 3 Schematic of prompts for constructing the IPC classification knowledge base using large language models
Fig. 4 Schematic of prompts for constructing the patent information base for classification using large language models
Fig. 5 Number of IPC categories included in different datasets
Fig. 6 An example of the IPC classification standard library
Fig. 7 An example of the patent classification knowledge base
Fig. 8 An example of the patent information base for classification
Fig. 9 An example of patent classification results
Fig. 10 Classification accuracy before and after incorporating IPC information into technical content extraction
Fig. 11 Classification accuracy using two different matching methods
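Fig. 11 compares the two matching methods, independent matching and hierarchical mapping. A short Python sketch can illustrate how bottom-up mapping and dual-path verification fit together. It is a minimal illustration, not the paper's exact confidence-priority rule, and it assumes the following:
- Codes are normalized as "G06N 3/08".
- The independent path reports one (label, confidence) pair per level.
- Every label derived from the subgroup match inherits that match's confidence.
- Conflicts are resolved top-down: only candidates nested under the already-chosen parent are kept, and the higher-confidence one wins.

The names `parent_of`, `bottom_up_chain` and `reconcile` are hypothetical.

```python
from typing import List, Tuple

# Five IPC levels: index 0 = section ... 4 = subgroup.
LEVELS = ["section", "class", "subclass", "main group", "subgroup"]


def parent_of(label: str, level: int) -> str:
    """Parent of a normalized IPC label at `level` (1 = class ... 4 = subgroup)."""
    if level == 1:
        return label[:1]                    # "G06"       -> "G"
    if level == 2:
        return label[:3]                    # "G06N"      -> "G06"
    if level == 3:
        return label[:4]                    # "G06N 3/00" -> "G06N"
    if level == 4:
        return label.split("/")[0] + "/00"  # "G06N 3/08" -> "G06N 3/00"
    raise ValueError("a section (level 0) has no parent")


def bottom_up_chain(subgroup: str) -> List[str]:
    """Hierarchical mapping: derive section..main group from a subgroup match."""
    chain = [subgroup]
    for level in range(len(LEVELS) - 1, 0, -1):
        chain.insert(0, parent_of(chain[0], level))
    return chain


def reconcile(independent: List[Tuple[str, float]], subgroup: str,
              subgroup_conf: float) -> List[str]:
    """Merge both paths top-down with a confidence-priority rule (assumed)."""
    mapped = bottom_up_chain(subgroup)
    final: List[str] = []
    for level in range(len(LEVELS)):
        candidates = [independent[level], (mapped[level], subgroup_conf)]
        if final:  # keep only candidates nested under the parent chosen so far
            candidates = [c for c in candidates
                          if parent_of(c[0], level) == final[-1]]
        if not candidates:
            break  # neither path is consistent: stop at the last safe level
        label, _conf = max(candidates, key=lambda c: c[1])
        final.append(label)
    return final


if __name__ == "__main__":
    indep = [("G", 0.99), ("G06", 0.95), ("G06F", 0.60),
             ("G06N 3/00", 0.70), ("G06N 3/04", 0.50)]
    print(reconcile(indep, "G06N 3/08", 0.80))
    # ['G', 'G06', 'G06N', 'G06N 3/00', 'G06N 3/08']
```

Filtering by parent before comparing confidences keeps the output hierarchy-consistent. Once the independent path wins at a level, labels from the mapped path that no longer nest under it drop out. If neither path offers a consistent candidate, the chain stops early.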

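Table 1 Comparative analysis of results (classification accuracy by IPC level, %)

Accuracy (%)                                                  Section       Class    Subclass  Main group    Subgroup
Independent matching: technical content                         84.89       91.59       80.02       70.15       62.48
Independent matching: technical theme                           56.63       90.37       76.61       70.15       61.75
Independent matching: technical content + technical theme       88.43       93.91       85.99       78.81       71.25
Hierarchical mapping: technical content                         92.57       88.55       76.61       70.16       61.75
Hierarchical mapping: technical theme                           93.06       89.77       73.45       66.99       62.48
Hierarchical mapping: technical content + technical theme       94.15       92.45       87.09       81.12       71.25
Technical content: independent + hierarchical                   98.42       95.25       89.89       79.54       62.48
Technical theme: independent + hierarchical                    100.00       94.64       76.61       80.88       61.75
Content + theme AND independent + hierarchical                 100.00       97.32       91.84       86.48       71.25
DeepSeek direct matching (baseline)                             84.29       84.04       76.25       71.01       69.79
PatentBERT model (baseline)                                     87.45       84.41       48.23       28.38       32.11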
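The comparison in Table 1 comes down to percentage-point differences between configurations. The short Python snippet below recomputes those margins from the table values. It compares the best configuration (technical content + theme, with independent + hierarchical matching) against the two baselines. The variable names are illustrative.

```python
from typing import List

LEVELS = ["Section", "Class", "Subclass", "Main group", "Subgroup"]

# Accuracy (%) per IPC level, copied from Table 1.
BEST = [100.00, 97.32, 91.84, 86.48, 71.25]            # content + theme, independent + hierarchical
DEEPSEEK_DIRECT = [84.29, 84.04, 76.25, 71.01, 69.79]  # baseline
PATENTBERT = [87.45, 84.41, 48.23, 28.38, 32.11]       # baseline


def margins(system: List[float], baseline: List[float]) -> List[float]:
    """Percentage-point gain of `system` over `baseline` at each IPC level."""
    return [round(s - b, 2) for s, b in zip(system, baseline)]


if __name__ == "__main__":
    for name, base in [("DeepSeek direct", DEEPSEEK_DIRECT), ("PatentBERT", PATENTBERT)]:
        print(name, dict(zip(LEVELS, margins(BEST, base))))
```

Against direct DeepSeek matching, the gain is roughly 13 to 16 points from the section to the main group level. At the subgroup level it shrinks to only 1.46 points. The gap to PatentBERT is widest at the finer levels: 43.61, 58.10 and 39.14 points at the subclass, main group and subgroup levels.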