Journal of Library and Information Science in Agriculture


Hierarchical Automatic Patent Classification Driven by the Integration of Examination Logic and Knowledge Bases

XI Chongjun1,2, ZHAO Yajuan1,2, LV Lucheng1,2, SU Ying1,2

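  1. National Science Library, Chinese Academy of Sciences, Beijing 100190
  2. Department of Information Resource Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190
  • Received: 2025-12-29  Published: 2026-04-27
  • Corresponding author: ZHAO Yajuan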
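  • About the authors:

    XI Chongjun (1996- ), male, PhD candidate; research interest: intellectual property intelligence

    LV Lucheng (1989- ), male, PhD, associate research fellow, master's supervisor; research interest: intellectual property intelligence

    SU Ying (1992- ), female, PhD, assistant research fellow; research interest: intellectual property intelligence

  • Funding:
    National Natural Science Foundation of China Young Scientists Fund project "Research on the Patterns, Characteristics, and Prediction of Technology Convergence from the Perspective of Technological Distance" (72304268)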

Hierarchical Automatic Patent Classification Driven by the Integration of Examination Logic and Knowledge Bases: A Case Study in the Artificial Intelligence Domain

XI Chongjun1,2, ZHAO Yajuan1,2, LV Lucheng1,2, SU Ying1,2

  1. National Science Library, Chinese Academy of Sciences, Beijing 100190
  2. Department of Information Resource Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190
  • Received: 2025-12-29  Online: 2026-04-27
  • Contact: ZHAO Yajuan

Abstract:

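[Purpose/Significance] To address the examination pressure caused by the surge in artificial intelligence patents, as well as the insufficient stability of existing automated classification methods and their disconnect from examination logic, this study builds a hierarchical automatic patent classification framework that integrates examination logic and is driven by knowledge bases. By explicitly encoding examination logic, the framework achieves precise alignment between technical semantics and the IPC standard. [Method/Process] An IPC classification standard library and a knowledge base were constructed to establish hierarchical semantic mappings. A three-stage mechanism, "technical content extraction → technical theme condensation → hierarchical matching and mapping", was adopted, with DeepSeek-V3 performing in-depth semantic deconstruction. Combining independent matching with a bottom-up hierarchical mapping strategy, the framework was "stress-tested" and validated on patents in the artificial intelligence domain. [Results/Conclusions] The optimal fusion strategy achieved accuracy rates of 100%, 97.32%, 91.84%, 86.48%, and 71.25% at the IPC section, class, subclass, main group, and subgroup levels, respectively, significantly outperforming both PatentBERT and direct classification with an LLM. Ablation experiments further revealed the "architectural stability" of the approach: by embedding encoded examination logic, the generative randomness of the large language model was confined to a structured classification track, shifting the paradigm from "probabilistic semantic alignment" to "deterministic steady-state inference".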
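One way to picture the explicit encoding of examination logic as hierarchical semantic mappings is to prefix each IPC entry's definition with the definitions of all its ancestors, so that a subgroup is never matched without its parent context (the "Hierarchical Fusion Strategy" of the English abstract). The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ancestor_path` and `fuse_descriptions`, the explicit `parent` map, the `" > "` separator, and the toy titles (abbreviated paraphrases, not official IPC wording) are all hypothetical.

```python
from typing import Dict, List, Optional


def ancestor_path(code: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return [root, ..., code] by following parent links upward."""
    path: List[str] = []
    seen = set()
    current: Optional[str] = code
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent links at {current!r}")
        seen.add(current)
        path.append(current)
        current = parent.get(current)  # None marks the root (a section)
    return path[::-1]


def fuse_descriptions(
    titles: Dict[str, str], parent: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Embed every ancestor's definition into each entry, root first."""
    return {
        code: " > ".join(f"{c} {titles[c]}" for c in ancestor_path(code, parent))
        for code in titles
    }


if __name__ == "__main__":
    # Hypothetical toy excerpt; titles are abbreviated paraphrases of IPC entries.
    titles = {
        "G": "Physics",
        "G06": "Computing",
        "G06N": "Computing arrangements based on specific computational models",
        "G06N 3/00": "Computing arrangements based on biological models",
        "G06N 3/02": "Neural networks",
        "G06N 3/08": "Learning methods",
    }
    # Explicit parent links, so dot-indented subgroups can point to another subgroup.
    parent = {
        "G": None,
        "G06": "G",
        "G06N": "G06",
        "G06N 3/00": "G06N",
        "G06N 3/02": "G06N 3/00",
        "G06N 3/08": "G06N 3/02",
    }
    print(fuse_descriptions(titles, parent)["G06N 3/08"])
```

An explicit parent map, rather than parsing the code string, keeps the sketch valid for nested subgroups, whose parent is another subgroup rather than the main group.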

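Key words: patent classification, examination logic, classification knowledge base, hierarchical classification, large language model, artificial intelligence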

Abstract:

[Purpose/Significance] The rapid evolution of artificial intelligence (AI) has led to a surge in patent applications, placing immense pressure on traditional patent examination and management systems. Although automated classification has attracted attention, existing methods often suffer from "semantic drift," hierarchical conflicts, and a lack of interpretability, primarily because they treat classification as a flat probabilistic task rather than as structured logical inference. This study aims to develop a hierarchical automatic patent classification framework that is not only efficient but also hierarchically consistent and closely aligned with the professional logic of patent examiners. By shifting the paradigm from black-box probabilistic guessing to knowledge-driven steady-state inference, this study provides a scalable and reliable pathway for intelligent patent classification in high-density technical domains. [Method/Process] The proposed framework is built on a three-stage mechanism (technical content extraction, technical theme condensation, and hierarchical mapping), with DeepSeek-V3 as the core semantic engine. First, the study constructed an IPC classification standard library and a patent classification knowledge base. A key innovation here is the "Hierarchical Fusion Strategy," which explicitly encodes examination logic by embedding parent-level technical definitions into child-level descriptions, giving each category a complete semantic boundary. This ensures that the model perceives the nested structure of the IPC system rather than treating categories as independent labels. Second, the framework performs a semantically anchored extraction of technical information. Unlike traditional methods that rely on raw text, this process uses the IPC standard library as a reference to filter and condense patent claims and descriptions into structured "technical themes."
This intermediate representation mitigates semantic hallucination and alleviates data sparsity by compressing the semantic space into a more consistent and discriminative form. Third, a "bottom-up" hierarchical mapping strategy was implemented. The system first matches at the most granular level (the IPC subgroup) and then derives the higher-level categories along the established hierarchical chain. To ensure robustness, a dual-path verification mechanism was introduced in which independent matching and hierarchical mapping run in parallel and their results are compared. When the two paths conflict, the system applies a confidence-priority rule to correct errors locally, so that the final output is both accurate at fine granularity and hierarchically consistent. [Results/Conclusions] Experimental validation on a dataset of Chinese AI invention patents from 2021 to 2025 demonstrates the "architectural stability" of the framework. The optimal fusion strategy achieved accuracy rates of 100%, 97.32%, 91.84%, 86.48%, and 71.25% at the IPC section, class, subclass, main group, and subgroup levels, respectively, significantly outperforming both the PatentBERT baseline and direct classification with a large language model (LLM). Ablation studies confirmed that IPC knowledge guidance, technical theme condensation, and the bottom-up mapping strategy each contribute critically to the performance gains. The results show that by encoding examination logic into the model, the inherent randomness of LLMs can be effectively constrained within a structured logical track. The framework can therefore function as a "classification skill" for AI agents and can be integrated into intelligent examination systems via an API for continuous, automated classification updates.
Despite limitations in domain coverage, the model-agnostic nature of the architecture suggests high potential for migration to other complex technical fields, providing a foundational methodology for the unified representation and analysis of multi-source innovation data.
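The bottom-up mapping and the dual-path verification with confidence priority can be sketched as follows. This is a minimal illustration under explicit assumptions, since the abstract does not specify the exact procedure: IPC symbols are normalized to the form `G06N 3/08`; every level derived from the matched subgroup inherits that subgroup's confidence; and conflicts are resolved top-down by keeping only candidates consistent with the code already chosen one level up, preferring the higher confidence. The names `derive_chain`, `parent_of`, and `reconcile` are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

LEVELS = ("section", "class", "subclass", "main_group", "subgroup")
# Assumed input format: "G06N 3/08" or "G06N3/0464" (case-insensitive).
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")

Candidate = Tuple[str, float]  # (normalized IPC code, confidence)


def derive_chain(subgroup: str) -> Dict[str, str]:
    """Bottom-up step: derive every higher level from one subgroup symbol."""
    m = IPC_PATTERN.match(subgroup.strip().upper())
    if m is None:
        raise ValueError(f"not an IPC group symbol: {subgroup!r}")
    sec, cls, sub, grp, sg = m.groups()
    subclass = f"{sec}{cls}{sub}"
    return {
        "section": sec,
        "class": f"{sec}{cls}",
        "subclass": subclass,
        "main_group": f"{subclass} {int(grp)}/00",
        "subgroup": f"{subclass} {int(grp)}/{sg}",
    }


def parent_of(level: str, code: str) -> Optional[str]:
    """Return the normalized code one level up, or None for a section."""
    if level == "section":
        return None
    if level == "class":
        return code[:1]
    if level == "subclass":
        return code[:3]
    if level == "main_group":
        return code.split()[0]
    return code.split("/")[0] + "/00"  # subgroup -> its main group


def reconcile(independent: Dict[str, Candidate], subgroup: Candidate) -> Dict[str, Candidate]:
    """Merge the independent path with the hierarchical path, top-down.

    At each level only candidates whose parent equals the code chosen one
    level up are kept; among them the higher confidence wins. If none is
    consistent, the output stops there, so it never violates the hierarchy.
    """
    code, conf = subgroup
    # Hierarchical path: derived levels inherit the subgroup confidence (assumption).
    hierarchical = {lvl: (c, conf) for lvl, c in derive_chain(code).items()}
    result: Dict[str, Candidate] = {}
    chosen_parent: Optional[str] = None
    for level in LEVELS:
        options = [c for c in (independent.get(level), hierarchical[level]) if c is not None]
        consistent = [c for c in options if parent_of(level, c[0]) == chosen_parent]
        if not consistent:
            break
        best = max(consistent, key=lambda c: c[1])
        result[level] = best
        chosen_parent = best[0]
    return result


if __name__ == "__main__":
    # Hypothetical independent-path predictions that drift into G06F below the class level.
    independent = {
        "section": ("G", 0.99),
        "class": ("G06", 0.97),
        "subclass": ("G06F", 0.62),
        "main_group": ("G06F 40/00", 0.55),
        "subgroup": ("G06F 40/30", 0.40),
    }
    final = reconcile(independent, ("G06N 3/08", 0.81))
    print({lvl: c for lvl, (c, _) in final.items()})
    # {'section': 'G', 'class': 'G06', 'subclass': 'G06N', 'main_group': 'G06N 3/00', 'subgroup': 'G06N 3/08'}
```

Resolving conflicts top-down and stopping when no candidate fits the chosen parent is one simple way to guarantee that local corrections never yield a hierarchically inconsistent label.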

Key words: patent classification, examination logic, classification knowledge base, hierarchical classification, large language model, artificial intelligence

CLC number: G254.11

Cite this article

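XI Chongjun, ZHAO Yajuan, LV Lucheng, SU Ying. Hierarchical Automatic Patent Classification Driven by the Integration of Examination Logic and Knowledge Bases[J/OL]. Journal of Library and Information Science in Agriculture. https://doi.org/10.13998/j.cnki.issn1002-1248.25-0763.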

XI Chongjun, ZHAO Yajuan, LV Lucheng, SU Ying. Hierarchical Automatic Patent Classification Driven by the Integration of Examination Logic and Knowledge Bases: A Case Study in the Artificial Intelligence Domain[J/OL]. Journal of Library and Information Science in Agriculture. https://doi.org/10.13998/j.cnki.issn1002-1248.25-0763.