Journal of Library and Information Science in Agriculture, 2020, Vol. 32, Issue 7: 63-72. doi: 10.13998/j.cnki.issn1002-1248.2020.07.20-0181

Topic: Intellectual Property Services

• Research Article

Taxonomy Construction and Machine Indexing Strategies of Fishery Patent Literature

CHENG Jinxiang1, ZHANG Zhongyue2, CAO Miao3, ZHANG Honglin3,*

  1. Chinese Academy of Fishery Sciences, Beijing 100141;
    2. School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030;
    3. Yangtze River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Wuhan 430223
  • Received: 2020-03-19 Online: 2020-07-05 Published: 2020-07-16
  • Corresponding author: *ZHANG Honglin (1967- ), male, research professor; research interests: fishery information management, and editing and publishing. Email: zhl@yfi.ac.cn
  • About the authors: CHENG Jinxiang (1990- ), male, assistant research professor; research interests: construction and management of knowledge service systems. ZHANG Zhongyue (1994- ), female, master's student; research interests: library and information science and digital libraries. CAO Miao (1988- ), female, M.S., assistant research professor; research interests: government-affairs and archival information management.
  • Funding:
    China Knowledge Centre for Engineering Sciences and Technology (CKCEST), "Fishery Knowledge Service System" project (CKCEST-2020-1-15)

Abstract: [Purpose/Significance] To enable in-depth use of fishery patent literature, we constructed a specialized classification scheme for organizing information in the "Fishery Knowledge Service System" and tested it in an indexing trial. [Method/Process] First, 10,323 screened metadata records of Chinese fishery patents were used as the sample, and the frequencies of the 4-digit and 6-digit International Patent Classification (IPC) codes in these records were analyzed; on this basis, 12 categories were defined for classifying fishery patent literature. Then, an analysis of patent title structure showed that a title consists of industry-attribute terms, business-type terms, and invention-type terms, and we argue that business-type terms are the most suitable for classifying and indexing patent literature. Statistical analysis of title keywords and title-final phrases was used to list the high-frequency subject terms and phrases associated with each business type. Finally, we designed a stepwise strategy that indexes fishery patent literature by combining IPC codes with these high-frequency subject terms; with computer assistance, the vast majority of the fishery patent literature was indexed into the specialized categories. [Results/Conclusions] Compared with manual indexing of the 2016 records in the sample, machine indexing achieved an overall accuracy of 91.44% and an omission rate of 7.94%, meeting the expected target. The results show that the proposed categories fit practical needs and that the indexing strategy has strong practical value.

Key words: fishery, patent literature, classification, machine indexing
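The stepwise strategy summarized in the abstract (IPC codes first, high-frequency title terms second) and the two evaluation figures can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the category names, the IPC-to-category table (`IPC_RULES`), the term lists (`KEYWORD_RULES`), the sample records and all function names are hypothetical. "4-digit" and "6-digit" IPC codes are assumed to mean the subclass (e.g. A01K) and the subclass plus main group (e.g. A01K61). Accuracy and omission rate are assumed to be fractions of all evaluated records, which is consistent with 91.44% and 7.94% summing to less than 100%.

```python
from collections import Counter
from typing import Optional

# HYPOTHETICAL rules for illustration only: the paper's 12 categories, its
# IPC-to-category mapping and its high-frequency term lists are not given here.
IPC_RULES: dict[str, str] = {
    "A01K61": "aquaculture",  # main-group prefix -> category
    "A22C25": "processing",
}
# Hypothetical business-type terms; dict order sets matching priority.
KEYWORD_RULES: dict[str, list[str]] = {
    "aquaculture": ["aquaculture", "culture", "pond"],
    "fishing": ["fishing", "net"],
    "processing": ["fillet", "processing"],
}


def ipc_prefixes(code: str) -> tuple[str, str]:
    """Split an IPC code such as 'A01K 61/00' into ('A01K', 'A01K61')."""
    compact = code.replace(" ", "")
    return compact[:4], compact.split("/")[0]


def ipc_frequencies(records: list[tuple[list[str], str]]) -> tuple[Counter, Counter]:
    """Count subclass-level and main-group-level IPC prefixes across records."""
    subclasses: Counter = Counter()
    groups: Counter = Counter()
    for ipc_codes, _title in records:
        for code in ipc_codes:
            subclass, group = ipc_prefixes(code)
            subclasses[subclass] += 1
            groups[group] += 1
    return subclasses, groups


def index_record(ipc_codes: list[str], title: str) -> Optional[str]:
    """Step 1: match IPC main groups; step 2: fall back to title terms."""
    for code in ipc_codes:
        _, group = ipc_prefixes(code)
        if group in IPC_RULES:
            return IPC_RULES[group]
    lowered = title.lower()
    for category, terms in KEYWORD_RULES.items():
        # Naive substring match; a real system would match segmented terms.
        if any(term in lowered for term in terms):
            return category
    return None  # unindexed: counted as an omission, left for manual review


def evaluate(machine: list[Optional[str]], manual: list[str]) -> tuple[float, float]:
    """Return (accuracy, omission rate), both as fractions of all records."""
    total = len(manual)
    correct = sum(1 for m, h in zip(machine, manual) if m is not None and m == h)
    missed = sum(1 for m in machine if m is None)
    return correct / total, missed / total


# Hypothetical sample records: (IPC codes, title) plus manual labels.
records = [
    (["A01K 61/00", "A01K 63/04"], "Recirculating aquaculture tank"),
    (["B63B 35/00"], "Fishing vessel with net hauler"),
    (["A22C 25/00"], "Fish filleting machine"),
    (["G06F 17/00"], "Water quality monitoring system"),
]
manual = ["aquaculture", "fishing", "processing", "aquaculture"]
machine = [index_record(codes, title) for codes, title in records]
subclasses, groups = ipc_frequencies(records)
print(subclasses.most_common(1))  # [('A01K', 2)]
print(machine)  # ['aquaculture', 'fishing', 'processing', None]
print(evaluate(machine, manual))  # (0.75, 0.25)
```

Records that neither step can assign are returned as `None` and counted as omissions, one plausible reading of computer-assisted indexing in which the remainder is handled manually. Substring matching on English titles is a simplification; the study worked with Chinese titles, where term segmentation would matter.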

CLC number: G254.11

Cite this article

CHENG Jinxiang, ZHANG Zhongyue, CAO Miao, ZHANG Honglin. Taxonomy Construction and Machine Indexing Strategies of Fishery Patent Literature[J]. Journal of Library and Information Science in Agriculture, 2020, 32(7): 63-72.