Journal of Library and Information Science in Agriculture ›› 2022, Vol. 34 ›› Issue (6): 36-49. doi: 10.13998/j.cnki.issn1002-1248.22-0114

• Research Paper •

Scientific and Technical Literature Data Management System Based on Life Cycle Model

CHANG ZhiJun1,2, XU LiYuan1,*, YU QianQian1, ZHANG JianYong1,2, WANG YongJi3

  1. Data Resources Department, National Science Library, Chinese Academy of Sciences, Beijing 100190;
  2. Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100049;
  3. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190
  • Received: 2022-03-01 Online: 2022-06-05 Published: 2022-07-08
  • Corresponding author: * XU LiYuan, female, master's degree, librarian; research interests: data resource management and quality control. Email: xuly@mail.las.ac.cn
  • About the authors: CHANG ZhiJun, male, master's degree, associate research librarian, master's supervisor; research interests: big data platform construction and management, domain knowledge graph construction, etc. YU QianQian, female, master's degree, associate research librarian; research interests: data management and organization. ZHANG JianYong, male, master's degree, research librarian; research interests: data management and organization. WANG YongJi, male, PhD, research professor; research interests: computer science and technology, software engineering, etc.
  • Funding:
    National Social Science Fund of China, "Research on Methods for Identifying Entity Relations in Domain Literature for Evidence-Based Medicine" (21BTQ106)

Abstract: [Purpose/Significance] Scientific and technical (S&T) literature data resources are characterized by wide coverage, large volume, many types, fast updates and strong timeliness. To improve the effectiveness and security of S&T literature data management, this paper studies the S&T literature management system on the basis of the data life cycle model. [Method/Process] This paper explores the management mode of S&T literature, constructs a life cycle system for S&T literature based on the data management process, and describes the data management tools and methods for seven stages: data creation, data storage, data pre-processing, data computing, data service, data archiving and data destruction. In the data creation stage, specific data access forms are defined for different sources and data types, and dedicated data creation tools are built so that data are received completely. In the data storage stage, a unified literature metadata storage system is developed by analyzing the characteristics and shortcomings of the various data types, so as to better describe and organize S&T literature data. In the data pre-processing stage, tools are built to format, parse, convert and structure the various types of data. The data computing stage mainly covers data enrichment, entity relationship extraction and knowledge graph construction. In the data service stage, data are served through a unified service interface. In the data archiving stage, data are archived and preserved. In the data destruction stage, data that are no longer needed are securely destroyed. [Results/Conclusions] The life cycle management approach was applied in practice to the Web of Science (WOS) BP data, a core data set from Clarivate, across the seven stages of creation, storage, pre-processing, computing, service, archiving and destruction. The data management effect was then evaluated along the six DAMA data quality dimensions: completeness, uniqueness, timeliness, validity, accuracy and consistency. Completeness of data reception was 100%, and non-null completeness was 59.75%. Uniqueness reached 99.23%. Timeliness was controllable. Validity met the constraint conditions. Accuracy reached 100%. Consistency reached 90%. These results indicate that the data can, for the most part, be effectively managed and used at every life cycle stage, and that the management model is effective and delivers the desired service outcomes.
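The completeness (integrity) and uniqueness percentages reported above are ratios over the managed records. The abstract does not give the formulas the authors used, so the following is only a minimal sketch of one common way to compute two of the six dimensions. The function names, the field names (`uid`, `title`, `doi`, `abstract`) and the sample records are all hypothetical and are not taken from the paper or from the WOS data.

```python
from typing import Mapping, Sequence

# Values treated as "empty" when measuring non-null completeness (an assumption).
EMPTY_VALUES = (None, "", [])


def non_null_completeness(records: Sequence[Mapping[str, object]],
                          fields: Sequence[str]) -> float:
    """Share of (record, field) cells that hold a non-empty value."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in EMPTY_VALUES
    )
    return filled / total


def uniqueness(records: Sequence[Mapping[str, object]], key: str) -> float:
    """Share of records that remain after collapsing duplicate key values."""
    if not records:
        return 0.0
    distinct_keys = {record.get(key) for record in records}
    return len(distinct_keys) / len(records)


# Hypothetical sample: four records, one of them a duplicate of another.
FIELDS = ["uid", "title", "doi", "abstract"]
RECORDS = [
    {"uid": "A1", "title": "T1", "doi": "10.1/a", "abstract": None},
    {"uid": "A2", "title": "T2", "doi": "", "abstract": "text"},
    {"uid": "A2", "title": "T2", "doi": "", "abstract": "text"},
    {"uid": "A3", "title": "T3", "doi": "10.1/c", "abstract": "text"},
]

if __name__ == "__main__":
    # 13 of 16 cells are filled; 3 distinct uids among 4 records.
    print(f"non-null completeness: {non_null_completeness(RECORDS, FIELDS):.2%}")
    print(f"uniqueness: {uniqueness(RECORDS, 'uid'):.2%}")
```

Under these assumptions, a low non-null completeness alongside a reception completeness of 100% would simply mean that every record arrived but many optional fields were empty in the source.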

Key words: life cycle management, scientific and technical (S&T) literature, data management, big data governance, knowledge graph

CLC Number: 

  • TN919

Cite this article

常志军, 许丽媛, 于倩倩, 张建勇, 王永吉. 基于生命周期模型的科技文献数据管理体系研究[J]. 农业图书情报学报, 2022, 34(6): 36-49.

CHANG ZhiJun, XU LiYuan, YU QianQian, ZHANG JianYong, WANG YongJi. Scientific and Technical Literature Data Management System Based on Life Cycle Model[J]. Journal of Library and Information Science in Agriculture, 2022, 34(6): 36-49.