Journal of Library and Information Sciences in Agriculture (农业图书情报学刊) ›› 2015, Vol. 27 ›› Issue (2): 57-59. doi: 10.13998/j.cnki.issn1002-1248.2015.02.015

• Network Technology •

Automatic Extraction of Metadata Information for Dissertation based on Feature and Rule Pattern

CHEN Shu-ping

  1. Library of Yanshan University, Qinhuangdao, Hebei 066004, China
  • Received: 2014-07-01 Online: 2015-02-05 Published: 2015-03-04
  • About the author: CHEN Shu-ping (b. 1980), female, master's degree, librarian; her work focuses on information retrieval and the digitization of print documents.

Abstract: Dissertation databases are an important digital resource in the digital libraries of Chinese universities, yet their metadata is still entered by hand, which is inefficient and labor-intensive. To address this problem, we combine document features with rule-based pattern matching and use regular expressions to extract dissertation metadata automatically. The algorithm consists of two modules: information location and metadata extraction. Experimental data show that the algorithm achieves relatively high precision and recall, as well as a relatively high overall performance index F.

Key words: dissertation, metadata, information extraction, regular expression, pattern matching
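The two modules named in the abstract, information location and metadata extraction, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the field labels (`Title`, `Author`, `Degree`, `Date`), the value rules, and the function names are all hypothetical, and the F value is assumed to be the balanced F-measure, since this page does not give the paper's actual rules or formula.

```python
import re

# Hypothetical cover-page labels; accepts both ASCII and full-width colons.
LABEL_RE = re.compile(r"^\s*(Title|Author|Degree|Date)\s*[::]\s*(.*)$")

# Hypothetical per-field value rules applied after a field has been located.
VALUE_RULES = {
    "Title": re.compile(r".+"),
    "Author": re.compile(r"[^\d,;]+"),
    "Degree": re.compile(r"master|doctor", re.IGNORECASE),
    "Date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def locate_fields(text):
    """Information location: find lines that start with a known label."""
    located = {}
    for line in text.splitlines():
        match = LABEL_RE.match(line)
        if match and match.group(1) not in located:
            located[match.group(1)] = match.group(2).strip()
    return located


def extract_metadata(located):
    """Metadata extraction: apply each field's value rule to its raw text."""
    metadata = {}
    for field, raw in located.items():
        match = VALUE_RULES[field].search(raw)
        if match:
            metadata[field] = match.group(0).strip()
    return metadata


def precision_recall_f(extracted, gold):
    """Score extracted fields against a gold record (assumes balanced F)."""
    correct = sum(1 for k, v in extracted.items() if gold.get(k) == v)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value
```

Separating location from extraction keeps the label patterns, which depend on a university's cover-page template, apart from the value rules, which depend only on the field type.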

CLC Number:

  • G203

Cite this article

CHEN Shu-ping. Automatic Extraction of Metadata Information for Dissertation based on Feature and Rule Pattern[J]. Journal of Library and Information Sciences in Agriculture, 2015, 27(2): 57-59.