Journal of Library and Information Science in Agriculture ›› 2021, Vol. 33 ›› Issue (9): 27-36. doi: 10.13998/j.cnki.issn1002-1248.21-0262

Special topics: Ancient Books Work in the New Era; Digital Humanities

• Digital Humanities Special Topic •

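The Trigger Verb Classification Method of Event Sentences in Ancient Chinese Classics Based on Bi-LSTM

MA Xiaowen, HE Lin*, LIU Jianbin, LI Zhangchao, GAO Dan

  1. College of Information Management, Nanjing Agricultural University, Nanjing 210095
  • Received: 2021-04-11 Online: 2021-09-05 Published: 2021-09-28
  • Corresponding author: * HE Lin (ORCID: 0000-0002-4207-3588), female, professor and doctoral supervisor at the College of Information Management, Nanjing Agricultural University; research interests: information organization and text mining. Email: helin@njau.edu.cn
  • About the authors: MA Xiaowen (ORCID: 0000-0002-3725-4958), female, master's student; research interests: text mining and humanities computing. LIU Jianbin (ORCID: 0000-0003-4001-0337), male, master's student; research interest: text mining. LI Zhangchao (ORCID: 0000-0002-9252-2142), male, doctoral student; research interests: knowledge organization and text mining. GAO Dan (ORCID: 0000-0002-0716-5508), female, doctoral student; research interests: knowledge services and digital humanities.
  • Supported by:
    National Social Science Fund of China project "Research on Methods for Automatically Constructing a Knowledge Representation System of Traditional Chinese Culture Based on Classics" (18BTQ063)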

Abstract: [Purpose/Significance] Recognizing and classifying trigger verbs in ancient books, as part of digital humanities research, is important for the deep mining of ancient texts and for revealing their content. This paper uses a deep learning classification algorithm to explore an automated method for multi-class classification of event sentences based on the trigger words they contain. [Method/Process] After constructing a classification scheme for event trigger words in the classics and a trigger word dictionary, event sentences from four different categories were selected as experimental data. The category labels were one-hot encoded and the sentence texts were encoded with a Tokenizer; the encoded data were then fed into a Bi-LSTM model to train a classifier. Comparative experiments were set up by adjusting parameters, and the performance of the classifier was analyzed with commonly used evaluation metrics. [Results/Conclusions] After repeated training and tuning, the classifier reached an accuracy of 0.95 on the test set. This indicates that the deep-learning-based experimental method and the constructed trigger word dataset can effectively support automatic multi-class classification of event sentences in ancient books.

Key words: trigger word classification, Bi-LSTM model, multi-class classification, Zuo Zhuan
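The encoding step described in the abstract (one-hot labels, tokenized sentences) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the character-level tokenization, the reserved ids for padding and unknown characters, the `MAX_LEN` value, the class names, and the label assigned to each sample sentence are all assumptions made for illustration.

```python
from typing import Dict, List

PAD_ID = 0  # assumed convention: id 0 pads short sentences
OOV_ID = 1  # assumed convention: id 1 marks characters not in the vocabulary


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign each distinct character an integer id, starting at 2."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for ch in sentence:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 2
    return vocab


def encode_sentence(sentence: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a sentence to a fixed-length id sequence (truncate, then right-pad)."""
    ids = [vocab.get(ch, OOV_ID) for ch in sentence][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def one_hot(label: str, classes: List[str]) -> List[int]:
    """Return a vector with a single 1 at the position of `label` in `classes`."""
    vec = [0] * len(classes)
    vec[classes.index(label)] = 1
    return vec


# Hypothetical four-class label set; the paper's actual categories are not listed here.
CLASSES = ["military", "alliance", "class_c", "class_d"]
MAX_LEN = 8  # assumed maximum sentence length for this toy example

# Illustrative sample sentences; the labels attached to them are placeholders.
sentences = ["郑伯克段于鄢", "公及邾仪父盟于蔑"]
labels = ["military", "alliance"]

vocab = build_vocab(sentences)
encoded = [encode_sentence(s, vocab, MAX_LEN) for s in sentences]
one_hot_labels = [one_hot(label, CLASSES) for label in labels]

print(encoded)
print(one_hot_labels)
```

Fixed-length integer sequences paired with one-hot label vectors are the typical input shape for an embedding layer followed by a bidirectional LSTM and a four-way softmax output, which is the kind of model the abstract names.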

CLC Number:

  • G350

Cite this article

MA Xiaowen, HE Lin, LIU Jianbin, LI Zhangchao, GAO Dan. The Trigger Verb Classification Method of Event Sentences in Ancient Chinese Classics Based on Bi-LSTM[J]. Journal of Library and Information Science in Agriculture, 2021, 33(9): 27-36.