Journal of Library and Information Science in Agriculture, 2023, Vol. 35, Issue (6): 72-82. DOI: 10.13998/j.cnki.issn1002-1248.23-0419

• Research Paper •

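Research on a Generative Report Summarization Model Based on Fused Keywords and Contrastive Learning Training: A Case Study of Chinese Think Tank Reports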

CHEN Yuanyuan1, WANG Lei2   

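  1. Shanghai Publishing and Printing College, Key Laboratory of Intelligent and Green Flexographic Printing, National Press and Publication Administration, Shanghai 200093;
    2. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054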
  • Received: 2023-04-29 Online: 2023-06-05 Published: 2023-08-02
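  • About the authors: CHEN Yuanyuan (1977- ), female, master's supervisor; research interests: think tank development and evaluation, and webometrics. WANG Lei (1996- ), male, master's student; research interests: think tank research and natural language processing.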
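  • Funding:
    2019 National Social Science Fund of China project "Research on the Online Influence of Prominent U.S. Think Tanks and Its Implications" (19BTQ067)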

Think-Tank's Text Summarization Based on Combined Keywords and Contrastive Learning Training

CHEN Yuanyuan1, WANG Lei2   

  1. Shanghai Publishing and Printing College, Key Lab of Intelligent and Green Flexographic Printing, Shanghai 200093;
    2. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054
  • Received: 2023-04-29 Online: 2023-06-05 Published: 2023-08-02

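Abstract: [Purpose/Significance] Think tank reports are professional analyses and policy recommendations produced by independent research institutions, and they are an important tool for supporting decision-makers and the public and for promoting social progress. A summary of a think tank report gives readers a concise, clear overview so that they can quickly grasp the report's main content and conclusions, which improves the efficiency of information screening, dissemination, and knowledge transfer. Because current think tank reports vary widely in topic, generated summaries are often inaccurate, and existing text summarization methods urgently need improvement. [Method/Process] To address this problem, this paper proposes a think tank report summarization model that fuses keywords. First, a keyword extraction algorithm extracts keyword information from the source text. Second, a keyword fusion module based on a cross-attention mechanism is proposed to improve the model's awareness of topic information. In addition, contrastive learning is used during training. [Results/Conclusions] The results show that the keyword-fused summarization model outperforms other text summarization models on the think tank report summarization task, reaching Rouge-1, Rouge-2 and Rouge-L scores of 48.23, 32.55 and 42.50, respectively.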
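The abstract names a cross-attention-based keyword fusion module but gives no equations. The following is a minimal sketch under stated assumptions, not the authors' implementation: document token vectors act as queries, extracted keyword vectors act as both keys and values, learned query/key/value projections are omitted, and fusion is a simple residual sum. `fuse_keywords` and the toy vectors are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_keywords(doc: Matrix, keywords: Matrix) -> Matrix:
    """Hypothetical keyword fusion via scaled dot-product cross-attention.

    doc:      n_doc x d token representations (queries)
    keywords: n_kw x d keyword representations (keys and values)
    Assumptions: identity Q/K/V projections, single head, residual add.
    """
    d = len(doc[0])
    scores = matmul(doc, transpose(keywords))  # n_doc x n_kw similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    context = matmul(weights, keywords)  # n_doc x d keyword context per token
    # Residual fusion: each token keeps its own vector plus attended keyword info.
    return [[t + c for t, c in zip(tok, ctx)] for tok, ctx in zip(doc, context)]


if __name__ == "__main__":
    doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    keywords = [[1.0, 0.0], [0.0, 1.0]]
    for row in fuse_keywords(doc, keywords):
        print([round(v, 4) for v in row])
```

In the toy run, the third token is equally similar to both keywords, so its attention splits evenly between them. A trained model would instead use encoder outputs, learned projections, and usually multi-head attention.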

Keywords: think tank reports, report summary, fusion, contrastive learning training

Abstract: [Purpose/Significance] Think tank reports present professional analysis and policy recommendations from independent research institutions; they support decision-making and are an important tool for policymakers and the public in promoting social progress. A think tank report summary gives readers a concise, clear overview so that they can quickly grasp the report's main content and conclusions, which makes information screening, dissemination, and knowledge transfer more efficient. At present, think tank reports differ widely in topic, which leads to inaccurate generated summaries, so existing text summarization methods urgently need improvement. This paper examines the characteristics of think tank reports from the perspective of multi-topic text summarization. [Method/Process] Because existing models summarize think tank reports poorly, a web crawler was used to build a think tank report dataset, and a report summarization method based on "combined keywords" was proposed. First, a keyword extraction algorithm extracts keyword information from the source text. Second, a "combined keywords" module based on a cross-attention mechanism improves the model's ability to capture the topic information in the text, which helps make the generated summaries more accurate. Finally, to keep the model from attending to keywords so heavily that it ignores the overall information of a think tank report, contrastive learning is used during training. [Results/Conclusions] Experimental results show that the proposed model achieves Rouge-1, Rouge-2 and Rouge-L scores of 48.23, 32.55 and 42.50, respectively, on the think tank report summarization task.
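The abstract reports Rouge-1, Rouge-2 and Rouge-L but does not say whether recall or F1 was used or how the Chinese text was tokenized. The minimal sketch below assumes F1 scores over already tokenized input (for Chinese, character- or word-level tokens would have to be chosen first) and omits the stemming and multi-reference options of the official ROUGE toolkit; the function names are hypothetical. Scores such as 48.23 are conventionally these values multiplied by 100.

```python
from collections import Counter
from typing import Dict, List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, cand_total: int, ref_total: int) -> float:
    """Harmonic mean of precision (vs. candidate) and recall (vs. reference)."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(cand: List[str], ref: List[str], n: int) -> float:
    c, r = ngram_counts(cand, n), ngram_counts(ref, n)
    overlap = sum((c & r).values())  # clipped matches: min count per n-gram
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Longest common subsequence length, dynamic programming row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l(cand: List[str], ref: List[str]) -> float:
    return f1(lcs_length(cand, ref), len(cand), len(ref))


def rouge_scores(cand: List[str], ref: List[str]) -> Dict[str, float]:
    return {
        "rouge-1": rouge_n(cand, ref, 1),
        "rouge-2": rouge_n(cand, ref, 2),
        "rouge-l": rouge_l(cand, ref),
    }


if __name__ == "__main__":
    candidate = "the model fuses keywords".split()
    reference = "the model fuses topic keywords".split()
    print({k: round(100 * v, 2) for k, v in rouge_scores(candidate, reference).items()})
```

For this toy pair the sketch gives Rouge-1 = 88.89, Rouge-2 = 57.14 and Rouge-L = 88.89: every candidate word appears in the reference, but the inserted word "topic" breaks one of the candidate's bigrams.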
The proposed "combined keywords" summarization model effectively addresses the inaccurate summaries caused by multi-topic texts and outperforms similar models on think tank reports. Ablation experiments further confirm the effectiveness of both the "combined keywords" module and contrastive learning training. This study has limitations; for example, it does not exploit the position and frequency information of keywords. Future work will adjust keyword weights according to their position, frequency, and importance in the text and will further expand the think tank report summary dataset.
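The abstract does not specify how the contrastive training objective is built. One widely used formulation is the InfoNCE loss, sketched below in plain Python as an illustration only: an anchor representation should have higher cosine similarity with its positive than with any negative. Which representations serve as anchor, positive and negatives (for example, a report encoding, its reference summary, and summaries of other reports in the same batch) is an assumption, as is the default `temperature` of 0.1; `info_nce` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE: -log(exp(s_pos / t) / sum_k exp(s_k / t)) over positive + negatives.

    Illustrative only; the anchor/positive/negative pairing and the
    temperature value are assumptions, not taken from the paper.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # log-sum-exp with max subtraction for stability
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    report = [0.9, 0.1, 0.2]
    own_summary = [0.8, 0.2, 0.1]
    other_summaries = [[0.1, 0.9, 0.3], [0.2, 0.1, 0.9]]
    # Low loss: the positive is the vector closest to the anchor.
    print(info_nce(report, own_summary, other_summaries))
    # High loss: the "positive" is a distant vector and the close one is a negative.
    print(info_nce(report, other_summaries[0], [own_summary]))
```

Because the denominator always contains the positive term, the loss is non-negative and shrinks as the positive pulls ahead of the negatives. A lower temperature sharpens that contrast.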

Keywords: think tank reports, report summary, fusion, contrastive learning training

Chinese Library Classification (CLC) number: TP183

Cite this article

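CHEN Yuanyuan, WANG Lei. Research on a Generative Report Summarization Model Based on Fused Keywords and Contrastive Learning Training: A Case Study of Chinese Think Tank Reports[J]. Journal of Library and Information Science in Agriculture, 2023, 35(6): 72-82.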

CHEN Yuanyuan, WANG Lei. Think-Tank's Text Summarization Based on Combined Keywords and Contrastive Learning Training[J]. Journal of Library and Information Science in Agriculture, 2023, 35(6): 72-82.