
Journal of Library and Information Science in Agriculture ›› 2023, Vol. 35 ›› Issue (6): 72-82. DOI: 10.13998/j.cnki.issn1002-1248.23-0419


Think-Tank's Text Summarization Based on Combined Keywords and Contrastive Learning Training

CHEN Yuanyuan1, WANG Lei2   

  1. Shanghai Publishing and Printing College, Key Lab of Intelligent and Green Flexographic Printing, Shanghai 200093;
    2. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054
  • Received: 2023-04-29    Online: 2023-06-05    Published: 2023-08-02

Abstract: [Purpose/Significance] Think tank reports are professional analyses and policy recommendations produced by independent research institutions; they support decision-making and serve as an important tool for policy makers and the public in promoting social progress. The purpose of a think tank report summary is to give readers a concise, clear overview so that they can quickly grasp the report's main content and conclusions, thereby improving the efficiency of information screening, dissemination, and knowledge transfer. At present, think tank reports vary widely in content and structure, which leads to inaccurate summaries, so existing text summarization methods urgently need improvement. This paper focuses on the characteristics of think tank reports in the context of multi-topic text summarization. [Method/Process] To address the poor performance of existing models on think tank report summarization, crawler technology was used to construct a think tank report dataset, and a report summarization method based on a "combined keywords" search mechanism was proposed. First, a keyword extraction algorithm extracts the keywords of the original text. Second, a "combined keywords" search module based on a cross-attention mechanism strengthens the model's ability to capture topic information in the text, helping to improve the accuracy of the generated summary. Finally, to prevent the model from attending excessively to keywords while ignoring the overall information of a report, a contrastive learning training method was designed for the training process. [Results/Conclusions] The experimental results show that the Rouge-1, Rouge-2 and Rouge-L scores of the think tank report summarization model reached 48.23, 32.55 and 42.50, respectively. The summarization model with the proposed "combined keywords" search method effectively alleviates the inaccurate summarization caused by multi-topic texts, and its summarization performance on think tank reports surpasses that of other comparable models. Ablation experiments further confirm the effectiveness of the "combined keywords" search module and the contrastive learning training. This study still has some limitations; for example, it does not explore the position and frequency information of keywords. In future work, we will adjust keyword weights according to their position, frequency and importance in the text, and further expand the think tank report summary dataset.
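The Rouge-1/2/L scores reported above measure n-gram and subsequence overlap between a generated summary and a reference summary. The following is a minimal, illustrative sketch of how a ROUGE-N F1 score is computed; it is not the authors' evaluation code, and published results are normally produced with a standard ROUGE toolkit.

```python
# Minimal sketch of ROUGE-N F1 (the overlap metric reported in the abstract).
# Illustrative only -- a real evaluation would use an established ROUGE package.
from collections import Counter

def ngrams(tokens, n):
    """Multiset of all contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 between a candidate summary and a reference summary."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())        # clipped n-gram matches
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)

cand = "the model generates a concise report summary".split()
ref = "the model produces a concise summary of the report".split()
print(round(rouge_n(cand, ref, 1), 3))  # → 0.75
```

ROUGE-2 is the same computation with `n=2`; ROUGE-L instead scores the longest common subsequence, rewarding in-order (but not necessarily contiguous) matches.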

Key words: think tank reports, report summary, combined keywords, contrastive learning training
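The contrastive learning training mentioned in the abstract pulls a summary representation toward a matching (positive) text representation and away from mismatched (negative) ones. The sketch below illustrates one common form of such an objective, an InfoNCE-style loss; the paper's actual loss, model, and representations are not specified on this page, so all names and vectors here are purely illustrative.

```python
# Hypothetical sketch of a contrastive (InfoNCE-style) training objective,
# of the general kind described in the abstract. Illustrative only.
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log softmax of the positive pair: pull anchor toward the positive,
    push it away from the negatives."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)                              # stabilize the exponentials
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

anchor = [1.0, 0.0]
loss_good = info_nce(anchor, [0.9, 0.1], [[-1.0, 0.2], [0.0, 1.0]])
loss_bad = info_nce(anchor, [-1.0, 0.2], [[0.9, 0.1], [0.0, 1.0]])
print(loss_good < loss_bad)  # prints True: an aligned positive costs less
```

Minimizing such a loss during training is one way to keep the summary representation faithful to the whole report rather than to the extracted keywords alone.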

CLC Number: TP183