Journal of Library and Information Science in Agriculture ›› 2023, Vol. 35 ›› Issue (6): 51-59. doi: 10.13998/j.cnki.issn1002-1248.23-0288

• Research Paper •

Application of Large-scale Pre-Training Language Model in Network Health Information Identification

WANG Chao1, KONG Xianghui2,*

  1. Library of Liaoning University of Technology, Jinzhou 121000;
  2. Library of Jinzhou Medical University, Jinzhou 121000
  • Received: 2023-05-08 Online: 2023-06-05 Published: 2023-08-02
  • Corresponding author: * KONG Xianghui (born 1987), male, master's degree, librarian, Library of Jinzhou Medical University; research interest: information services. Email: 741797025@qq.com
  • About the author: WANG Chao (born 1983), male, master's degree, associate research librarian, Library of Liaoning University of Technology; research interests: intelligent computing and library informatization
  • Funding:
    2022 Liaoning Provincial Social Science Planning Fund Youth Project "Research on Data Literacy Education in Medical College Libraries from the Perspective of the Reproducibility Crisis" (L22CTQ004)

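Abstract (translated from the Chinese): [Purpose/Significance] This paper examines how well large-scale pre-trained language models such as ChatGPT identify online health information, in order to provide a reference for applying artificial intelligence in the health information field. [Method/Process] Taking health-related items from an authoritative rumor-refuting platform in China as the research object, the authors used ChatGPT and iFLYTEK Spark to judge their authenticity, evaluated the models' performance, and compared the judgments with those of medical experts or authoritative institutions. [Results/Conclusions] ChatGPT and iFLYTEK Spark achieved identification accuracies of 93.9% and 92.9% and F1 scores of 0.951 and 0.946, respectively, indicating good performance. The explanatory texts generated by both models were fairly detailed and fluent, and their length and semantic similarity were very close to those of the expert texts; for a few items, however, the explanations lacked sufficiently detailed scientific evidence or contained logical errors. The results suggest that large-scale pre-trained language models have some advantages in assisting online health information identification, but human intervention is still needed to ensure the accuracy and reliability of the results.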

Abstract: [Purpose/Significance] Taking the popular chatbot ChatGPT and the recently launched, similar product iFLYTEK Spark as research objects, this paper explores their application to the identification of online health information and discusses their advantages and disadvantages, in order to provide a reference for the use of large-scale pre-trained language models in health information identification. A review of the literature on online health information identification shows that deep learning models have been widely applied to this task in recent years. With the rapid development of large pre-trained language models such as ChatGPT, examining their ability to discriminate online health information is a novel line of inquiry. [Method/Process] The researchers selected health-related information from an authoritative rumor-refuting platform in China, used ChatGPT and iFLYTEK Spark to judge the authenticity of each item, evaluated the models' performance, and compared their judgments with those of medical experts or authoritative institutions. [Results/Conclusions] The identification accuracies of ChatGPT and iFLYTEK Spark were 93.9% and 92.9%, and their F1 scores were 0.951 and 0.946, respectively, indicating good performance. The generated explanatory texts were fairly detailed and the language was reasonably fluent. In terms of the length and dispersion of the explanatory text, ChatGPT was closer to the medical experts, while the explanatory text of iFLYTEK Spark was relatively long and less dispersed. In terms of semantic similarity, ChatGPT and iFLYTEK Spark performed almost equally, and their understanding of health information was, to some extent, close to that of human experts. Analysis of typical samples shows that large AI models cannot yet accurately identify news or emergency information, and that their understanding of individual health propositions with complex semantics is occasionally biased. Overall, the experimental results show that ChatGPT and iFLYTEK Spark discriminate online health information well but still have shortcomings, so manual intervention is needed to ensure the accuracy and reliability of the results. For research on large AI models, the authors suggest attaching importance to the construction and application of high-quality corpora in vertical fields. For online health information identification, practitioners can use models such as ChatGPT as tools to help identify and refine health information. This study also has limitations: the amount of test data is not large enough, the ChatGPT tested was based on the GPT-3.5 model, and iFLYTEK Spark had been available online for only a relatively short time. Future studies can increase the amount of online health information tested and evaluate updated versions of large AI models.

Key words: artificial intelligence, health information, identification, ChatGPT
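The accuracy and F1 figures quoted in the abstract are standard binary classification metrics. The following is a minimal sketch of how such metrics can be computed by comparing model verdicts with expert verdicts. The label names, the choice of "rumor" as the positive class, and the five sample verdicts are illustrative assumptions, not data or procedures from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion counts for a binary task (positive class assumed to be 'rumor')."""
    tp: int  # expert: rumor, model: rumor
    fp: int  # expert: not rumor, model: rumor
    fn: int  # expert: rumor, model: not rumor
    tn: int  # expert: not rumor, model: not rumor


def confusion_from_labels(expert, model, positive="rumor"):
    """Count agreement between expert and model verdicts, item by item."""
    if len(expert) != len(model):
        raise ValueError("expert and model label lists must have equal length")
    tp = fp = fn = tn = 0
    for e, m in zip(expert, model):
        if e == positive and m == positive:
            tp += 1
        elif e != positive and m == positive:
            fp += 1
        elif e == positive and m != positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def accuracy(c):
    """Share of all items on which the model agrees with the expert."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def f1_score(c):
    """Harmonic mean of precision and recall for the positive class."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical verdicts for five items (not taken from the study).
expert_labels = ["rumor", "rumor", "true", "rumor", "true"]
model_labels = ["rumor", "true", "true", "rumor", "rumor"]

counts = confusion_from_labels(expert_labels, model_labels)
print(f"accuracy = {accuracy(counts):.3f}, F1 = {f1_score(counts):.3f}")
```

Because F1 depends on which class is treated as positive, a reported F1 value can only be reproduced once that choice is known; the abstract does not state it.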

CLC Number:

  • G252

Cite this article

WANG Chao, KONG Xianghui. Application of Large-scale Pre-Training Language Model in Network Health Information Identification[J]. Journal of Library and Information Science in Agriculture, 2023, 35(6): 51-59.