Journal of Library and Information Science in Agriculture ›› 2023, Vol. 35 ›› Issue (12): 49-59. doi: 10.13998/j.cnki.issn1002-1248.23-0813

• Research Paper •

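Online Social Spammer Detection Based on Deep Learning

ZHANG Jiyang1,2, ZHANG Peng1,*, GONG Siyu3, SONG Naipeng1

  1. Research Center for Network Public Opinion Governance, China People's Police University, Langfang 065000;
  2. Ningbo Entry Exit Border Inspection Station, Ningbo 315040;
  3. Yiling Branch of Yichang Public Security Bureau, Yichang 443100
  • Received: 2023-12-27 Online: 2023-12-05 Published: 2024-04-07
  • Corresponding author: *ZHANG Peng (1981- ), male, PhD, associate professor, master's supervisor; research interest: online crisis management. Email: zhangpeng@cppu.edu.cn
  • About the authors: ZHANG Jiyang (1999- ), male, master's degree; research interest: online public opinion. GONG Siyu (2001- ), female, bachelor's degree, police officer first class; research interest: public security intelligence. SONG Naipeng (2003- ), male, undergraduate; research interest: data policing technology.
  • Funding:
    Humanities and Social Sciences Research Fund of the Ministry of Education, "Research on Anomaly Identification and Governance of Online Public Opinion Risk in the Era of Intelligent Media" (23YJCZH131); China People's Police University Key Research Special Project, "Research on Identification of and Governance Countermeasures for Online False Information in the Context of Artificial Intelligence" (ZDZX202201)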

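Abstract (Chinese version): [Purpose/Significance] The growth of the Internet has driven the rapid development of social networks, giving users a convenient channel for publishing, spreading, and receiving information. Their low barrier to entry, however, has also given rise to a grey and black market force, the "Internet water army" (paid online posters), who spread false information, disrupt order online, and have become a major problem in the Internet ecosystem. [Method/Process] This study proposes a deep-learning-based model for identifying the Internet water army. It combines three groups of user features (basic information, historical remarks, and interaction behavior), adds a "social intimacy" attribute, and, after feature extraction and vector fusion, uses a convolutional neural network to build the identification model. [Results/Conclusions] In empirical analysis and model comparison, the model achieved good results on metrics such as precision and accuracy, and can provide some technical support and theoretical guidance for identifying the Internet water army. The experiments indicate that using machine learning to proactively identify water-army accounts, with real-time supervision and advance prevention for key accounts, can help avoid malicious online incidents in a more timely and effective way and reduce the risk of illegal forces damaging the public opinion ecosystem.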

Keywords (Chinese version): Internet water army, deep learning, convolutional neural network, bag-of-words model
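The abstract reports results in terms of precision and accuracy. As a reminder of what those two metrics measure for a binary spammer/normal classifier, the following is a minimal sketch using the standard definitions. The function names and the toy labels are illustrative assumptions and are not taken from the paper; label 1 is assumed to mean "spammer".

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating label 1 as 'spammer'."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision(tp, fp):
    """Share of accounts flagged as spammers that really are spammers."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def accuracy(tp, fp, tn, fn):
    """Share of all accounts that were classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Toy labels, invented purely for illustration (not data from the paper).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
p = precision(tp, fp)
a = accuracy(tp, fp, tn, fn)
```

Precision matters here because a false positive means a genuine user is flagged as a spammer, while accuracy summarizes overall correctness and can be misleading when spammers are a small minority of accounts.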

Abstract: [Purpose/Significance] The development of the Internet has led to the rapid growth of social networks, providing users with a convenient channel for publishing, disseminating, and receiving information. However, the low barrier to entry has also given rise to online social spammers, known as the "Internet water army", who are paid to post comments with particular content and to spread false information deliberately. They have become a major problem for the Internet ecosystem. Detecting the Internet water army, preventing its malicious attacks, and countering its negative effects on the security of online public opinion are therefore of great significance. [Method/Process] First, we analyzed the development and characteristics of online social spammers, summarized the algorithms and features used in previous studies, and identified three research starting points: text features, interaction features, and graph structure features. We then proposed an online social spammer detection method based on deep learning. Drawing on three aspects of a user (basic information, historical remarks, and interaction behavior), we extracted six types of features: basic information, recent remarks, social intimacy, interaction behavior, number of microblog posts, and membership level. Through deep feature extraction followed by vector splicing and fusion, user feature vectors of equal length were formed. Finally, a convolutional neural network was used as the classifier to build an automatic, high-precision, and efficient spammer detection model. Two Chinese online spammer datasets collected from the Sina Weibo platform were selected for the experiment. Their features were spliced and aligned to form the Weibo Spammer 2023 dataset, which was used for model training; this kept the features of any single dataset from being too discrete and reducing model generalization. To address overfitting during model training, we added dropout layers. [Results/Conclusions] The online spammer detection model constructed in this experiment shows significant improvement on metrics such as precision and accuracy. The ablation experiment also shows that each of the six extracted features has a positive effect on detection. Empirical analysis indicates that the model achieves high detection accuracy and efficiency, and it can provide some technical support and theoretical guidance for online spammer identification. Using machine learning methods to actively identify online social spammer accounts, together with real-time monitoring and advance prevention for key accounts, can prevent malicious network events in a more timely and effective way and reduce the risk of illegal forces damaging the public opinion ecosystem.

Key words: network spammer, deep learning, CNN, bag-of-words
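The pipeline outlined in the abstract (six feature groups aligned to a common length, spliced into one user vector, then classified by a convolutional network regularized against overfitting) can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the function names, the per-group length, the kernel values, the output weights, and the feature values are all hypothetical placeholders, the weights are untrained, and the overfitting remedy is assumed here to be standard (inverted) dropout.

```python
import math
import random


def pad_or_truncate(vec, length):
    """Force one feature group to a fixed length by zero-padding or cutting."""
    return list(vec[:length]) + [0.0] * max(0, length - len(vec))


def fuse_features(groups, length):
    """Align every feature group to `length`, then splice them into one vector."""
    fused = []
    for vec in groups:
        fused.extend(pad_or_truncate(vec, length))
    return fused


def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]


def dropout(x, rate, rng, training):
    """Inverted dropout: zero each unit with probability `rate` in training only."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def predict(user_vec, kernels, weights, rate=0.5, rng=None, training=False):
    """Conv + ReLU -> global max pool -> dropout -> sigmoid spammer score."""
    rng = rng or random.Random(0)
    pooled = [max(conv1d(user_vec, k)) for k in kernels]
    pooled = dropout(pooled, rate, rng, training)
    logit = sum(w * p for w, p in zip(weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Six hypothetical feature groups for one user (all values invented).
groups = [
    [0.2, 0.7],            # basic information
    [0.1, 0.0, 0.9, 0.4],  # recent remarks (e.g. bag-of-words weights)
    [0.3],                 # social intimacy
    [0.5, 0.5],            # interaction behavior
    [0.8],                 # number of microblog posts
    [0.0],                 # membership level
]
user_vec = fuse_features(groups, length=3)  # 6 groups x 3 values each
score = predict(user_vec,
                kernels=[[1.0, -1.0], [0.5, 0.5, 0.5]],  # untrained placeholders
                weights=[0.3, -0.2])                      # untrained placeholders
```

Aligning each group to the same length before splicing guarantees that every user yields a vector of identical size, which a convolutional classifier with fixed-size kernels and a fixed output layer requires. Dropout is active only when `training=True`, so scores at inference time are deterministic.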

CLC Number: G356.8

Cite this article

ZHANG Jiyang, ZHANG Peng, GONG Siyu, SONG Naipeng. Online Social Spammer Detection Based on Deep Learning[J]. Journal of Library and Information Science in Agriculture, 2023, 35(12): 49-59.