Journal of Library and Information Science in Agriculture ›› 2024, Vol. 36 ›› Issue (10): 76-85. doi: 10.13998/j.cnki.issn1002-1248.24-0624

• Research Paper •

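Construction of a Multimodal Dataset for Emergency Event Identification and Classification

Yifan ZHANG, Zuqin CHEN, Jike GE, Mingkun HE, Jie TAN

  1. Department of Computer Science and Engineering, Chongqing University of Science and Technology, Chongqing 401331
  • Received: 2024-08-23 Online: 2024-10-05 Published: 2025-03-12
  • About the authors:

    Yifan ZHANG (2001- ), female, master's degree; research interests: natural language processing, multimodal learning

    Zuqin CHEN (1981- ), female, PhD, professor; research interests: intelligent information processing

    Jike GE (1977- ), male, PhD, professor; research interests: information security, natural language processing

    Mingkun HE (2000- ), male, master's degree; research interests: software security, artificial intelligence

    Jie TAN (2001- ), male, master's degree; research interests: natural language processing, multimodal learning

  • Funding:
    General Project of the National Social Science Fund of China, "Research on Mining and Applying the Evolution Patterns of Online Public Opinion Based on Multidimensional Quantitative Analysis under Large Language Models" (24BTQ049)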

Abstract:

[Purpose/Significance] Rich Internet data provide a multi-dimensional perspective for understanding emergencies, and multimodal emergency classification methods have emerged as a result. However, existing multimodal emergency datasets are scarce and cover too few categories, so they give insufficient support to related research and significantly hinder its progress. Compared with previous public datasets, the dataset constructed in this paper has richer categories and more complete modalities. It addresses key gaps in the availability and diversity of multimodal emergency datasets: it expands the range of categories and provides a finer-grained classification within the natural disaster category, which is crucial for developing robust and accurate multimodal classification models. [Method/Process] A multimodal emergency event dataset (MEED) was constructed. It contains data in five categories: accident disasters, public health, social security, natural disasters, and non-emergency events. The natural disaster data are further divided into seven subcategories: geological disasters, biological disasters, drought disasters, marine disasters, meteorological disasters, earthquake disasters, and forest and grassland fires. [Results/Conclusions] Existing emergency classification methods were analyzed and validated on a public emergency dataset and on MEED. The results show that, compared with currently public emergency datasets, MEED helps improve the performance of multimodal models by more than 10%. This improvement highlights the value of MEED for research and applications in emergency management and response. The dataset enables researchers and practitioners to better understand the complexity of emergencies and to develop more effective prevention, mitigation, and response strategies.
The improvement in model performance also indicates that multimodal methods are a promising direction for analyzing emergency events, because they leverage the strengths of different types of data to achieve higher accuracy and reliability in classification tasks. MEED provides researchers in emergency management with a valuable resource and may lead to more sophisticated tools for responding to emergencies. However, the dataset still has limitations. The number of emergencies reported on the Internet keeps growing, so the dataset must be updated continuously to reflect new situations. The size of the dataset largely determines the performance of the classification model, and the class imbalance in the dataset constructed in this paper remains to be solved. In future research, we will continue to update and maintain the dataset in a timely manner to address these issues.

Key words: emergency events, multimodal, dataset, deep learning, data acquisition, data annotation
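
The two-level label scheme described in the abstract (five top-level categories, with natural disasters split into seven subcategories) can be sketched as a small data structure, together with a simple check for the class imbalance the authors mention. This is only an illustrative sketch: the identifiers `TAXONOMY`, `is_valid_label`, and `imbalance_ratio` are hypothetical, and the English label strings are assumptions, since the abstract does not give MEED's actual label names or file format.

```python
from collections import Counter

# Categories and natural-disaster subcategories as listed in the abstract.
# The label strings are assumed for illustration; MEED's real labels may differ.
TAXONOMY = {
    "accident_disaster": [],
    "public_health": [],
    "social_security": [],
    "natural_disaster": [
        "geological", "biological", "drought", "marine",
        "meteorological", "earthquake", "forest_grassland_fire",
    ],
    "non_emergency": [],
}


def is_valid_label(category, subcategory=None):
    """Return True if the category (and optional subcategory) is in the taxonomy."""
    if category not in TAXONOMY:
        return False
    if subcategory is None:
        return True  # a top-level label on its own is accepted
    return subcategory in TAXONOMY[category]


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest (1.0 means balanced)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())
```

A ratio well above 1 would flag the kind of class imbalance noted as a limitation; how the authors actually measure or plan to address it is not stated in the abstract.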

CLC number: TP391

Cite this article

Yifan ZHANG, Zuqin CHEN, Jike GE, Mingkun HE, Jie TAN. Construction of a Multimodal Dataset for Emergency Event Identification and Classification[J]. Journal of Library and Information Science in Agriculture, 2024, 36(10): 76-85.