
Journal of Library and Information Science in Agriculture, 2024, Vol. 36, Issue (10): 76-85. doi: 10.13998/j.cnki.issn1002-1248.24-0624


Construction of a Multimodal Dataset for Emergency Event Identification and Classification

Yifan ZHANG, Zuqin CHEN, Jike GE, Mingkun HE, Jie TAN   

  1. Department of Computer Science and Engineering, Chongqing University of Science and Technology, Chongqing 401331
  • Received: 2024-08-23 Online: 2024-10-05 Published: 2025-03-12

Abstract:

[Purpose/Significance] The abundance of Internet data offers a multi-dimensional view of emergencies and has given rise to multimodal emergency classification methods. However, existing multimodal emergency datasets are scarce and cover few categories, which is insufficient to support related research and slows subsequent work. Compared with previous public datasets, the dataset constructed in this paper covers more categories and provides more complete modalities. It addresses key gaps in the availability and diversity of multimodal emergency datasets: it broadens the range of categories and subdivides the natural disaster category in finer detail, which is crucial for developing robust and accurate multimodal classification models.

[Method/Process] A multimodal emergency event dataset (MEED) was constructed. It contains data in five categories: accident disasters, public health, social security, natural disasters, and non-emergency events. The natural disaster data are further divided into seven subcategories: geological disasters, biological disasters, drought disasters, marine disasters, meteorological disasters, earthquake disasters, and forest and grassland fires.

[Results/Conclusions] Existing emergency classification methods were analyzed and validated on a public emergency dataset and on MEED. The results show that, compared with currently available emergency datasets, MEED improved the performance of multimodal models by more than 10%. This improvement highlights the value of MEED for research and applications in emergency management and response. The dataset helps researchers and practitioners better understand the complexity of emergencies and develop more effective prevention, mitigation, and response strategies. The improvement also suggests that multimodal methods are a promising direction for analyzing emergency events, because they combine the strengths of different data types to achieve higher classification accuracy and reliability. MEED gives researchers a valuable resource and may lead to more sophisticated tools for responding to emergencies. The dataset still has limitations. The number of emergencies reported on the Internet keeps growing, so the dataset must be updated continuously to reflect new situations. Dataset size largely determines the performance of a classification model, and the class imbalance in MEED remains to be solved. In future research, we will continue to update and maintain the dataset to address these issues.

Key words: incidents, multimodal, dataset, deep learning, data acquisition, data annotations

CLC Number: 

  • TP391

Table 1 Existing single-modal and multimodal emergency event datasets

Dataset type    Dataset name    Number of labels
Single-modal    CEC-Corpus      5
Single-modal    CEEC-Corpus     6
Single-modal    DuEE1.0         8
Single-modal    HumAID          5
Single-modal    TREC            5
Multimodal      CrisisMMD       4

Fig.1 MEED construction process

Table 2 Analysis of the number of MEED events

Category                Count
Accident disasters      4,619
Public health           4,271
Social security         1,132
Natural disasters       1,822
Non-emergency events    3,337
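
The class imbalance noted in the abstract can be read directly off the Table 2 counts. The following is a minimal sketch, not the authors' procedure: it computes each category's share and an inverse-frequency weight using total / (n_classes × count), the same formula scikit-learn applies for class_weight="balanced". Choosing this weighting scheme is an assumption made here for illustration; the English category names are translations of the table labels.

```python
# Class counts copied from Table 2 (MEED).
meed_counts = {
    "accident disasters": 4619,
    "public health": 4271,
    "social security": 1132,
    "natural disasters": 1822,
    "non-emergency events": 3337,
}

meed_total = sum(meed_counts.values())
n_classes = len(meed_counts)

# Share of each category in the whole dataset.
shares = {name: n / meed_total for name, n in meed_counts.items()}

# Inverse-frequency ("balanced") weights: rarer categories get larger weights.
# Assumption: this scheme is illustrative and is not taken from the paper.
class_weights = {
    name: meed_total / (n_classes * n) for name, n in meed_counts.items()
}

for name in meed_counts:
    print(f"{name:22s} share={shares[name]:.3f} weight={class_weights[name]:.3f}")
```

The total of these counts is 15,181, which matches the sum of emergency and non-emergency events reported for MEED in Table 4.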

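Table 3 Fine-grained quantitative analysis of natural disasters

Category                      Count
Geological disasters          343
Biological disasters          34
Drought disasters             6
Marine disasters              14
Meteorological disasters      643
Earthquake disasters          668
Forest and grassland fires    114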
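
The subcategory counts in Table 3 are far more skewed than the top-level categories. As a rough illustration only, the sketch below checks that the subcategories add up to the natural disaster count in Table 2 and reports the ratio between the largest and smallest subcategory. The English names are translations of the table labels, and the imbalance ratio is a descriptive statistic chosen here, not a figure reported by the authors.

```python
# Subcategory counts copied from Table 3 (natural disasters in MEED).
disaster_counts = {
    "geological disasters": 343,
    "biological disasters": 34,
    "drought disasters": 6,
    "marine disasters": 14,
    "meteorological disasters": 643,
    "earthquake disasters": 668,
    "forest and grassland fires": 114,
}

disaster_total = sum(disaster_counts.values())

# Largest and smallest subcategories by sample count.
largest = max(disaster_counts, key=disaster_counts.get)
smallest = min(disaster_counts, key=disaster_counts.get)

# Ratio of the most to the least frequent subcategory.
imbalance_ratio = disaster_counts[largest] / disaster_counts[smallest]

print(f"total natural disaster samples: {disaster_total}")
print(f"largest: {largest}, smallest: {smallest}")
print(f"imbalance ratio: {imbalance_ratio:.1f}")
```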

Table 4 Comparison between MEED and existing multimodal emergency event datasets

Dataset      Emergency events    Non-emergency events    Number of categories
CrisisMMD    12,043              0                       4
MEED         11,844              3,337                   5

Table 5 Parameter settings for different emergency classification methods

Classification method    Batch size    Optimizer
VGG-16                   10            Adam
TextCNN                  128           Adam
BERT-base                128           Adam
TextCNN + VGG16          10            Adam
BERT+Vit                 10            Adam
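
The settings in Table 5 could be kept in a small configuration structure such as the sketch below. Only the method names, batch sizes, and optimizer come from the table; the TrainConfig class, the steps_per_epoch helper, and the 1,000-sample figure in the usage lines are hypothetical and serve only to show how batch size affects the number of optimizer steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """One row of Table 5 (hypothetical container, not from the paper)."""
    method: str
    batch_size: int
    optimizer: str


# Values copied from Table 5.
CONFIGS = {
    cfg.method: cfg
    for cfg in (
        TrainConfig("VGG-16", 10, "Adam"),
        TrainConfig("TextCNN", 128, "Adam"),
        TrainConfig("BERT-base", 128, "Adam"),
        TrainConfig("TextCNN + VGG16", 10, "Adam"),
        TrainConfig("BERT+Vit", 10, "Adam"),
    )
}


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (ceiling division)."""
    return -(-n_samples // batch_size)


# Usage with an assumed 1,000 training samples (illustrative figure only).
for cfg in CONFIGS.values():
    print(cfg.method, steps_per_epoch(1000, cfg.batch_size))
```

The image-based and multimodal methods use a batch size of 10, so under the same sample count they take many more optimizer steps per epoch than the text-only methods with a batch size of 128.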

Table 6 Detection performance of various emergency classification methods on different datasets

Model              MEED                    CrisisMMD
                   Accuracy    F1-Score    Accuracy    F1-Score
VGG-16             0.856       0.855       0.833       0.832
TextCNN            0.858       0.860       0.808       0.809
BERT-base          0.966       0.957       0.852       0.891
TextCNN + VGG16    0.973       0.967       0.844       0.842
BERT+Vit           0.979       0.973       0.851       0.853
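
The "more than 10%" improvement stated in the abstract can be checked against the Table 6 figures. The sketch below computes the absolute and relative accuracy gap between MEED and CrisisMMD for each method. Treating these two measures as the intended reading of "10%" is an assumption, since the abstract does not say which one it means. Note also that the two datasets differ in category count (5 versus 4, per Table 4), so this is a cross-dataset comparison rather than a like-for-like benchmark.

```python
# (accuracy, F1-score) pairs copied from Table 6.
results = {
    "VGG-16":          {"MEED": (0.856, 0.855), "CrisisMMD": (0.833, 0.832)},
    "TextCNN":         {"MEED": (0.858, 0.860), "CrisisMMD": (0.808, 0.809)},
    "BERT-base":       {"MEED": (0.966, 0.957), "CrisisMMD": (0.852, 0.891)},
    "TextCNN + VGG16": {"MEED": (0.973, 0.967), "CrisisMMD": (0.844, 0.842)},
    "BERT+Vit":        {"MEED": (0.979, 0.973), "CrisisMMD": (0.851, 0.853)},
}

# The two methods that combine a text model with an image model.
MULTIMODAL = ("TextCNN + VGG16", "BERT+Vit")


def accuracy_gain(model: str) -> tuple[float, float]:
    """Return (absolute, relative) accuracy gap: MEED minus CrisisMMD."""
    meed_acc, _ = results[model]["MEED"]
    crisis_acc, _ = results[model]["CrisisMMD"]
    absolute = meed_acc - crisis_acc
    relative = absolute / crisis_acc
    return absolute, relative


for model in results:
    absolute, relative = accuracy_gain(model)
    tag = "multimodal" if model in MULTIMODAL else "single-modal"
    print(f"{model:16s} {tag:12s} abs={absolute:+.3f} rel={relative:+.1%}")
```

Under either reading, both multimodal methods exceed a 0.10 accuracy gap, while the single-modal VGG-16 and TextCNN gaps are considerably smaller.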

Fig.2 Comparison of indicators of different classification methods on MEED and CrisisMMD

Fig.3 Evaluation of the BERT+Vit multimodal classification method on MEED
