
Journal of library and information science in agriculture

   

Factors Influencing the Communication Effectiveness of Intangible Cultural Heritage Short Videos: A Multimodal Machine Learning Approach

LIU Yihan, CHU Yuxia, ZHAI Yujia   

  1. Tianjin Normal University, Tianjin 300387
  • Received: 2025-10-17  Online: 2025-12-17

Abstract:

[Purpose/Significance] Short video platforms have become the core arena for the digital presentation and dissemination of intangible cultural heritage (ICH). However, the "Matthew Effect" in the digital attention economy often buries high-quality ICH content. Existing research largely suffers from "modal segmentation": it examines single modalities such as text or visuals in isolation and therefore cannot explain how these elements jointly drive user engagement. To address this gap, this study constructs a communication effect evaluation model based on multimodal machine learning. Its innovation lies in integrating computational communication methods with traditional persuasion theories, moving beyond simple content analysis to a quantifiable predictive framework. By identifying key influencing factors through data fusion, the study provides a scientific basis for optimizing digital production strategies for ICH content, which can help raise the visibility of traditional culture and overcome barriers to digital dissemination.

[Method/Process] This study integrates the elaboration likelihood model (ELM) and media ritual theory to establish a "cognitive-behavioral-cultural" dual-path analytical framework. Theoretically, it maps content quality (video/audio/text) to the "Central Route" and source credibility (author attributes) to the "Peripheral Route." Empirically, the study takes ICH videos on Douyin as its subject and collected data from May 2024 to May 2025; after rigorous cleaning, a dataset of 2,869 valid samples was established. A multimodal feature engineering approach is employed: visual and textual features are extracted to represent content quality; audio features (including FBank and MFCC) are extracted with the OpenSMILE toolkit to capture prosodic and spectral characteristics; and author data are collected to quantify social influence. The Random Forest algorithm is used to fuse these heterogeneous data sources, analyze feature importance, and predict communication effectiveness.

[Results/Conclusions] The empirical results show that the multimodal fusion model significantly outperforms single-modality approaches in predicting communication effects, confirming that ICH dissemination results from complex symbol interaction. Feature importance analysis reveals a distinct hierarchy. Author attributes make the highest contribution, indicating that the "Peripheral Route," driven by the creator's social capital, is the decisive factor in communication popularity; its persuasive power far surpasses that of the content itself. Among the content modalities, text and video follow in importance and serve as critical tools for user retention, while the audio modality holds supplementary semantic value by setting the emotional atmosphere. Effective ICH dissemination therefore requires a synergistic strategy: prioritizing the accumulation of the author's social influence as the core driver while simultaneously optimizing visual and textual quality. As a limitation, the study does not account for dynamic temporal changes or external trending events; future research should incorporate time-series analysis to capture dynamic communication trends.

Key words: information diffusion, Sora, short video communication effect, intangible cultural heritage (ICH), multimodal data, feature engineering

CLC Number: G206

Fig.1

Theoretical analysis framework of factors influencing the dissemination effectiveness of intangible cultural heritage short videos

Table 1

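Metadata for intangible cultural heritage short videos

Title | Topics | Author | Author Douyin ID | Posted | Followers | Following
So Yongzi Go stones are made drop by drop | Yongzi Go stones; Nice to meet you; ICH inheritance; ICH craftsmanship | 江寻千(9月) | JYjiuyue | 2024/6/23 14:08:00 | 22,170,764 | 215
Zeng Li makes a stunning appearance in a 7-meter ICH paper-cut dress designed by Li Na, a representative ICH inheritor from Hebei | ICH; traditional culture | 河北日报 (Hebei Daily) | | 2025/1/12 14:35:00 | 4,465,473 | 475
Thousand-year-old puppets take the stage at the village gala | Streamers host a village gala; ICH | 芒果妈妈 | mangmama | 2025/1/12 15:00:00 | 6,774,743 | 925
I raise the banner for ICH! Mother-of-pearl inlay craft really works! | Li Ziqi; FIYTA little gold watch; mother-of-pearl inlay craft; ICH inheritance | 楠瓜娱乐 | NGLINE.2121 | 2025/1/14 14:36:00 | 79,182 | 28
For lion dance, the world looks to China, China looks to Guangdong, and Guangdong looks to Foshan | Lion dance culture; Good fortune in the 2025 Year of the Snake; Spring Festival guardian plan; ICH inheritance | 非遗传承 | 37745782098 | 2024/12/28 23:13:00 | 37 | 6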

Table 2

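Feature content

Route | Modality | Feature indicators | Theoretical basis
Central route | Text features | Relevance to video content; presence of subtitles | The match and clarity of information directly affect users' understanding of the core content [23]
Central route | Video attributes | Video content category; deep semantic features | The core theme of the video and the specific content it presents constitute the argument itself
Central route | Audio features | Presence of human voice | Whether narration or explanation is included; it conveys information directly and is part of the content argument [24]
Peripheral route | Author attributes | Official verification; follower count; likes received | Source credibility, attractiveness, and social proof [7,25]
Peripheral route | Text features | Title length in characters; sentiment score; hashtag popularity | The surface form of the title [2], its emotional tone, and whether it is trending are cues for quick judgment [26]
Peripheral route | Video attributes | Duration; orientation; resolution; color (HSV); scene-change frequency | Production quality, visual impact, and viewing experience are all sensory-level attractiveness cues [27,28]
Peripheral route | Audio features | Audio emotion; MFCC and FBank | The rhythm and emotional tone of background music influence users' emotions rather than their rational judgment [29]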

Table 3

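Video content classification

Label | Meaning
Folk festivals | Records ICH activities related to traditional solar terms, local celebrations, and ethnic-group rituals
Culture and arts | Showcases the performing arts, literary expression, and forms of aesthetic creation within the ICH system, e.g., Kunqu opera and paper-cutting culture
Tutorials | Teaches ICH skills, crafts, and handiwork, e.g., Su embroidery stitch steps, tie-dye tutorials, and mural restoration techniques
Personal stories | Introduces the life experiences and cultural perseverance of representative ICH inheritors and guardians, highlighting their spirit of transmission
Craft process | Shows the production steps of ICH handicrafts from raw material to finished product, including the process and its distinctive features
Knowledge popularization | Popularizes ICH knowledge, explaining the historical origins, cultural connotations, and scientific principles of ICH items
Object display | Displays ICH objects, e.g., Zisha teapot forms, Miao silver ornament styles, and Fujian tulou structures
Cross-over innovation | Innovative applications and cross-domain fusion of ICH elements in modern contexts, e.g., Kunqu makeup transformation videos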

Fig.2

Mel-spectrogram and MFCC feature map
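The mel-spectrogram and MFCC features in Fig. 2 both rest on a mel-scaled frequency axis. The following is a minimal sketch of that mapping, assuming the common HTK-style formula m = 2595 log10(1 + f/700); the source does not state which variant its OpenSMILE configuration uses, and the helper names `hz_to_mel`, `mel_to_hz`, and `mel_band_edges` are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 edge frequencies in Hz, equally spaced on the mel axis.

    Consecutive triples of edges define the triangular filters of a mel
    filter bank (the basis of FBank features).
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]
```

Because the edges are equally spaced in mels, the resulting bands are narrow at low frequencies and wide at high frequencies, which is what gives the mel-spectrogram its perceptually motivated resolution.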

Table 4

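Video metadata encoding

Feature | Encoding | Dimensions
Subtitles present | One-hot (subtitled = 1, 0; not subtitled = 0, 1) | 2
Duration | log(T/60 + ϵ), where ϵ is a very small constant | 1
Scene-change frequency | Normalized to the [0, 1] interval | 1
Resolution | Normalized: w′ = w/4096; h′ = h/4096 | 2
Orientation | One-hot (landscape = 1, 0; portrait = 0, 1) | 1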
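The encodings in Table 4 (a log transform of duration, width and height scaled by 4096, one-hot flags for subtitles and orientation, and a scene-change frequency scaled to [0, 1]) can be sketched as a single function. This is an illustration under stated assumptions, not the authors' code: T is taken to be in seconds, the value of ϵ and the upper bound used to scale scene-change frequency are not given in the source and are placeholders here, and the function name `encode_metadata` is hypothetical.

```python
import math

EPS = 1e-6  # assumed value for the small constant ϵ; the source gives none


def one_hot(flag):
    """Two-slot one-hot code: True -> [1, 0], False -> [0, 1]."""
    return [1, 0] if flag else [0, 1]


def encode_metadata(duration_s, width, height, has_subtitles, is_landscape,
                    cuts_per_min, max_cuts_per_min=60.0):
    """Encode one video's metadata as a flat numeric vector.

    max_cuts_per_min is a hypothetical upper bound used to scale the
    scene-change frequency into [0, 1]; values outside the range are clipped.
    """
    log_duration = math.log(duration_s / 60 + EPS)       # log(T/60 + ϵ)
    cut_rate = min(max(cuts_per_min / max_cuts_per_min, 0.0), 1.0)
    resolution = [width / 4096, height / 4096]           # w' and h'
    return (one_hot(has_subtitles) + [log_duration, cut_rate]
            + resolution + one_hot(is_landscape))
```

For a 60-second portrait clip at 1080×1920 with subtitles and 30 cuts per minute, `encode_metadata(60, 1080, 1920, True, False, 30)` yields an eight-element vector whose first two entries are the subtitle flag and whose last two are the orientation flag.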

Table 5

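Model summary

Model category | Model name | Applicable scenarios
Linear model | Logistic Regression | Binary classification; interpretability requirements
Kernel method | SVM (RBF Kernel) | Small-scale nonlinear classification
Ensemble learning model | XGBoost | Classification/regression on structured data
Neural network | MLP | Classification/regression on unstructured data
Multimodal fusion model | Transformer | Classification/regression on multimodal data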

Fig.3

Importance scores
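Per-feature importance scores such as those in Fig. 3 can be rolled up to the modality level once each modality's position in the fused feature vector is known. The sketch below assumes feature-level (early) fusion by concatenation and a list of per-feature importances from any tree ensemble; the source does not describe its aggregation procedure, and the names `fuse` and `importance_by_modality` are hypothetical.

```python
def fuse(modalities):
    """Concatenate per-modality feature vectors into one row.

    modalities: dict mapping a modality name to its list of feature values.
    Returns the fused row and a dict of (start, end) index spans per modality.
    """
    row, spans, start = [], {}, 0
    for name, vec in modalities.items():
        row.extend(vec)
        spans[name] = (start, start + len(vec))
        start += len(vec)
    return row, spans


def importance_by_modality(importances, spans):
    """Sum per-feature importances within each modality, largest first."""
    totals = {name: sum(importances[a:b]) for name, (a, b) in spans.items()}
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Given spans from `fuse` and an importance list of the same length as the fused row, `importance_by_modality` returns an ordered mapping that makes a ranking such as "author, then text, then video, then audio" directly readable. Summing is only one possible choice; averaging would avoid favoring modalities with many features.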

Table 6

Comparison of multimodal data model results

Model | Precision | Recall | F1 | Accuracy | AUC
Logistic Regression | 0.676113 | 0.692946 | 0.684426 | 0.680498 | 0.729619
SVM | 0.709016 | 0.717842 | 0.713402 | 0.711618 | 0.781090
XGBoost | 0.701550 | 0.751037 | 0.725451 | 0.715768 | 0.798110
MLP | 0.695122 | 0.709544 | 0.702259 | 0.699170 | 0.768513
Transformer | 0.701195 | 0.730290 | 0.715447 | 0.709544 | 0.761635
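The precision, recall, F1, and accuracy columns in Table 6 follow the standard binary-classification definitions. As a minimal sketch, assuming a binary label and counts taken from a confusion matrix (the function names are hypothetical), they can be computed as:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "accuracy": accuracy,
    }
```

The harmonic-mean relation is a quick consistency check on reported results: for the Logistic Regression row, `f1_score(0.676113, 0.692946)` agrees with the tabulated F1 of 0.684426 to within rounding.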

Fig.4

ROC curve for multimodal data model 2
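The area under an ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The following sketch computes AUC directly from that definition; it is an illustration that assumes binary labels coded 1 and 0, and the function name `roc_auc` is hypothetical rather than taken from the source.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: iterable of 1 (positive) or 0 (negative).
    scores: iterable of predicted scores, higher meaning more positive.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise form is quadratic in the number of samples, which is acceptable for a dataset of a few thousand videos; a rank-based implementation would be preferable at larger scale.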

Table 7

Video modality model results

Model | Precision | Recall | F1 | Accuracy
Logistic Regression | 0.680203 | 0.666667 | 0.673367 | 0.677419
SVM | 0.678392 | 0.671642 | 0.675000 | 0.677419
XGBoost | 0.684729 | 0.691542 | 0.688119 | 0.687345
MLP | 0.682292 | 0.651741 | 0.666667 | 0.674938

Fig.5

ROC curve for video modality data model

Table 8

Results of the acoustic modal model

Model | Precision | Recall | F1-score | Accuracy
Logistic Regression | 0.5870 | 0.6027 | 0.5947 | 0.5969
SVM (RBF Kernel) | 0.6060 | 0.7013 | 0.6502 | 0.6296
XGBoost | 0.5887 | 0.6640 | 0.6241 | 0.6073
MLP | 0.5923 | 0.6160 | 0.6039 | 0.6034

Fig.6

ROC curve for the audio modality data model

Table 9

Results of text modality

Model | Precision | Recall | F1-score | Accuracy
Logistic Regression | 0.604113 | 0.626667 | 0.615183 | 0.615183
SVM (RBF Kernel) | 0.630137 | 0.736000 | 0.678967 | 0.658377
XGBoost | 0.635294 | 0.720000 | 0.675000 | 0.659686
MLP | 0.629921 | 0.640000 | 0.634921 | 0.638743

Fig.7

Text modality data model ROC

Table 10

Results of author attribute modality

Model | Precision | Recall | F1-score | Accuracy
Logistic Regression | 1.000 | 0.381 | 0.552 | 0.700
SVM (RBF Kernel) | 1.000 | 0.224 | 0.366 | 0.624
XGBoost | 0.892 | 0.895 | 0.893 | 0.897
MLP | 0.982 | 0.751 | 0.851 | 0.873

Fig.8

ROC curve for author attribute modal data model

[1]
蔡晓芳, 王传琪. 数字媒介重塑北京中轴线文化表达[J]. 人民论坛, 2024(11): 107-109.
CAI X F, WANG C Q. Digital media remodels the cultural expression of Beijing central axis[J]. People's tribune, 2024(11): 107-109.
[2]
杨达森, 李诗轩, 丛颖男. 抖音阅读推广短视频传播效果影响因素研究[J]. 图书馆学研究, 2021(23): 34-44.
YANG D S, LI S X, CONG Y N. Research on influencing factors of transmission effect of reading promotion short videos on TikTok[J]. Research on library science, 2021(23): 34-44.
[3]
张舒涵, 孔朝蓬, 孔婧媛. 新媒体时代短视频信息传播影响力研究[J]. 情报科学, 2021, 39(9): 59-66.
ZHANG S H, KONG Z P, KONG J Y. Analysis on influence of short video information dissemination in new media age[J]. Information science, 2021, 39(9): 59-66.
[4]
LU Y D, PAN J. The pervasive presence of Chinese government content on Douyin trending videos[J]. Computational communication research, 2022, 4(1): 68-97.
[5]
ZHU C Y, XU X L, ZHANG W, et al. How health communication via TikTok makes a difference: A content analysis of TikTok accounts run by Chinese provincial health committees[J]. International journal of environmental research and public health, 2020, 17(1): 192.
[6]
王磊, 胥瑜, 黎静仪. “浸”乡情“切”: 乡村旅游宣传片中方言配音的积极效应[J/OL]. 旅游科学, 2024: 1-16.
WANG L, XU Y, LI J Y. The positive dialect effect in rural tourism promotional videos[J/OL]. Tourism science, 2024: 1-16.
[7]
陈忆金, 潘沛. 健康类短视频信息有用性感知的影响因素研究[J]. 现代情报, 2021, 41(11): 43-56.
CHEN Y J, PAN P. Investigation on factors influencing perception of information usefulness of health short videos[J]. Journal of modern information, 2021, 41(11): 43-56.
[8]
王美权, 王芳. 政务短视频分面分类框架构建研究[J/OL]. 情报科学, 2025: 1-22.
WANG M Q, WANG F. Developing a faceted classification framework for the information organization with government short videos[J/OL]. Information science, 2025: 1-22.
[9]
付少雄, 曾源来, 邓胜利. 多模态特征影响下辟谣短视频互动效果研究: 基于意见氛围中介视角[J]. 情报学报, 2025, 44(4): 466-481.
FU S X, ZENG Y L, DENG S L. Interactive effects of rumor-defying short videos under the influence of multimodal features: Based on the opinion climate mediation perspective[J]. Journal of the China society for scientific and technical information, 2025, 44(4): 466-481.
[10]
BREWER S M, KELLEY J M, JOZEFOWICZ J J. A blueprint for success in the US film industry[J]. Applied economics, 2009, 41(5): 589-606.
[11]
DELLAROCAS C, ZHANG X, AWAD N F. Exploring the value of online product reviews in forecasting sales: The case of motion pictures[J]. Journal of interactive marketing, 2007, 21(4): 23-45.
[12]
倪渊, 李晓娜, 张健, 等. 多源异构数据融合视角下文化UGC传播效果预测: 基于GRA-PSO-WRF的组合建模[J]. 管理评论, 2024, 36(11): 235-247.
NI Y, LI X N, ZHANG J, et al. Prediction of cultural UGC communication effectiveness from the perspective of multi-source heterogeneous data fusion: A combination modeling of GRA-PSO-WRF method[J]. Management review, 2024, 36(11): 235-247.
[13]
赵智慧, 周毅, 李炜弘, 等. 基于深度学习多模态融合的2型糖尿病中医证素辨证模型的构建[J]. 世界科学技术-中医药现代化, 2024, 26(4): 908-918.
ZHAO Z H, ZHOU Y, LI W H, et al. Construction of a Chinese medicine zhengsu differentiation model for type 2 diabetes based on deep learning multimodal fusion[J]. Modernization of traditional Chinese medicine and materia medica-world science and technology, 2024, 26(4): 908-918.
[14]
TANG S S, LI Q, MA X T, et al. Knowledge-based temporal fusion network for interpretable online video popularity prediction[C]//Proceedings of the ACM Web Conference 2022. Virtual Event, Lyon, France: ACM, 2022: 2879-2887.
[15]
付少雄, 宋金铃, 苏一琦, 等. 虚假短视频多模态内容结构操纵对用户分享意愿的影响: 基于混合方法的研究[J]. 情报资料工作, 2025, 46(6): 54-62.
FU S X, SONG J L, SU Y Q, et al. The impact of multimodal content structure manipulation of false short videos on users' sharing intention: A mixed methods study[J]. Information and documentation services, 2025, 46(6): 54-62.
[16]
付少雄, 宋金铃, 邓胜利, 等. 虚假短视频多模态内容语义操纵对用户信任的影响研究[J]. 图书馆杂志, 2025, 44(2): 108-119, 129.
FU S X, SONG J L, DENG S L, et al. Research on the influence of semantic manipulation of multimodal content of fake short videos on user trust[J]. Library journal, 2025, 44(2): 108-119, 129.
[17]
付少雄, 成琦. 内容情感视角下的虚假短视频传播影响因素研究: 基于CAC与ELM双模型[J]. 情报资料工作, 2025, 46(2): 61-69.
FU S X, CHENG Q. Research on the influencing factors of false short video dissemination from the perspective of content emotion: Based on CAC and ELM dual models[J]. Information and documentation services, 2025, 46(2): 61-69.
[18]
朱恒民, 高凯力, 魏宏程, 等. 融合异质信息网络结构特征的短视频主题识别方法[J/OL]. 情报杂志, 2025: 1-8.
ZHU H M, GAO K L, WEI H C, et al. Short video topic detection by integrating structural features of heterogeneous information network[J/OL]. Journal of intelligence, 2025: 1-8.
[19]
MASSARO D W, PETTY R E, CACIOPPO J T. Communication and persuasion: Central and peripheral routes to attitude change[J]. The American journal of psychology, 1988, 101(1): 155.
[20]
COULDRY N. Media rituals: A critical approach[M]. London: Routledge, 2005.
[21]
JING P G, SU Y T, NIE L Q, et al. Low-rank multi-view embedding learning for micro-video popularity prediction[J]. IEEE transactions on knowledge and data engineering, 2018, 30(8): 1519-1532.
[22]
朱恒民, 徐凝, 魏静, 等. 基于网络表示学习的短视频流行度预测研究[J]. 情报学报, 2024, 43(9): 1105-1115.
ZHU H M, XU N, WEI J, et al. Study of short video popularity prediction based on network representation learning[J]. Journal of the China society for scientific and technical information, 2024, 43(9): 1105-1115.
[23]
闫岩. 双重编码理论及其传播学应用[J]. 国际新闻界, 2013, 35(10): 42-52.
YAN Y. The dual coding theory and its applications in communication studies[J]. Chinese journal of journalism & communication, 2013, 35(10): 42-52.
[24]
赵义堃, 尚乐乐. 乡村生活短视频的“声音语言”: 以李子柒视频为例[J]. 淮南师范学院学报, 2024, 26(1): 108-112.
ZHAO Y K, SHANG L L. Analysis of sound language in short videos of rural life: Taking Li Ziqi's video as an example[J]. Journal of Huainan normal university, 2024, 26(1): 108-112.
[25]
闫奕文, 张海涛, 孙思阳, 等. 基于BP神经网络的政务微信公众号信息传播效果评价研究[J]. 图书情报工作, 2017, 61(20): 53-62.
YAN Y W, ZHANG H T, SUN S Y, et al. Research on the effect evaluation of the government WeChat information communication based on the BP neural network[J]. Library and information service, 2017, 61(20): 53-62.
[26]
陈巧芬. 认知负荷理论及其发展[J]. 现代教育技术, 2007, 17(9): 16-19, 15.
CHEN Q F. Cognitive load theory and its development[J]. Modern educational technology, 2007, 17(9): 16-19, 15.
[27]
陈冠. 主色体系的色彩心理学特性[J]. 中央民族大学学报, 2004, 31(5): 92-94.
CHEN G. Tinct psychological character in main color system[J]. Journal of the central university for nationalities, 2004, 31(5): 92-94.
[28]
宋建明. 色彩心理的学理、设计职业与实验[J]. 装饰, 2020(4): 21-26.
SONG J M. Theory, design profession and experiment of color psychology[J]. ZhuangShi, 2020(4): 21-26.
[29]
高晓晶, 喻梦倩, 杨家燕, 等. 图书馆短视频传播及互动效果影响因素模型及实证分析: 基于“上瘾模型”的探索[J]. 图书情报工作, 2021, 65(10): 13-22.
GAO X J, YU M Q, YANG J Y, et al. Model and empirical analysis of influencing factors of library short video dissemination and interaction effect: An exploration based on the "Hook Model"[J]. Library and information service, 2021, 65(10): 13-22.
[30]
DEARING J W, COX J G. Diffusion of innovations theory, principles, and practice[J]. Health affairs, 2018, 37(2): 183-190.
[31]
史维国, 刘佳佳. 关联词单用标题句的类型特征、语用功能及创新机制[J]. 语言文字应用, 2024(4): 58-71.
SHI W G, LIU J J. Type characteristics, pragmatic functions and innovative mechanisms of headlines featuring single conjunction[J]. Applied linguistics, 2024(4): 58-71.