Current Issue
05 June 2026, Volume 38 Issue 6
Data Resources and Data Intelligence: Origins, Value, and Application Domains | Open Access
HUANG Shuiqing, LIU Liu, ZHANG Wei
2026, 38(6):  4-16.  DOI: 10.13998/j.cnki.issn1002-1248.26-0327

[Purpose/Significance] In response to the urgent need to cultivate talent in the era of data elements, this study clarifies the core connotations, disciplinary orientation, and practical value of "Data Resources and Data Intelligence" as an emerging undergraduate major. It provides a theoretical reference and practical pathways for transforming, upgrading, and building undergraduate programs in the discipline of Information Resources Management. [Method/Process] After systematically reviewing the rationale for and implementation of the "Data Resources and Data Intelligence" undergraduate major, this study moves beyond the traditional framework of collection, processing, storage, retrieval, and utilization. Taking the realization or facilitation of data value release as its core objective, it establishes the disciplinary foundation and knowledge framework of "Data Resources and Data Intelligence." The study maps data resources, data intelligence, and data elements to the first, second, and third releases of data value, respectively, explains the core connotations of and intrinsic relationships among the three concepts, and clarifies their logical connections with the DIKW (data-information-knowledge-wisdom) model. It also provides, for the first time, a clear theoretical definition of "data intelligence" and reveals the role of artificial intelligence in the three value releases. Finally, through specific cases in typical application scenarios, it illustrates the external empowerment process and practical value of data resources, data intelligence, and data elements. [Results/Conclusions] The study proposes that data intelligence is a capability system built upon data resources that integrates data and intelligent technologies to achieve systematic, intelligent, generalized decision-making. 
The major's core objectives are to turn data into organized resources, extract knowledge from data, and enable intelligent applications, with the broader aim of developing data into a factor of production. It integrates data and intelligent technologies, covers the entire process of the first, second, and third releases of data value, addresses current gaps in talent cultivation, represents a significant step in transforming the discipline of Information Resources Management for the data era, and responds directly to the national data element strategy.

Connotations and Innovative Path of Data Element Discipline Construction under the Background of New Liberal Arts | Open Access
FAN Zhenjia, ZHANG Yunong, LIU Tingxiao, JI Xiangfei
2026, 38(6):  17-27.  DOI: 10.13998/j.cnki.issn1002-1248.26-0337

[Purpose/Significance] In the context of the development of the New Liberal Arts and the Digital China strategy, the construction of data element-related disciplines and professional programs has become an important response to the growing demand for highly skilled digital talent. As data are increasingly recognized as a key production factor, talent cultivation must go beyond technical training and address the full process through which data are organized, governed, developed, circulated, and transformed into value. This paper focuses on the connotations and innovative pathways of data element-related program construction. It highlights the supporting role of Information Resources Management, a discipline that has long focused on the organization, retrieval, preservation, governance, service, and utilization of information resources. The study contributes to current discussions by linking data element talent cultivation with disciplinary transformation, curriculum reconstruction, practice-based education, and the development of China's independent knowledge system in philosophy and social sciences. [Method/Process] This paper adopts an interpretive analytical approach based on policy texts, disciplinary evolution, and professional construction practices. First, it analyzes the strategic background of Digital China, the digital economy, data element marketization, and New Liberal Arts development, and clarifies the new requirements these contexts place on talent cultivation. Second, it examines the transformation from Library, Information and Archives Management to Information Resources Management (IRM), emphasizing the expansion of disciplinary objects from documents, archives, and information services to data resources, digital knowledge, data governance, intelligent information services, and value realization. 
Third, it reviews the knowledge foundations of IRM, including information organization, knowledge organization, metadata, information retrieval, digital preservation, information behavior, knowledge services, information policy, public data opening, data quality, data ethics, and intelligence analysis. Finally, it draws on the construction of the Data Resources and Data Intelligence program as an illustrative practice to discuss possible pathways for curriculum design, practical teaching, digital-intelligent empowerment, multi-actor collaboration, and international cooperation. [Results/Conclusions] The study found that data element-related disciplinary and professional program construction should be organized around the full lifecycle governance and value realization of data resources. This lifecycle includes the processes of data resource management, assetization, and capitalization, covering data collection, storage, description, organization, quality control, rights confirmation, valuation, product development, circulation, application, and security governance. This perspective helps integrate fragmented knowledge modules and connects data management education with the practical process of data value creation. IRM provides a distinctive disciplinary foundation for this task. Compared with fields that focus mainly on algorithms, systems, or market allocation, IRM emphasizes the relationship between data, users, organizations, institutions, scenarios, and public value. It can therefore support the cultivation of compound, governance-oriented, and scenario-oriented digital talent. Such talent should possess foundational knowledge of data resources, methods of digital and intelligent technologies, an understanding of governance institutions, capabilities for industry and scenario application, and international communication competence. The paper further proposes that data element-related program construction can proceed through several connected pathways. 
Teaching content should cover the full process of data value realization and integrate data organization, data governance, data intelligence, data ethics, data asset management, and scenario-based data application. Practical teaching should be strengthened through real projects, industry scenarios, government departments, enterprises, data circulation institutions, and research organizations. Digital and intelligent technologies should be deeply embedded in professional training, while data security, ethical awareness, and responsible innovation should remain central. International cooperation should also be expanded to enhance students' global competence and awareness of cross-border data governance. Overall, data element-related program construction provides an important opportunity for IRM to reorganize its knowledge system, renew its educational model, and better serve Digital China, the data element market, and global data governance.

Hierarchical Automatic Patent Classification Driven by the Integration of Examination Logic and Knowledge Bases: A Case Study in the Artificial Intelligence Domain | Open Access
XI Chongjun, ZHAO Yajuan, LV Lucheng, SU Ying
2026, 38(6):  28-42.  DOI: 10.13998/j.cnki.issn1002-1248.25-0763

[Purpose/Significance] The rapid evolution of artificial intelligence (AI) has led to a surge in patent applications, placing immense pressure on traditional patent examination and management systems. While automated classification has gained attention, existing methods often suffer from "semantic drift," hierarchical conflicts, and a lack of interpretability, primarily because they treat classification as a flat probabilistic task rather than a structured logical inference. This study aims to develop a hierarchical automatic patent classification framework that is not only efficient but also hierarchy-consistent and deeply aligned with the professional logic of patent examiners. By shifting the paradigm from black-box probabilistic guessing to knowledge-driven steady-state inference, this study provides a scalable and reliable pathway for intelligent patent classification in high-density technical domains. [Method/Process] The proposed framework was built upon a three-stage mechanism: technical content extraction, technical theme condensation, and hierarchical mapping, utilizing DeepSeek-V3 as the core semantic engine. First, the study constructed an IPC classification standard library and a patent classification knowledge base. A key innovation here is the "Hierarchical Fusion Strategy," which explicitly encodes the examination logic by embedding parent-level technical definitions into child-level descriptions to provide a complete semantic boundary. This ensures that the model perceives the nested structure of the IPC system rather than treating categories as independent labels. Second, the framework performs a semantic-anchored extraction of technical information. Unlike traditional methods that rely on raw text, this process utilizes the IPC standard library as a reference to filter and condense patent claims and descriptions into structured "technical themes". 
This intermediate representation mitigates the risk of semantic hallucination and handles data sparsity by compressing the semantic space into a more consistent and discriminative form. Third, a "bottom-up" hierarchical mapping strategy was implemented. The system first matches at the most granular level (the IPC subgroup) and then derives the higher-level categories along the established hierarchical chain. To ensure robustness, a dual-path verification mechanism was introduced that compares independent matching with hierarchical mapping in parallel. When the two paths conflicted, the system applied a confidence-priority rule to correct errors locally, ensuring that the final output was both accurate at fine granularity and hierarchically consistent. [Results/Conclusions] Experiments on a dataset of Chinese AI invention patents from 2021 to 2025 demonstrate the superior "architecture stability" of the framework. The optimal fusion strategy achieved accuracy rates of 100%, 97.32%, 91.84%, 86.48%, and 71.25% at the IPC section, class, subclass, main group, and subgroup levels, respectively, significantly outperforming both the PatentBERT baseline and direct large language model (LLM) classification. Ablation studies confirmed that IPC knowledge guidance, technical theme condensation, and the bottom-up mapping strategy all contribute critically to the performance gains. The results show that encoding examination logic into the model can effectively constrain the inherent randomness of LLMs within a structured logical track. The framework essentially functions as a "classification skill" for AI agents and can be integrated into intelligent examination systems via an API for continuous, automated category updates. 
Despite limitations in domain coverage, the model-agnostic nature of the architecture suggests high potential for migration to other complex technical fields, providing a foundational methodology for the unified representation and analysis of multi-source innovation data.
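The bottom-up mapping, hierarchical fusion, and dual-path check described in this abstract can be illustrated with a minimal Python sketch. It assumes full IPC subgroup codes in the usual "G06N 3/08" form. The function names, the confidence inputs, and the tie-break rule (prefer the bottom-up path on equal confidence, fall back to the other path when a candidate breaks the parent-child chain, and leave a level unresolved if neither fits) are hypothetical stand-ins rather than the authors' implementation; the DeepSeek-V3 matching step itself is not shown.

```python
import re

LEVELS = ("section", "class", "subclass", "main_group", "subgroup")
# Full IPC subgroup code, e.g. "G06N 3/08" (section, class, subclass, group/subgroup).
IPC_SUBGROUP = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def derive_hierarchy(subgroup: str) -> dict:
    """Bottom-up step: derive every higher IPC level from one subgroup code."""
    m = IPC_SUBGROUP.match(subgroup.strip())
    if m is None:
        raise ValueError(f"not a full IPC subgroup code: {subgroup!r}")
    sec, cls, sub, group, tail = m.groups()
    subclass = f"{sec}{cls}{sub}"
    return {
        "section": sec,
        "class": sec + cls,
        "subclass": subclass,
        "main_group": f"{subclass} {group}/00",
        "subgroup": f"{subclass} {group}/{tail}",
    }


def fuse_definitions(subgroup: str, definitions: dict) -> str:
    """Hierarchical fusion (sketch): prefix a code's text with its ancestors' definitions."""
    chain = derive_hierarchy(subgroup)
    return " > ".join(definitions[chain[lv]] for lv in LEVELS if chain[lv] in definitions)


def is_child(parent: str, child: str) -> bool:
    """True if `child` lies under `parent` in the IPC tree."""
    if parent.endswith("/00"):  # main group -> subgroup share the "G06N 3" stem
        return child.split("/")[0] == parent.split("/")[0]
    return child.startswith(parent)


def reconcile(independent: dict, subgroup_pred: tuple) -> dict:
    """Dual-path check with a confidence-priority rule (hypothetical).

    independent: {level: (code, confidence)} from matching each level on its own.
    subgroup_pred: (code, confidence) from the bottom-up path.
    Returns {level: code}; None marks levels left for manual review.
    """
    sub_code, sub_conf = subgroup_pred
    chain = derive_hierarchy(sub_code)
    result = dict.fromkeys(LEVELS)  # every level starts unresolved (None)
    parent = None
    for lv in LEVELS:
        ind_code, ind_conf = independent[lv]
        # On conflict, try the more confident path first (bottom-up wins ties).
        if sub_conf >= ind_conf:
            candidates = (chain[lv], ind_code)
        else:
            candidates = (ind_code, chain[lv])
        # Local correction: accept the first candidate that stays under the chosen parent.
        choice = next((c for c in candidates if parent is None or is_child(parent, c)), None)
        if choice is None:
            break  # no consistent candidate; this and lower levels stay None
        result[lv] = choice
        parent = choice
    return result
```

Leaving an inconsistent level unresolved, rather than forcing a code, keeps every returned chain hierarchy-consistent, which is the property the abstract emphasizes; the IPC titles used in any definitions table would come from the IPC standard library.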

Transformation Logic and Practical Pathways for University Libraries Integrating into the Trusted Data Space Construction | Open Access
CHENG Cheng, ZHOU Jie, WANG Han
2026, 38(6):  43-58.  DOI: 10.13998/j.cnki.issn1002-1248.26-0080

[Purpose/Significance] The release of the Trusted Data Space Development Action Plan (2024-2028) marks a strategic shift in China's data governance, positioning the trusted data space as key infrastructure for the secure circulation and value realization of data elements. For university libraries, this national strategic deployment is not only an important opportunity to expand their service boundaries but also an inevitable requirement for their transformation from traditional document-based institutions into data-driven knowledge service institutions. At present, research on trusted data spaces focuses mostly on public libraries or the library sector in general, while research on university libraries, with their distinctive resource endowments, service traditions, and institutional constraints, remains scarce. This study addresses this gap by systematically exploring how university libraries can integrate into trusted data spaces, seeking to provide theoretical guidance and practical solutions for their transformation and development in the data age. [Method/Process] Guided by the action plan, this study adopts a multi-stage research approach. First, it systematically analyzes the trusted data space along four dimensions: core features, technical architecture, business activities, and governance mechanisms. Second, drawing on public information from library websites, institutional repository policy documents, and published literature, it selects five representative university libraries in China and two international cases for comparative analysis, and constructs a four-dimensional analytical framework covering data resource construction, technology platform deployment, data governance mechanisms, and knowledge service innovation to evaluate their existing data capabilities. 
Third, based on the identified shortcomings, it constructs a transformation logic framework along four dimensions: function, service, governance, and value. Finally, it designs a three-stage roadmap (start-up, construction, and expansion) and distills seven practical paths: strategic guidance, data center governance, technology base construction, talent cultivation, service innovation, ecological coordination, and value balance. [Results/Conclusions] The study found that university libraries have natural advantages for integrating into the trusted data space, including large-scale data resource aggregation, the ability to upgrade their own technology platforms, scenario-based expansion of knowledge services, and institutional credibility. However, their systemic capabilities still show four shortcomings. First, data ownership is unclear because multiple rights holders are involved, such as authors, sponsors, libraries, and database providers. Second, standards interoperability is insufficient: traditional metadata standards such as MARC and DC are difficult to adapt to data space standards such as DCAT and Schema.org. Third, security mechanisms are weak, lacking data classification, dynamic access control, and whole-process auditing. Fourth, the service model is limited: services remain largely confined to the campus and lack cross-domain expansion. To address these problems, the study proposes that university libraries undergo a fundamental transformation in four areas: 1) a shift in function, from collecting resources to serving as a data hub; 2) a shift in services, from providing information to co-creating knowledge; 3) a shift in governance, from self-management to ecological synergy; and 4) a shift in value orientation toward a dynamic balance between openness and security. 
The seven practical paths and the three-stage roadmap can provide operational implementation guidance for university libraries to break through the traditional service boundary, activate the potential of data elements, and deeply participate in the national data infrastructure and innovative ecological construction. Future research can empirically test and optimize the proposed framework through action research or pilot projects, and further explore the quantitative evaluation mechanism and value return model of data contribution.
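The standards gap named above (DC records versus DCAT-based data spaces) can be made concrete with a small crosswalk sketch in Python. The mapping table is a hypothetical, simplified illustration, not an official or complete crosswalk: mapping `date` to `dct:issued` and `subject` to `dcat:keyword` are assumptions, and DCAT expects agents rather than plain strings for `dct:creator`. Elements with no Dataset-level home (for example `format`, which DCAT places on a `dcat:Distribution`) are returned for manual mapping.

```python
import json

# Hypothetical, simplified crosswalk from unqualified Dublin Core elements to
# DCAT / DCMI Terms properties. "date" -> dct:issued and "subject" -> dcat:keyword
# are assumptions; a real application profile would need expert review.
DC_TO_DCAT = {
    "title": "dct:title",
    "description": "dct:description",
    "creator": "dct:creator",
    "publisher": "dct:publisher",
    "date": "dct:issued",
    "subject": "dcat:keyword",
    "language": "dct:language",
    "identifier": "dct:identifier",
    "rights": "dct:rights",
}

CONTEXT = {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"}


def dc_to_dcat(record):
    """Convert {dc_element: [values]} into a JSON-LD dcat:Dataset plus unmapped elements."""
    dataset = {"@context": CONTEXT, "@type": "dcat:Dataset"}
    unmapped = []
    for element, values in record.items():
        prop = DC_TO_DCAT.get(element.lower())
        if prop is None:
            unmapped.append(element)  # needs manual mapping, e.g. onto a dcat:Distribution
            continue
        if values:
            # Keep repeated values as a list; unwrap single values for readability.
            dataset[prop] = values if len(values) > 1 else values[0]
    return dataset, unmapped


if __name__ == "__main__":
    ds, todo = dc_to_dcat({"title": ["Campus survey"], "subject": ["education", "survey"], "format": ["CSV"]})
    print(json.dumps(ds, indent=2, ensure_ascii=False))
    print("unmapped:", todo)
```

Returning the unmapped elements explicitly, instead of silently dropping them, surfaces exactly the fields where legacy records and data space standards fail to line up.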

Data Quality Assessment and Improvement Strategies: A Diagnostic Analysis Based on the Public Basic Databases (Population and Legal Entity Databases) of a City | Open Access
PAN Yong, SUN Jing, WANG Jiandong
2026, 38(6):  59-69.  DOI: 10.13998/j.cnki.issn1002-1248.25-0664

[Purpose/Significance] As data become a strategic resource in the digital economy, their quality directly affects the efficiency of value creation and the effectiveness of public governance. However, as data volumes continue to grow and application scenarios deepen, pervasive quality issues, such as inconsistencies, errors, and redundancies, have emerged as a significant bottleneck restricting the release of data element potential. High-quality public data are particularly critical for empowering government decision-making and optimizing public services. To meet the urgent practical need for a high-quality data supply, this paper draws on the public basic databases (specifically the Population Database and the Legal Entity Database) of a representative city to construct a scientific, systematic, and operable data quality assessment system. The study aims to diagnose existing quality defects in these foundational assets and to provide theoretical support and actionable references that help relevant departments move from passive data management to active quality governance. [Method/Process] To ensure that the assessment is both scientifically rigorous and practically applicable, this study establishes a comprehensive evaluation framework based on domestic and international research, the national standard GB/T 36344-2018, and local data characteristics. The framework has a hierarchical structure of 6 primary indicators (Normativity, Integrity, Consistency, Accuracy, Timeliness, and Accessibility), 17 secondary indicators, and 61 specific detection items. The study employs a dual-track assessment methodology that combines automated detection tools with manual verification: automated SQL scripts and rule engines perform large-scale quantitative detection on intrinsic dimensions, while manual checks and interviews address contextual dimensions. 
This methodology was applied to a multi-dimensional evaluation of 1,367 data items across 102 datasets in the city, ensuring a thorough analysis of the data status. [Results/Conclusions] The evaluation results indicate that although the city's public basic databases are generally well built, multidimensional quality issues persist. Specifically, the assessment revealed data coding errors, non-standardized classification, missing data items, missing or duplicate primary keys, inconsistent formats, illegal characters or outliers, and delayed or discontinued data. To address these challenges, the paper proposes four systematic improvement strategies: 1) unify data standards and coding systems to ensure consistency across departments; 2) build a full-process quality control mechanism covering data collection, storage, and use; 3) strengthen technical platform support through real-time monitoring and intelligent early-warning capabilities; and 4) improve organizational synergy and institutional guarantees to solidify the management foundation. These measures are intended to optimize the quality of the data supply and support the high-quality, sustainable development of the data element market.
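The automated track of the dual-track method can be sketched as a small rule engine in Python that counts the kinds of defects the abstract lists: missing or duplicate primary keys, missing data items, inconsistent formats, and illegal characters. The field names, the date pattern, and the rule set are hypothetical illustrations; the study's 61 detection items, and the SQL scripts it actually used, are not reproduced here.

```python
import re
from collections import Counter

# Control characters and the U+FFFD replacement character, a common sign of encoding damage.
ILLEGAL_CHARS = re.compile(r"[\x00-\x1f\ufffd]")


def is_empty(value):
    return value is None or (isinstance(value, str) and value.strip() == "")


def check_records(records, key, required=(), formats=None):
    """Count rule violations in a list of dict records (hypothetical rule set)."""
    formats = formats or {}
    issues = Counter()
    keys = [r.get(key) for r in records]
    issues["missing_key"] = sum(is_empty(k) for k in keys)
    seen = Counter(k for k in keys if not is_empty(k))
    issues["duplicate_key"] = sum(n - 1 for n in seen.values() if n > 1)
    for r in records:
        for field in required:
            if is_empty(r.get(field)):
                issues[f"missing:{field}"] += 1
        for field, pattern in formats.items():
            value = r.get(field)
            if not is_empty(value) and re.fullmatch(pattern, str(value)) is None:
                issues[f"bad_format:{field}"] += 1
        if any(isinstance(v, str) and ILLEGAL_CHARS.search(v) for v in r.values()):
            issues["illegal_chars"] += 1  # counted once per record
    return issues


def completeness(records, fields):
    """Share of non-empty cells over the given fields (an Integrity-style ratio)."""
    cells = [r.get(f) for r in records for f in fields]
    return sum(not is_empty(c) for c in cells) / len(cells) if cells else 1.0
```

Counting violations per rule, rather than returning pass/fail, lets the results roll up into secondary and primary indicator scores; contextual dimensions such as Accessibility would still need the manual track.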

Digital Resilience of Smart Libraries Driven by Data Elements - Digital Technology | Open Access
WU Yuhao, ZHOU Zhigang, LIU Wei
2026, 38(6):  70-85.  DOI: 10.13998/j.cnki.issn1002-1248.25-0727

[Purpose/Significance] Smart libraries serve as core hubs for public cultural services and the inclusive dissemination of knowledge, and their digital transformation continues to accelerate. However, they also face multiple digital risks, such as data fragmentation, insufficient technological adaptation, and pronounced system vulnerabilities, which seriously constrain the stability and sustainability of public cultural services. Building digital resilience has therefore become key to enabling smart libraries to respond to environmental change and sustain their core functions. Focusing on the sustainable development needs of smart libraries in the digital age, this paper adopts a "data elements-digital technology" dual-wheel drive perspective to explore the generation logic of digital resilience and the paths for improving it. This approach not only adds a new dimension to the theory of digital risk governance in smart libraries but also offers practical solutions to real problems such as data fragmentation and insufficient technical adaptation. Furthermore, it can enhance the stability and efficiency of public cultural services. [Method/Process] Drawing on theories of data governance, technological innovation, and organizational resilience, this research follows a progressive sequence of literature review, logical deconstruction, framework construction, and path optimization, combining literature research, system deconstruction, and logical deduction methods. 
We systematically analyze how data elements and digital technologies penetrate and affect the resource, service, technology, and organizational dimensions of smart libraries; clarify the correlation logic and operating mechanism linking the dual-wheel drive to digital resilience; construct practical approaches from two aspects, the release of data element value and the collaboration of digital technology clusters; and propose a multi-dimensional guarantee system. [Results/Conclusions] The essence of digital resilience in smart libraries lies in their capacity for dynamic adaptation, efficient response, and continuous evolution in the face of digital risks. Its formation relies on deep collaboration between data elements and digital technologies. Data elements, by building a multimodal collaborative data ecosystem, break down information silos and lay a solid resource foundation for digital resilience. Digital technology, through the collaboration of technology clusters such as big data, artificial intelligence, and blockchain, forms a full-cycle risk response system covering risk perception, emergency response, and system recovery. The coupled interaction between the two drives a qualitative leap in digital resilience from passive risk resistance to active value creation, ultimately integrating the data elements-digital technology dual-wheel drive with resilience building. On this basis, the paper offers practical suggestions: smart libraries should strengthen the standardization of data governance, promote scenario-based application of technology clusters, and improve cross-departmental collaboration mechanisms.

Optimization of Subversive Technology Identification Model Based on Data Balancing and Integrated Learning | Open Access
CHEN Yuanyuan, HU Shaohuang, CHEN Xiaohong
2026, 38(6):  86-97.  DOI: 10.13998/j.cnki.issn1002-1248.25-0536

[Purpose/Significance] Disruptive technology identification has become an increasingly important research topic in the context of rapid technological evolution and strategic decision-making for governments and enterprises. However, existing data-driven identification approaches often suffer from two critical limitations. First, disruptive technology datasets are typically characterized by severe class imbalance, where truly disruptive cases constitute only a small fraction of the total samples, leading to biased learning and poor generalization. Second, most existing studies rely on a single machine learning model, which limits the ability to capture complex and heterogeneous patterns embedded in high-dimensional technical text features. These issues restrict the robustness, accuracy, and practical applicability of current identification frameworks. To address these challenges, this study aims to construct an optimized disruptive technology identification model that jointly considers data imbalance mitigation and model performance enhancement, thereby improving the reliability and stability of predictive results and contributing to methodological advancements in technology intelligence and innovation management research. [Method/Process] Based on the reproduction of a widely used baseline model built upon XGBoost, this study proposed a two-stage optimization framework integrating data resampling and ensemble learning. In the data preprocessing stage, a hybrid SMOTE-ENN sampling strategy was employed to reconstruct the training dataset. The SMOTE component synthetically generated minority class samples to enhance class representation, while the ENN component removed ambiguous and noisy samples from overlapping regions, thus achieving a balance between noise reduction and information preservation. This strategy effectively alleviated the adverse impact of class imbalance on model learning without excessively distorting the original data distribution. 
In the modeling stage, a stacking-based ensemble learning framework was constructed by integrating multiple heterogeneous base learners, including XGBoost, LightGBM, Extra Trees, and Support Vector Machines. These base models were selected to capture complementary decision boundaries and feature interactions from different learning perspectives. A Random Forest model was then employed as a meta-learner to aggregate the outputs of the base learners and perform higher-level feature integration. Through this hierarchical learning mechanism, the proposed framework enhanced both representation capability and predictive robustness, enabling more accurate identification of disruptive technologies under complex and noisy data conditions. [Results/Conclusions] Extensive experimental evaluations demonstrate that the optimized model significantly outperforms the baseline XGBoost model across multiple core performance metrics, including Accuracy, Precision, Recall, and F1-Score. Notably, the F1-Score improved substantially from 0.63 to 0.98, indicating a marked enhancement in the model's ability to correctly identify minority disruptive technology samples while maintaining high overall stability. The results confirm that combining hybrid resampling with ensemble learning effectively addresses sample imbalance and model bias in disruptive technology identification tasks. In conclusion, the proposed framework provides a robust and scalable solution for identifying disruptive technologies in high-dimensional, imbalanced data scenarios. Beyond improving prediction accuracy, this study offers methodological insights for technical text modeling and innovation analytics, and its approach can be readily adapted to other fields with similar data imbalance and complexity issues. 
Future research may further explore adaptive sampling strategies and deep learning-based ensemble architectures to enhance temporal and semantic representation capabilities.
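The SMOTE-ENN preprocessing stage described above can be sketched in dependency-free Python for a binary problem. This is a minimal illustration, not the authors' code: SMOTE interpolates between a minority point and one of its k nearest minority neighbours, and the cleaning step uses Wilson's classic ENN (drop a point if the majority label among its k nearest neighbours disagrees). In practice, imbalanced-learn's `SMOTEENN` and scikit-learn's `StackingClassifier` cover the resampling and stacking stages; note that imbalanced-learn's `EditedNearestNeighbours` also offers a stricter "all neighbours must agree" mode. The stacking stage is not shown here.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, rng=None):
    """Classic SMOTE: interpolate between a minority point and a random one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nn, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def enn(X, y, k=3):
    """Wilson's Edited Nearest Neighbours: drop points whose k-NN majority label disagrees."""
    keep = []
    for i, xi in enumerate(X):
        nearest = sorted((j for j in range(len(X)) if j != i), key=lambda j: math.dist(xi, X[j]))[:k]
        majority_label = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if majority_label == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority_label, k_smote=5, k_enn=3, seed=0):
    """Oversample the minority class up to the majority size, then clean with ENN (binary case)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(0, (len(y) - len(minority)) - len(minority))
    synthetic = smote(minority, n_new, k_smote, random.Random(seed))
    X_all = list(X) + synthetic
    y_all = list(y) + [minority_label] * len(synthetic)
    return enn(X_all, y_all, k_enn)
```

Running ENN after SMOTE, on the combined set, is what lets the cleaning step remove both noisy original points and synthetic points that land in overlapping regions, matching the noise-reduction role the abstract assigns to ENN.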

Large AI Model Utilization Optimization in Libraries Based on Multimodal Resource Profiling | Open Access
QIN Miao, WANG Qingfei
2026, 38(6):  98-114.  DOI: 10.13998/j.cnki.issn1002-1248.25-0259

[Purpose/Significance] With the rapid advancement of artificial intelligence (AI) technologies, libraries are transforming their service models and content offerings. Large AI models have opened up broader development opportunities for smart libraries. However, adopting and applying these models rationally poses a significant challenge for libraries. This study uses multimodal resource profiling to investigate how libraries can optimize their utilization of large AI models, revealing the intrinsic relationships among various types of library resource data. From these insights, it derives optimization methods and related strategies to improve the efficiency of library resource utilization and the user experience. [Method/Process] Multimodal resource profiling is a comprehensive representation that captures the intrinsic characteristics of library resources through tag extraction, aggregation analysis, and visualization of the diverse data generated within libraries. Using a novel clustering algorithm, it overcomes the high sensitivity to input parameters typical of traditional algorithms and achieves natural clustering of resources with varying densities, enabling accurate multimodal resource profiles to be generated. The resource profiling model provides a theoretical foundation for optimizing the deployment and utilization of large AI models in libraries, while also supplying rich data for subsequent AI model applications. The proposed adoption strategy has two aspects: model selection and model utilization. Model selection focuses on compatibility and accuracy to achieve an optimal match between the model, library resources, and user needs. Model utilization emphasizes the effectiveness and usability of the output, thereby enhancing operational efficiency and user experience. 
Based on this framework, the overall operational mechanism of the adoption optimization strategy is designed around continuous model monitoring, real-time collection of user feedback, iterative model updates, and dynamic adjustment of multimodal resource profiles. [Results/Conclusions] This study takes a public digital library on "Telegram" as a case study to generate multimodal resource profiles, which meticulously categorize user groups, interests, and emotional intensities. By integrating the large AI model adoption optimization strategy with the outcomes of multimodal resource profiling, the model autonomously identifies the most task-relevant features, reducing the need for manual intervention. Not only does it achieve high prediction accuracy, but the explanatory feature weights it outputs also provide a quantifiable basis for service optimization. Through comparative experiments with commonly used structural modules, the proposed method demonstrates significant advantages over traditional recommendation systems in terms of both resource utilization efficiency and user engagement. This study lays the foundation for the future development of library technology and opens up new possibilities for the application of multimodal resource profiling.
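The tag-aggregation step of profiling, and the "dynamic adjustment" of profiles from user feedback, can be sketched in Python as weighted tag counts updated by an exponential moving average. This is an assumption-laden illustration: the abstract does not name its clustering algorithm, so clustering is not shown, and the modality prefixes (such as `text:` and `video:`), the helper names, and the smoothing factor `alpha=0.2` are all hypothetical.

```python
from collections import Counter


def build_profile(tag_lists):
    """Aggregate tags from many records into weights that sum to 1."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def update_profile(profile, feedback_tags, alpha=0.2):
    """Dynamic adjustment: move the profile towards new feedback by an exponential moving average."""
    new = build_profile([feedback_tags])
    if not new:
        return dict(profile)  # no feedback, no change
    tags = set(profile) | set(new)
    return {t: (1 - alpha) * profile.get(t, 0.0) + alpha * new.get(t, 0.0) for t in tags}


def top_tags(profile, n=3):
    """Highest-weighted tags, ties broken alphabetically."""
    return [t for t, _ in sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


if __name__ == "__main__":
    # Hypothetical multimodal tags: the prefix records the modality the tag came from.
    profile = build_profile([["text:history", "image:map"], ["text:history", "video:lecture"]])
    profile = update_profile(profile, ["video:lecture"])
    print(top_tags(profile, 2))
```

Because both the old profile and the feedback profile sum to 1, the moving average keeps the weights normalized, so profiles stay comparable as they are adjusted over time.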

Digital Literacy, Perception of New Quality Productive Forces, and Green Production Willingness: Evidence from Large-Scale Farmers in Northwest China | Open Access
WU Yanyan, ZHANG Jinling
2026, 38(6):  115-130.  DOI: 10.13998/j.cnki.issn1002-1248.25-0564

[Purpose/Significance] The rapid advancement of digital technologies has created substantial opportunities for promoting green agricultural transformation in China. Digital literacy plays a pivotal role in enabling large-scale farmers to understand and apply modern agricultural technologies, thereby shaping their willingness to adopt environmentally friendly production practices. Because farmers are the core actors in agricultural production, their digital competence and their perception of new quality productive forces (PNQPF) directly influence how they respond to digital innovations and participate in green production. However, existing research has not yet established a systematic measurement framework for PNQPF, nor has it clarified the multi-stage cognitive mechanisms through which digital literacy affects farmers' green production willingness. [Method/Process] This study adopted a mixed-methods approach combining grounded theory and empirical investigation. First, in-depth interviews were conducted with diverse groups of large-scale farmers in northern and southern Xinjiang Uygur Autonomous Region, generating a rich textual corpus for qualitative analysis. Open, selective, and theoretical coding identified two fundamental cognitive dimensions, perception of labor tools and perception of labor objects, which formed the basis of an eight-item PNQPF scale. In the quantitative stage, a structured survey was administered to 352 large-scale farmers across three provinces (Shaanxi, Gansu, and Qinghai) and two autonomous regions (Xinjiang Uygur and Ningxia Hui) in Northwest China. The dataset covers demographic characteristics, operational features, digital literacy indicators, PNQPF perceptions, and assessments of green production willingness.
Exploratory and confirmatory factor analyses were used to validate the construct reliability and structural robustness of the PNQPF scale, while regression-based mediation and moderation models were used to systematically examine the pathways through which digital literacy influences green production willingness. This integrated analytical framework provides a comprehensive, evidence-based foundation for understanding farmers' decision-making in digital agricultural environments. [Results/Conclusions] The findings indicate that digital literacy significantly increases farmers' willingness to adopt green production practices. Both cognitive dimensions of PNQPF (perception of labor tools and perception of labor objects) act as key psychological mechanisms: each exerts an independent mediating effect, and together they form a chain mediation pathway. This suggests that digital tools not only improve farmers' operational efficiency but also deepen their understanding of production objects, thereby reinforcing environmentally responsible behavior. Digital infrastructure further strengthens the effect of digital literacy on PNQPF, highlighting the importance of a supportive digital environment in amplifying farmers' behavioral change. Based on these insights, this study suggests that enhancing farmers' digital competence, improving regional digital infrastructure, and promoting targeted digital extension services are essential for advancing green agricultural development. Nevertheless, the sample is concentrated in Northwest China, where regional disparities in digital development and agricultural structure may limit the generalizability of the findings. Future research should expand the sample scope, incorporate longitudinal data, and explore the evolving role of digital technologies in shaping farmers' production decisions.
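The chain mediation result can be illustrated with ordinary least squares. The sketch below is a hypothetical illustration, not the authors' model or data. It assumes that perception of labor tools (M1) precedes perception of labor objects (M2), as the abstract's interpretation suggests. It uses simulated data with made-up path coefficients and omits the control variables, the infrastructure moderation term, and the bootstrap confidence intervals normally used to test indirect effects. The helpers `ols` and `serial_mediation` are hypothetical names. Three regressions are fitted (M1 on X; M2 on X and M1; Y on X, M1, and M2). Multiplying their path coefficients gives the specific indirect effects, with the chain effect a1 * d21 * b2. Under OLS, the total effect of X on Y equals the direct effect plus the three indirect effects exactly.

```python
import random


def ols(columns, y):
    """OLS coefficients [intercept, b_1, ..., b_p] for y regressed on the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    # Augmented normal equations [X'X | X'y]
    aug = [[sum(row[r] * row[c] for row in design) for c in range(k)]
           + [sum(row[r] * yi for row, yi in zip(design, y))]
           for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot] = aug[pivot], aug[p]
        for r in range(k):
            if r != p:
                factor = aug[r][p] / aug[p][p]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[p])]
    return [aug[r][k] / aug[r][r] for r in range(k)]


def serial_mediation(x, m1, m2, y):
    """Effect decomposition for the chain X -> M1 -> M2 -> Y (all paths linear)."""
    a1 = ols([x], m1)[1]                     # X -> M1
    _, a2, d21 = ols([x, m1], m2)            # X -> M2 and M1 -> M2
    _, direct, b1, b2 = ols([x, m1, m2], y)  # X -> Y, M1 -> Y, M2 -> Y
    return {
        "total": ols([x], y)[1],
        "direct": direct,
        "via_m1": a1 * b1,
        "via_m2": a2 * b2,
        "chain": a1 * d21 * b2,
    }


# Simulated data with hypothetical true paths: a1=0.5, a2=0.3, d21=0.4,
# direct=0.2, b1=0.3, b2=0.6, so the true chain effect is 0.5*0.4*0.6 = 0.12.
random.seed(1)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]                              # digital literacy
m1 = [0.5 * xi + random.gauss(0, 1) for xi in x]                        # labor tools
m2 = [0.3 * xi + 0.4 * t + random.gauss(0, 1) for xi, t in zip(x, m1)]  # labor objects
y = [0.2 * xi + 0.3 * t + 0.6 * o + random.gauss(0, 1)
     for xi, t, o in zip(x, m1, m2)]                                    # green willingness
effects = serial_mediation(x, m1, m2, y)
```

The estimated chain effect should land close to the simulated 0.12. A real analysis would report a percentile bootstrap interval for each indirect effect rather than the point estimate alone.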