Current Issue
05 September 2024, Volume 36 Issue 9
Construction of a Scientific Literature AI Data System for the Thematic Scenario: Technical Framework Research and Practice | Open Access
Zhijun CHANG, Li QIAN, Yaoting WU, Yunpeng QU, Yue GONG, Zhixiong ZHANG
2024, 36(9):  4-17.  DOI: 10.13998/j.cnki.issn1002-1248.24-0755

[Purpose/Significance] Artificial intelligence is empowering scientific research and has become a major driver of scientific discovery. High-quality data resources for thematic scenarios are the key to training high-performance AI models. Given the complexity of scientific and technological (S&T) literature data and the limitations of its direct use for large-scale model training, there is an urgent need to build a systematic data construction technology framework to process, refine and curate S&T literature resources, and ultimately build a high-quality training corpus for AI applications. A number of studies have been conducted, but research on S&T literature AI data systems for thematic scenarios is still lacking. [Method/Process] This article proposes a "3+5 technical framework" for the construction of an AI data system for thematic scenarios. Focusing on the whole process of AI data system construction, it defines three levels of data content and five stages of data governance. The three-level data structure includes the multi-type basic database, the multi-modal deconstruction database and the fine-grained semantic mining knowledge base. The five construction stages are multi-channel data source scanning, multi-type basic data construction, multi-modal deconstruction data construction, fine-grained semantic mining knowledge construction and multi-scenario data application. Taking big data technology and intelligent mining technology as the key elements of data governance, the article describes the system architecture and functions of the data governance tool chain in detail. The core components of the tool chain are a multi-source data aggregation tool, a multi-format data parsing tool, a data cleaning tool, an associated file identification and acquisition tool, a data fusion tool, a multi-modal deconstruction and reorganization tool, and a fine-grained knowledge identification tool.
Working together, these tools ensure the efficiency and integrity of the process from raw data to the AI data system. [Results/Conclusions] To verify the effectiveness of the proposed technical framework, this study built a knowledge base in the field of rice breeding. The AI data system for the thematic scenario of intelligent rice breeding includes a multi-type basic knowledge layer, a multi-modal deconstruction and recombination knowledge layer and a fine-grained semantic mining knowledge layer. The basic knowledge layer includes general scientific papers and patent data; the multi-modal knowledge layer includes the multi-modal deconstruction of paper content; the domain semantic mining knowledge layer focuses on professional knowledge in intelligent rice breeding, such as rice variety validation data, phenotypic characteristics data, and the rice lineage network. The results show that the framework can effectively process S&T literature data and build a high-quality domain knowledge base, providing data support for the application of AI models in rice breeding research and verifying the effectiveness and practicality of the framework.
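
As an illustration only, the layered flow described in this abstract (basic data, then multi-modal deconstruction, then fine-grained mining) might be sketched in Python as follows. The `Record` class, the function names, and the toy rules are hypothetical stand-ins and are not taken from the article; a real tool chain would use parsers and trained models at each step.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One literature item as it moves through the three data layers."""
    doc_id: str
    raw: str                                   # basic data layer
    parts: dict = field(default_factory=dict)  # multi-modal deconstruction layer
    facts: list = field(default_factory=list)  # fine-grained knowledge layer


def clean(record):
    # Data cleaning: collapse runs of whitespace left over from parsing.
    record.raw = " ".join(record.raw.split())
    return record


def deconstruct(record):
    # Toy deconstruction rule (assumption): sentences that begin with
    # "Figure" or "Table" are routed to their own modality.
    parts = {"text": [], "figure": [], "table": []}
    for sentence in record.raw.split(". "):
        sentence = sentence.strip().rstrip(".")
        if not sentence:
            continue
        if sentence.startswith("Figure"):
            parts["figure"].append(sentence)
        elif sentence.startswith("Table"):
            parts["table"].append(sentence)
        else:
            parts["text"].append(sentence)
    record.parts = parts
    return record


def mine(record, vocabulary):
    # Toy knowledge identification (assumption): match domain terms in body text.
    found = {term for s in record.parts.get("text", [])
             for term in vocabulary if term in s.lower()}
    record.facts = sorted(found)
    return record


def run_pipeline(raw_docs, vocabulary):
    # Apply the stages in order to every aggregated document.
    return [mine(deconstruct(clean(Record(doc_id, raw))), vocabulary)
            for doc_id, raw in raw_docs.items()]


DOCS = {"d1": "Rice  variety X shows high yield. Figure 1 shows plant height. "
              "Table 2 lists lineage."}
RESULT = run_pipeline(DOCS, {"yield", "plant height"})
```

Keeping each stage as a separate function that takes and returns a `Record` mirrors the idea of a tool chain: stages can be swapped or extended without changing the others.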

Building Consumption Data Systems Driven by AI Plus Expert for Scientific and Technical Literature Information Resources | Open Access
Guanghui YE, Kai TU, Lina HU, Li HAN, Zhiming FENG
2024, 36(9):  18-31.  DOI: 10.13998/j.cnki.issn1002-1248.24-0640

[Purpose/Significance] Constrained by traditional literature classification systems, scientific and technical literature information resources face problems such as inadequate disclosure and underutilization. At the same time, high-quality user-generated data cannot yet be integrated as data elements into scientific and technical literature services, which prevents these services from adapting to the context of open science and meeting the diverse knowledge needs of readers. This study aims to harness the technological breakthrough potential of AI to build a consumer-end data system for scientific and technical literature information resources driven by AI and experts. This will help to overcome the shortcomings of traditional services, such as the lack of supporting reading information and low interactivity between users, and is intended to promote the optimization of scientific and technical literature information resource services. [Method/Process] First, the study analyzes the four-dimensional value representation of consumer-end data systems for scientific and technical literature information resources, including the intrinsic value, the tool value, the academic value, and the future value of annotation data. Then, following the processing flow of consumer-end data, namely the collection phase, utilization phase, and management phase, the paper proposes principles for the construction of consumer-end data systems. Furthermore, the paper analyzes four types of risk associated with the involvement of AI in the construction of consumer-end data systems: machine algorithm risks, annotation content risks, annotation data risks and application risks.
Finally, based on the degree of AI involvement in data annotation work, the paper designs three innovative models in which AI and experts collaborate with users to annotate scientific and technical literature information resources: the AI plus expert-assisted data annotation model, the AI plus expert collaborative data annotation model, and the AI plus expert-led data annotation model. [Results/Conclusions] Under the AI plus expert-assisted data annotation model, AI acts as a tool that completes surface-level information processing based on rules set by experts to assist users in data annotation. In the AI plus expert collaborative data annotation model, AI completes the review of pre-annotation tags for scientific and technical literature information resources, shifting users from generating tags themselves to evaluating and selecting AI-generated tags, with experts assisting in the final review of tag quality. In the AI plus expert-led data annotation model, users provide data annotation requirements, experts guide the process, and data annotation is automatically completed by the AI4S platform.

Copyright Data Dilemma of Building High-Quality Data System for AI: Present Situation, Coping Strategies, and Implementation Path | Open Access
Hecan ZHANG, Chengqi YI, Peng GUO, Qianqian HUANG, Xiaokun JIN
2024, 36(9):  32-43.  DOI: 10.13998/j.cnki.issn1002-1248.24-0475

[Purpose/Significance] Improving the policy and governance systems to promote the development of strategic industries such as artificial intelligence was explicitly proposed in the resolution of the Third Plenary Session of the 20th Central Committee of the Communist Party of China. In recent years, the conflict between AI companies' desire for copyrighted data and copyright holders' protection of that data has become increasingly apparent. There have been a number of lawsuits and disputes around the world regarding copyright infringement caused by artificial intelligence. The dilemma of copyright protection for AI training data has become a bottleneck that urgently needs to be resolved in the development of a high-quality data system for AI. [Method/Process] Based on academic research and industrial practice on the copyright protection of AI data, this study systematically summarizes six representative approaches to addressing the copyright dilemma of AI training data, and provides a comparative analysis of the advantages, disadvantages, and applicability of these approaches. The six representative approaches are: signing a license agreement between both parties, initiating special plans or forming alliances, introducing a copyright notice mechanism, introducing a copyright risk guarantee mechanism, replacing copyrighted data with synthetic data, and applying copyright detection tools to large language models. For the copyright dilemma of AI training data, there is no optimal solution that can both encourage the supply of copyrighted AI training data and protect the copyright of that data.
[Results/Conclusions] To provide a reference for increasing the supply of AI copyright data, formulating relevant policies, and promoting related work, this study proposes a general implementation path for building a high-quality data system for AI that resolves the copyright dilemma of AI training data, based on the comparative analysis of the above six representative approaches and combined with China's four unique advantages. The path includes: 1) integrating existing platforms to build a national-level integrated service platform for copyright data for AI, with state-owned enterprises (SOEs) under the direct administration of the central government taking the lead in establishing a national copyright data alliance and connecting copyright data to the platform; 2) collaborating with local pilots of data intellectual property rights to explore and promote comprehensive reform pilot programs for copyright data adapted to the development of AI, and continuously strengthening cooperation, and the willingness to cooperate, between AI enterprises and copyright holders; 3) focusing on principled or critical issues, establishing and improving legislation related to copyright data for AI, and promoting industry self-regulation.

Opportunities, Challenges, and Future Directions for Generative Artificial Intelligence in Library Information Literacy Education: A Scoping Review | Open Access
Fan YUAN, Jia LI
2024, 36(9):  44-57.  DOI: 10.13998/j.cnki.issn1002-1248.24-0614

[Purpose/Significance] In the rapidly evolving digital landscape, generative artificial intelligence (GenAI) has emerged as a transformative force in information literacy (IL) education, presenting unprecedented opportunities and challenges for library-based learning environments. This scoping review comprehensively examines the integration of GenAI within IL education, moving beyond theoretical frameworks to provide a nuanced analysis of practical applications and strategic implementations. In contrast to existing research that primarily emphasizes technological capabilities, this study explores the profound implications of GenAI on educational paradigms and provides critical insights into the systematic transformation of library IL services in the AI era. [Methods/Process] Following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, 51 key literature sources selected from the SSCI, A&HCI, and CSSCI databases were systematically analyzed. The comprehensive analytical framework encompassed four key dimensions: technology acceptance, educational framework construction, AI literacy cultivation, and the integration of artificial intelligence with IL education. This methodological approach enabled a thorough exploration of current practices while identifying critical gaps in existing research. [Results/Conclusions] The results show that GenAI significantly enhances IL education through personalized learning experiences and improved digital teaching effectiveness. Tools such as ChatGPT have significant potential to promote adaptive learning environments and improve student engagement.
The research identifies four primary areas of impact: 1) creating dynamically adaptive learning environments tailored to individual needs, 2) enhancing critical thinking through interactive scenarios, 3) facilitating cross-disciplinary knowledge integration, and 4) generating innovative educational content and resources. However, the study also identifies several critical challenges, including concerns about data accuracy, inherent algorithmic biases, risks to academic integrity, and the potential weakening of independent thinking skills due to over-reliance on AI systems. To address these challenges, the research proposes a comprehensive framework that includes: 1) robust ethical guidelines for the implementation of GenAI, 2) systematic assessment mechanisms to monitor learning outcomes, 3) critical thinking training programs, and 4) strategies to maintain academic integrity and intellectual autonomy. The study emphasizes that the integration of GenAI is more than a technological change: it represents a fundamental shift towards AI literacy education. This evolution will require learners to develop skills beyond traditional IL, including understanding AI ethics and legal frameworks and using AI technologies to solve problems. Future research should focus on conducting empirical studies in different educational contexts, developing adaptive teaching frameworks that balance technological innovation with traditional educational values, and investigating the long-term impact of GenAI integration on learning outcomes. By systematically examining the opportunities, challenges, and development trajectories of generative AI, this study provides valuable insights for libraries and educational institutions seeking to optimize their IL programs in the AI era.
The findings not only contribute to the theoretical understanding of the role of GenAI in education, but also provide practical guidance for integrating advanced technologies into traditional educational frameworks, ultimately fostering a more adaptive, intelligent, and personalized learning ecosystem.

Reader Knowledge Construction in Public Library Information Literacy Education based on the MOA Model | Open Access
Huimin HE
2024, 36(9):  58-69.  DOI: 10.13998/j.cnki.issn1002-1248.24-0648

[Purpose/Significance] As part of the plan to build a better new digital life, information literacy (IL) is a survival skill in the information age. As a major repository of information resources, public libraries are the backbone of public IL education. Research on readers' knowledge construction behavior in public library IL education not only helps readers improve their individual IL and self-learning ability, but also contributes to the development of the national IL system. At present, relevant research mainly focuses on the framework of IL education and practical suggestions for it, and rarely analyzes the construction of readers' knowledge in IL education from the perspective of knowledge management and reader behavior. [Method/Process] Starting from the perspective of readers' knowledge construction, this paper extends to the knowledge scene transformation required in public library IL education, and then to development strategies for how readers can achieve knowledge construction and innovation in public library IL education. IL education in public libraries involves both explicit and implicit knowledge. Current IL education activities in public libraries focus on sharing explicit knowledge and enhancing the value of collection knowledge, but more attention should be paid to exploring implicit knowledge and to enhancing the knowledge transformation and innovation of participating readers. The path of reader knowledge construction in IL education can be divided into the individual, team, and organizational levels. Across these levels of knowledge construction, four knowledge transformation modes constitute the spiral process of knowledge innovation.
Based on the MOA model, this paper analyzes the motivational factors, opportunity factors, and ability factors of readers' knowledge construction in public library IL education, as well as the constituent elements of the knowledge construction community and their interactive relationships, from a multi-dimensional dynamic perspective. The resulting model of readers' knowledge construction describes how motivational, opportunity, and ability factors influence readers' knowledge construction behavior within the knowledge construction community of IL education. In the model, motivation points to opportunity and ability, suggesting that motivation leads readers to seek opportunities and develop the necessary skills; motivation, opportunity, ability, and the knowledge construction community all point to readers' knowledge construction behavior, suggesting that together they influence and promote readers' knowledge construction process. [Results/Conclusions] Readers' knowledge construction in the MOA model is a dynamic and multi-stage process. In the IL education activities of public libraries, readers gradually deepen their understanding of knowledge and apply it to practical situations by stimulating motivation, exploiting opportunities, improving skills, and ultimately constructing knowledge. At the same time, through the feedback evaluation stage, knowledge construction strategies are further enhanced, development goals are updated, and reform and innovation continue. This study is only a theoretical extension of practical work experience, and does not involve rigorous and formal data verification or case studies. In the next step, a questionnaire will be used to collect sample data, and the application of the MOA model to readers' knowledge construction behavior will be further verified through hypothesis testing and model fitting.

Graduate Student Digital Literacy Promotion Pathway Driven by AIGC | Open Access
Xuemei LUO, Yuzhe LIN
2024, 36(9):  70-77.  DOI: 10.13998/j.cnki.issn1002-1248.24-0596

[Purpose/Significance] Artificial intelligence-generated content (AIGC) is having a profound impact on the field of education. Currently, the digital education environment has problems such as incomplete digital infrastructure and slow digital transformation. The postgraduate education system has not yet fully responded to the changes in the educational environment in the intelligent era. In the era of AIGC, digital literacy has become an important component of graduate students' core competence, and it bears on their future academic research and career development. To help graduate students move from understanding intelligent technology to applying it rationally, this paper explores a new approach to talent training that adapts to the development of intelligent technology. Improving graduate students' literacy skills is important for adapting to the new demands of learning and research in the AIGC era. [Method/Process] Through a literature review and case analysis, this study explores the importance of digital literacy education for postgraduate students and identifies challenges in educational content and teaching methods. Drawing on the successful experience of international universities, the relevant theories of educational technology development, and educational practice, and analyzing the advantages and application scenarios of AIGC technology together with the current situation and existing problems of graduate students' digital literacy education, this paper proposes strategies and an improvement plan for postgraduate students' digital literacy education in the AIGC era.
[Results/Conclusions] To adapt to the changes brought about by AIGC technology, colleges and universities need to innovate in curriculum design, teaching paradigms and evaluation methods. This paper puts forward strategies such as introducing AIGC-related knowledge modules, building an intelligent recommendation platform for interactive digital resources, establishing an interdisciplinary integration mechanism, strengthening ethical and legal education, and establishing a supervision mechanism, so as to improve the comprehensive ability of graduate students. Future research can further explore the path to deep integration of AIGC technology and postgraduate students' digital literacy education, how AIGC can promote higher-order thinking in digital literacy practice, and how to give full play to the positive role of AIGC technology in education while ensuring academic integrity. A limitation of this study is that, because AIGC technology is still evolving rapidly, some of its suggestions may need to be adjusted as the technology develops further.

Application Models and Innovative Approaches of Smart Libraries from the Perspective of MR Technology | Open Access
Jiaxin HUANG, Xiaofang ZHANG
2024, 36(9):  78-88.  DOI: 10.13998/j.cnki.issn1002-1248.23-0492

[Purpose/Significance] The continuous development of metaverse technology marks humankind's transition from an information society to an information civilization. How to understand the relationship between human beings, physical space and information space in the future society has become a key problem of the era. Mixed reality (MR) technology is a new intelligent technology that integrates the advantages of augmented reality (AR) and virtual reality (VR), allows virtual objects to coexist with the physical world, and integrates human perception, computer processing and environmental input. The new generation of MR technology has improved the traditional understanding of digital-reality interaction and has also brought opportunities for technological innovation in the development of smart libraries. Exploring new application scenarios for MR technology helps to expand the depth and breadth of research on smart libraries. [Method/Process] Using literature review, content analysis and website analysis, this paper reviews the current state of research on MR technology in the field of library science in China and abroad. In addition, through practical cases, this paper summarizes the relevant experience and existing gaps in the application of MR technology in libraries in China and abroad. On this basis, the paper aims to further stimulate and release libraries' demand for MR technology and their potential to use it. Specifically, by examining the high realism, greater intelligence and omnidirectional nature of MR technology, this paper explores the capabilities of smart libraries in four dimensions (service, knowledge, experience and collaboration), which will contribute to building new application scenarios for smart libraries from the perspective of MR technology.
It is hoped that this paper can promote the formation of a new type of smart library that combines dynamic and static elements, active data, virtual-real blending and multi-dimensional expansion. [Results/Conclusions] In the wave of rapid innovation in VR, the construction of smart libraries should be considered in four dimensions: problem orientation, theoretical support, talent management and subject co-creation. This can provide a better understanding of future smart libraries, including their possible risks and urgent internal and external needs, and is expected to help build a future ecosystem in which smart libraries and MR technology are integrated. However, due to the limits of the author's knowledge and practical experience, this paper provides only relatively macro-level guidance. Libraries vary in their application of MR technology, and specific issues require specific analysis and different solutions. Future research will therefore continue to improve and refine this work, and provide a reference for the effective practice of MR technology in smart libraries.

Digital Humanities & Large Language Models: Practice and Research in Semantic Retrieval of Ancient Documents | Open Access
Haoxian WANG, Ziming ZHOU, Feifei DING, Chengfu WEI
2024, 36(9):  89-101.  DOI: 10.13998/j.cnki.issn1002-1248.24-0615

[Purpose/Significance] Against the backdrop of the increasing popularity of artificial intelligence technology, particularly large language models, this paper aims to explore their applications in the field of digital humanities, with a particular focus on the retrieval of ancient documents. Through the practice and exploration of the ancient document retrieval platform at Peking University Library, this study not only introduces new perspectives and methods to the field of digital humanities, but also promotes academic research and cultural heritage. It also provides a practical reference for other university libraries. [Method/Process] The article begins with an overview of the origins and development of the digital humanities, emphasizing its central role in humanities research. The paper then examines the current state of the art in large language models and analyzes their potential and advantages for identifying and classifying ancient documents, semantic understanding and parsing, and information extraction and association. Through the analysis of practical case studies, this paper constructs a fundamental semantic retrieval model, the core architecture of which consists of two critical components. First, the construction phase of the retrieval engine involves meticulous pre-processing of the ancient document information to generate basic metadata. Using large models, these metadata are processed and enhanced in depth to create auxiliary search fields and enriched text. In addition, the text processed by the model and the original text are transformed into semantic vectors, which are then stored in an efficient vector engine for rapid retrieval. Second, the search and sort component is the other core part of the model. This part processes the user's search terms through large models to generate extended content and, in conjunction with the search terms, creates accurate semantic vectors.
Utilizing the previously constructed vector engine, the model can efficiently retrieve relevant documents and intelligently sort the search results based on specific algorithms, ensuring that users can quickly obtain the most relevant and valuable information. Taking the ancient document collection data of Peking University Library as the research object, the paper processes over 250,000 records, primarily consisting of ancient books and rubbings, as well as over 10 million metadata items. Using the Gradio framework on a server equipped with two NVIDIA RTX 4090 graphics cards (24 GB each), a semantic retrieval platform was created to effectively manage and retrieve these vast amounts of data. [Results/Conclusions] The main strengths and contributions of the study lie in the standardized metadata organization, the metadata extension supported by large models, the support for natural language search terms, the fault-tolerant search mechanisms, and the efficient retrieval capabilities of the vector engine. However, there are shortcomings, such as the limited accuracy of results generated by large models and the insufficiently comprehensive analysis of user search data. Future efforts will be devoted to addressing these issues to increase the effectiveness of the research.
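
The two components described in this abstract (index construction with enriched text, then query expansion and ranked vector search) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the bag-of-words `embed` function stands in for a real semantic embedding model, the synonym table in `enrich` stands in for large-model enrichment and query expansion, and the in-memory dictionary stands in for a vector engine. It does not reproduce the platform's actual implementation.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in for a semantic embedding model (assumption): word-count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def enrich(text, synonyms):
    # Stand-in for large-model enrichment (assumption): append known synonyms.
    extra = [synonyms[w] for w in text.lower().split() if w in synonyms]
    return " ".join([text] + extra)


def build_index(records, synonyms):
    # Component 1: vectorize original plus enriched text and store the vectors.
    return {rid: embed(enrich(text, synonyms)) for rid, text in records.items()}


def search(query, index, synonyms, top_k=3):
    # Component 2: expand the query, vectorize it, then rank by similarity.
    qvec = embed(enrich(query, synonyms))
    scored = [(cosine(qvec, vec), rid) for rid, vec in index.items()]
    scored = [(score, rid) for score, rid in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rid for _, rid in scored[:top_k]]


SYNONYMS = {"rubbing": "inscription", "inscription": "rubbing"}
RECORDS = {
    "r1": "stone rubbing of a Tang dynasty stele",
    "r2": "Song dynasty printed edition of poetry",
    "r3": "bronze inscription from the Zhou dynasty",
}
INDEX = build_index(RECORDS, SYNONYMS)
```

In this toy data, a query for "inscription" also matches the record that only mentions "rubbing", because both the record and the query were expanded before vectorization; that is the kind of tolerance to wording differences that enrichment and query expansion are meant to provide.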