Journal of Library and Information Science in Agriculture, 2023, Vol. 35, Issue (5): 16-26. doi: 10.13998/j.cnki.issn1002-1248.23-0386

• AIGC Special Topic •

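Towards Known Unknowns: GPT Large Language Models Empower Human-Centered Information Retrieval

SHOU Jianqi

  1. Tianjin Library (Tianjin Children's Library), Tianjin 300201
  • Received: 2023-04-14 Online: 2023-05-05 Published: 2023-07-26
  • About the author: SHOU Jianqi (1970- ), female, associate research librarian; research interests include information organization, localization of RDA and BIBFRAME (Bibliographic Framework), and information retrieval
  • Funding:
    General Program of the National Social Science Fund of China, "Research on the Construction and Application of a Chinese-English Terminology Knowledge Base for Das Kapital" (18BTQ025)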

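Abstract: [Purpose/Significance] Information retrieval is at the core of public library services and has considerable social value. The two mainstream retrieval methods today are keyword-based top-down retrieval and artificial-intelligence-based point-to-point retrieval. Used separately, however, neither strikes a balance between flexibility and reliability, so a new retrieval strategy is needed to achieve human-centered information retrieval. [Method/Process] Drawing on cognitive science theory, this paper proposes an adaptive literature retrieval framework (ALRF). ALRF's two-stage workflow combines the strengths of the two mainstream approaches and supports multiple large language models, including ChatGPT, GPT-4, and ERNIE Bot. Its retrieval performance was tested on literature in three broad categories: science and engineering, biology and medicine, and literature and sociology. [Results/Conclusions] ALRF outperforms both existing retrieval methods in flexibility and reliability and shows greater potential for delivering user-centered public information retrieval services.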

Key words: GPT, large language model, information retrieval, knowledge management, AIGC

Abstract: [Purpose/Significance] Information retrieval (IR) lies at the core of public library services and has a profound social impact through activities such as integrating digital resources and advancing social equity. Current practice relies mainly on two approaches: classical keyword-based top-down retrieval of the kind offered by an Online Public Access Catalog (OPAC), and point-to-point retrieval based on large language models (LLMs). Used on its own, neither approach balances flexibility with reliability, which hinders the move towards user-centric IR systems. A new retrieval strategy is therefore urgently needed to support a human-centered IR paradigm. [Method/Process] Rather than fully replacing the classical OPAC-like approach with LLM methods such as GPT, as is widely advocated, we propose combining the strengths of both strategies. This is the first proposal of its kind in the public information service research community. We introduce the adaptive literature retrieval framework (ALRF), an approach grounded in cognitive science that addresses a critical user challenge in retrieval: the pursuit of known unknown knowledge (KUK). KUK arises when users know clearly what outcome they want but do not know the domain-specific terminology for it, and so lack an entry point for a keyword-based search. ALRF's two-stage workflow is designed for exactly this situation: (i) users enter a natural-language description to identify the target keywords, or keywords at a higher level of abstraction, which implements a bottom-up strategy; (ii) users then run a top-down search with the extracted keywords. ALRF supports LLMs such as ChatGPT, GPT-4, and ERNIE Bot. We evaluated its effectiveness in retrieving literature from three broad fields: science and engineering, biology and medicine, and literature and sociology. [Results/Conclusions] ALRF significantly outperforms the two standard methods (LLM-based retrieval services and OPAC-like retrieval services) in both flexibility and reliability. This holds for both keyword abstraction tasks (identifying keywords at a higher level of abstraction within the target domain) and property extraction tasks (locating keywords that have specific attributes but sit at the same level of abstraction as the target domain). ALRF therefore meets the pressing need for KUK retrieval and shows early potential for serving users' diverse and personalized retrieval needs, suggesting that it could transform public information services by placing people at their center. At present, however, wider adoption of ALRF in public IR in China is held back by the pace at which Chinese companies are developing powerful LLMs. We recommend that researchers follow these developments closely so that they understand the realistic possibilities and limitations of real-world applications.
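The two-stage workflow described in the abstract (a bottom-up step that turns a natural-language description into keywords, followed by a top-down keyword search) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's ALRF implementation: `suggest_keywords` is a hypothetical stub standing in for the LLM step (a real system would query a model such as ChatGPT, GPT-4, or ERNIE Bot), its phrase-to-term table is invented, and `Record`, `keyword_search`, and `two_stage_retrieve` are hypothetical names for a toy OPAC-style subject match over an in-memory catalog. In the described workflow the user chooses among the suggested keywords; the sketch simply passes all of them through.

```python
"""Toy sketch of a two-stage (bottom-up, then top-down) retrieval flow.

Assumption: this is NOT the ALRF implementation. suggest_keywords is a
hypothetical stub standing in for an LLM call, and the catalog is a toy list.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A catalog record with controlled subject terms (hypothetical schema)."""
    title: str
    subjects: frozenset[str]


def suggest_keywords(description: str) -> list[str]:
    """Stage 1 (bottom-up): map a natural-language need to search terms.

    Hypothetical stand-in for an LLM such as ChatGPT, GPT-4, or ERNIE Bot;
    the phrase-to-term table below is invented for illustration.
    """
    hints = {
        "light into electricity": ["photovoltaics", "solar cells"],
        "machines that learn": ["machine learning"],
    }
    text = description.lower()
    keywords: list[str] = []
    for phrase, terms in hints.items():
        if phrase in text:
            keywords.extend(terms)
    return keywords


def keyword_search(catalog: list[Record], keywords: list[str]) -> list[Record]:
    """Stage 2 (top-down): OPAC-style exact match on subject terms."""
    wanted = {k.lower() for k in keywords}
    return [r for r in catalog if wanted & {s.lower() for s in r.subjects}]


def two_stage_retrieve(
    catalog: list[Record], description: str
) -> tuple[list[str], list[Record]]:
    """Run stage 1, then feed its keywords into stage 2."""
    keywords = suggest_keywords(description)
    return keywords, keyword_search(catalog, keywords)


if __name__ == "__main__":
    catalog = [
        Record("Thin-film solar cell efficiency", frozenset({"solar cells"})),
        Record("Deep learning for OCR", frozenset({"machine learning"})),
    ]
    kws, hits = two_stage_retrieve(
        catalog, "How do panels turn light into electricity?"
    )
    print(kws)                      # ['photovoltaics', 'solar cells']
    print([r.title for r in hits])  # ['Thin-film solar cell efficiency']
```

In this sketch the LLM only proposes entry points; the records themselves come from a deterministic keyword match, which is one plausible way to pair LLM flexibility with OPAC-style reliability.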

CLC number: G254.9

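Cite this article

SHOU Jianqi. Towards Known Unknowns: GPT Large Language Models Empower Human-Centered Information Retrieval[J]. Journal of Library and Information Science in Agriculture, 2023, 35(5): 16-26.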
