Journal of Library and Information Science in Agriculture (农业图书情报学报)


Data Copyright Dilemmas Facing High-Quality AI Data Systems: Analysis of Coping Strategies and Research on Implementation Paths

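Hecan ZHANG1, Chengqi YI1, Peng GUO2, Qianqian HUANG1,3, Xiaokun JIN4

    1. Department of Big Data Development, State Information Center, Beijing 100045
    2. Centre for Strategic Studies, Greater Bay Area Big Data Research Institute, Shenzhen 518048
    3. School of Information Resource Management, Renmin University of China, Beijing 100872
    4. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190
  • Received: 2024-05-20 Published: 2024-10-14
  • Corresponding author: Chengqi YI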
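  • About the authors:

    Hecan ZHANG, research intern, Artificial Intelligence Division, Department of Big Data Development, State Information Center. Research interests: big data and the digital economy, AI corpus systems.

    Peng GUO, senior consultant, Centre for Strategic Studies, Greater Bay Area Big Data Research Institute, Shenzhen (Guangdong-Hong Kong-Macao Greater Bay Area Big Data Research Institute). Research interests: artificial intelligence, software engineering.

    Qianqian HUANG, assistant research fellow, Artificial Intelligence Division, Department of Big Data Development, State Information Center; doctoral candidate, School of Information Resource Management, Renmin University of China. Research interests: artificial intelligence, data elements.

    Xiaokun JIN, doctoral candidate, Institutes of Science and Development, Chinese Academy of Sciences. Research interests: complex networks, online public opinion.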

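  • Funding:
    National Natural Science Foundation of China Special Project "Research on the Theory and Methods of Metaverse Digital Assets Integrating the Coken (共票) Mechanism" (62441206); National Social Science Fund of China Youth Project "Research on Clue Discovery Methods for Multilingual Social Science Data" (22CTQ025); National Social Science Fund of China Youth Project "Research on the Mechanism by Which Data Elements Affect the Tax System and on Paths for Its Optimization" (24CJY048)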

Copyright Data Dilemma of Building High-quality Data System for AI: Present Situation, Coping Strategies, and Implementation Path

Hecan ZHANG1, Chengqi YI1, Peng GUO2, Qianqian HUANG1,3, Xiaokun JIN4

    1. Department of Big Data Development, State Information Center, Beijing 100045
    2. Centre for Strategic Studies, Greater Bay Area Big Data Research Institute, Shenzhen 518048
    3. School of Information Resource Management, Renmin University of China, Beijing 100872
    4. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190
  • Received: 2024-05-20 Online: 2024-10-14
  • Contact: Chengqi YI

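Abstract (translated from the Chinese):

[Purpose/Significance] The resolution of the Third Plenary Session of the 20th CPC Central Committee explicitly calls for improving the policies and governance systems that promote the development of strategic industries such as artificial intelligence. In recent years, lawsuits and disputes over copyrighted data used in AI have become frequent worldwide, and the difficulty of protecting copyright in AI training data has become a key bottleneck and practical problem in building a high-quality AI data system. [Method/Process] Drawing on a review of academic research and industry practice on AI data copyright protection, this study systematically summarizes six representative approaches to the data copyright dilemma and compares their advantages, disadvantages, and applicability. [Results/Conclusions] The AI data copyright dilemma is that no optimal solution currently exists that both promotes the supply of copyrighted data for AI and protects data copyright. Building on the analysis of the six representative approaches and on four unique advantages that China possesses, this study proposes an overall implementation path for resolving the dilemma systematically and consolidating a high-quality AI data system: building a national integrated service platform for AI data copyright; exploring and advancing comprehensive data copyright reform pilots adapted to the development of AI; and establishing and improving legislation on AI data copyright while promoting industry self-regulation. The aim is to provide a useful reference for increasing the supply of copyrighted data for AI in China, formulating relevant policies, and advancing related work.

Keywords: artificial intelligence, AI data system, copyright protection, data copyright, data elements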

Abstract:

[Purpose/Significance] The resolution of the Third Plenary Session of the 20th Central Committee of the Communist Party of China explicitly proposed improving the policy and governance systems that promote the development of strategic industries such as artificial intelligence (AI). In recent years, the conflict between AI companies' demand for copyrighted data and copyright holders' efforts to protect that data has become increasingly apparent, and lawsuits and disputes over AI-related copyright infringement have arisen around the world. The dilemma of copyright protection for AI training data has become a bottleneck that urgently needs to be resolved in building a high-quality data system for AI. [Method/Process] Based on academic research and industrial practice on the copyright protection of AI data, this study systematically summarizes six representative approaches to the copyright dilemma of AI training data and compares their advantages, disadvantages, and applicability. The six approaches are: bilateral license agreements, special plans or alliances, a copyright notice mechanism, a copyright risk guarantee mechanism, substitution with synthetic data, and copyright detection tools applied to large language models. No optimal solution yet exists that both encourages the supply of copyrighted training data for AI and protects the copyright in that data.
[Results/Conclusions] Based on the comparative analysis of the six representative approaches and on China's four unique advantages, this study proposes an overall implementation path for resolving the copyright dilemma of AI training data and building a high-quality data system for AI. The path has three parts: 1) integrate existing platforms to build a national integrated service platform for copyright data for AI, with state-owned enterprises (SOEs) under the direct administration of the central government taking the lead in establishing a national copyright data alliance and connecting copyright data to the platform; 2) in coordination with local pilots on data intellectual property rights, explore and promote comprehensive reform pilot programs for copyright data adapted to the development of AI, and continuously strengthen cooperation, and the willingness to cooperate, between AI enterprises and copyright holders; 3) focusing on principled or critical issues, establish and improve legislation related to copyright data for AI and promote industry self-regulation. The study is intended as a reference for increasing the supply of copyrighted data for AI, formulating relevant policies, and promoting related work.

Key words: artificial intelligence, data system for AI, copyright protection, copyright data, data elements

CLC number: TP3-05, TP271

Cite this article

张何灿, 易成岐, 郭鹏, 黄倩倩, 靳晓锟. 高质量AI数据体系面临的数据版权困境、应对策略解析与实施路径研究[J/OL]. 农业图书情报学报. https://doi.org/10.13998/j.cnki.issn1002-1248.24-0475.

Hecan ZHANG, Chengqi YI, Peng GUO, Qianqian HUANG, Xiaokun JIN. Copyright Data Dilemma of Building High-quality Data System for AI: Present Situation, Coping Strategies, and Implementation Path[J/OL]. Journal of Library and Information Science in Agriculture. https://doi.org/10.13998/j.cnki.issn1002-1248.24-0475.