Journal of Library and Information Science in Agriculture

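Data Quality Assessment and Improvement Strategies: A Diagnostic Analysis Based on the Public Basic Databases (Population and Legal Entity Databases) of a City

PAN Yong1, SUN Jing2, WANG Jiandong3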

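  1. Nanjing Big Data Group Co., Ltd., Nanjing 211100
  2. College of Engineering, Peking University, Beijing 100871
  3. Price Monitoring Center, National Development and Reform Commission, Beijing 100837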
  • Received: 2025-11-21  Published: 2026-01-06
  • About the authors:

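    PAN Yong (1995- ), male, master's degree, assistant engineer in a data application post at Nanjing Big Data Group Co., Ltd.; research interests include open government data, data element governance, and data development and utilization

    SUN Jing (1991- ), female, master's degree, librarian at the College of Engineering, Peking University; research interests include digital libraries, library development, scientific evaluation, and data element governance

    WANG Jiandong (1982- ), male, PhD, research fellow and deputy director of the Price Monitoring Center, National Development and Reform Commission; research interests include data pricing, big data analytics, and data element governance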

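  • Funding:
    Major Project of the National Social Science Fund of China, "Research on the Mechanisms and Pathways by Which Government Data Empowers the Improvement of Digital Government Effectiveness" (25&ZD218)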

Data Quality Assessment and Improvement Strategies: A Diagnostic Analysis Based on the Public Basic Databases (Population and Legal Entity Databases) of a City

PAN Yong1, SUN Jing2, WANG Jiandong3   

  1. Nanjing Big Data Group Co., Ltd., Nanjing 211100
    2. College of Engineering, Peking University, Beijing 100871
    3. Price Monitoring Center, National Development and Reform Commission, Beijing 100837
  • Received: 2025-11-21  Online: 2026-01-06

Abstract:

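[Purpose/Significance] As data volumes keep growing and applications become deeper and broader, data quality problems have become a major bottleneck to releasing the potential of data elements. Drawing on the public basic databases (the population database and the legal entity database) of a city, this study builds a scientific, systematic, and operable data quality assessment system to provide a reference for departments working to improve data quality. [Method/Process] Based on domestic and international research on data quality assessment, and combined with national standards and local data characteristics, the study constructs a data quality assessment framework covering 6 primary indicators, 17 secondary indicators, and 61 detection items. Combining automated detection tools with manual verification, it carries out a multi-dimensional quality evaluation of 1 367 data items across 102 datasets in the city. [Results/Conclusions] The evaluation shows that the city's public basic databases are generally in good condition, but quality problems remain, including data coding errors, non-standardized classification, missing data items, missing or duplicate primary keys, inconsistent formats, illegal characters or outliers, and delayed or discontinued data updates. Accordingly, the study proposes unifying data standards, building a full-process quality control mechanism, strengthening technical platform support and real-time monitoring, and improving organizational coordination and institutional safeguards.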

Keywords: data quality, public basic databases, assessment system, quality management

Abstract:

[Purpose/Significance] As data become a strategic resource in the digital economy, their quality directly affects the efficiency of value creation and the effectiveness of public governance. However, as data volumes expand and application scenarios deepen, pervasive quality issues such as inconsistencies, errors, and redundancies have become a significant bottleneck restricting the release of data element potential. High-quality public data are particularly critical for empowering government decision-making and optimizing public services. To address the urgent practical need for a high-quality data supply, this paper draws on the public basic databases (specifically the Population Database and the Legal Entity Database) of a representative city to construct a scientific, systematic, and operable data quality assessment system. The study aims to diagnose existing quality defects in these foundational assets and to provide theoretical support and actionable references for relevant departments moving from passive data management to active quality governance. [Method/Process] To ensure that the assessment is both scientifically rigorous and practically applicable, this study establishes a comprehensive evaluation framework based on domestic and international research, combined with the national standard GB/T 36344-2018 and local data characteristics. The framework is hierarchical, comprising 6 primary indicators (Normativity, Integrity, Consistency, Accuracy, Timeliness, and Accessibility), 17 secondary indicators, and 61 specific detection items. The study employs a dual-track assessment methodology that integrates automated detection tools with manual verification: automated SQL scripts and rule engines are used for large-scale quantitative detection of intrinsic dimensions, while manual checks and interviews address contextual dimensions. This methodology was applied in a multi-dimensional evaluation of 1 367 data items across 102 datasets in the city. [Results/Conclusions] The evaluation results indicate that, although the city's public basic databases are in good overall condition, multidimensional quality issues persist. Specifically, the assessment revealed data coding errors, non-standardized classification, missing data items, missing or duplicate primary keys, inconsistent formats, illegal characters or outliers, and delayed or discontinued data updates. To address these challenges, the paper proposes four systematic improvement strategies: 1) unifying data standards and coding systems to ensure consistency across departments; 2) constructing a full-process quality control mechanism covering data collection, storage, and usage; 3) strengthening technical platform support by implementing real-time monitoring and intelligent warning capabilities; and 4) improving organizational synergy and institutional guarantees to solidify the management foundation. These measures are intended to optimize data supply quality and support the high-quality and sustainable development of the data element market.
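The automated track described in the abstract (SQL scripts plus rule checks for missing items, missing or duplicate primary keys, inconsistent formats, and illegal characters) might look roughly like the following minimal sketch. It is not the authors' tooling: the table `legal_entity`, its columns, the sample rows, the date convention, and the illegal-character set are all hypothetical and chosen only for illustration.

```python
import re
import sqlite3

# Hypothetical sample rows (entity_id, name, reg_date); not data from the study.
ROWS = [
    ("E001", "Alpha Trading Co.", "2024-03-01"),
    ("E002", None, "2024-03-05"),               # missing data item
    ("E002", "Beta Foods Ltd.", "2024/03/07"),  # duplicate key, inconsistent date format
    (None, "Gamma Tech#", "2024-03-09"),        # missing key, illegal character
]

# Assumed rules: dates must be YYYY-MM-DD; these symbols are treated as illegal in names.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
ILLEGAL_CHARS = re.compile(r"[#$%^*<>]")


def run_checks(rows):
    """Run a few SQL and regex quality checks and return the counts as a dict."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE legal_entity (entity_id TEXT, name TEXT, reg_date TEXT)"
    )
    conn.executemany("INSERT INTO legal_entity VALUES (?, ?, ?)", rows)

    def count(sql):
        return conn.execute(sql).fetchone()[0]

    # Integrity checks expressed directly in SQL.
    total = count("SELECT COUNT(*) FROM legal_entity")
    missing_key = count("SELECT COUNT(*) FROM legal_entity WHERE entity_id IS NULL")
    missing_name = count("SELECT COUNT(*) FROM legal_entity WHERE name IS NULL")
    duplicate_keys = [
        row[0]
        for row in conn.execute(
            "SELECT entity_id FROM legal_entity "
            "WHERE entity_id IS NOT NULL "
            "GROUP BY entity_id HAVING COUNT(*) > 1 "
            "ORDER BY entity_id"
        )
    ]

    # Format and character rules applied in Python with regular expressions.
    values = conn.execute("SELECT name, reg_date FROM legal_entity").fetchall()
    bad_dates = sum(
        1 for _, d in values if d is not None and not ISO_DATE.match(d)
    )
    illegal_names = sum(
        1 for n, _ in values if n is not None and ILLEGAL_CHARS.search(n)
    )
    conn.close()

    return {
        "total_rows": total,
        "missing_key": missing_key,
        "missing_name": missing_name,
        "duplicate_keys": duplicate_keys,
        "bad_date_format": bad_dates,
        "illegal_chars": illegal_names,
    }


if __name__ == "__main__":
    for check, result in run_checks(ROWS).items():
        print(f"{check}: {result}")
```

Each entry in the returned dictionary corresponds to one detection item; a real rule engine would presumably load such rules from configuration and map them to the framework's indicators rather than hard-code them.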

Key words: data quality, public basic databases, assessment system, quality management

CLC number: G203, TP311.13, D630.1

Cite this article

PAN Yong, SUN Jing, WANG Jiandong. Data Quality Assessment and Improvement Strategies: A Diagnostic Analysis Based on the Public Basic Databases (Population and Legal Entity Databases) of a City[J/OL]. Journal of Library and Information Science in Agriculture. https://doi.org/10.13998/j.cnki.issn1002-1248.25-0664.