
Journal of library and information science in agriculture

   

Data Quality Assessment and Improvement Strategies: A Diagnostic Analysis Based on the Public Basic Databases (Population and Legal Entity Databases) of a City

PAN Yong¹, SUN Jing², WANG Jiandong³

  1. Nanjing Big Data Group Co., Ltd., Nanjing 211100
  2. College of Engineering, Peking University, Beijing 100871
  3. Price Monitoring Center, National Development and Reform Commission, Beijing 100837
  • Received: 2025-11-21  Online: 2026-01-06

Abstract:

[Purpose/Significance] As data becomes a strategic resource in the digital economy, its quality directly affects the efficiency of value creation and the effectiveness of public governance. However, as data volumes grow and application scenarios deepen, pervasive quality problems such as inconsistencies, errors, and redundancies have become a significant bottleneck restricting the release of the potential of data elements. High-quality public data is particularly critical for empowering government decision-making and optimizing public services. To address the urgent practical need for a high-quality data supply, this paper uses the public basic databases (specifically the Population Database and the Legal Entity Database) of a representative city to construct a scientific, systematic, and operable data quality assessment system. The study aims to diagnose existing quality defects in these foundational assets and to provide theoretical support and actionable references that help the relevant departments move from passive data management to active quality governance.

[Method/Process] To make the assessment both scientifically rigorous and practically applicable, this study establishes a comprehensive evaluation framework based on domestic and international research, the national standard GB/T 36344-2018, and local data characteristics. The framework is hierarchical, comprising 6 primary indicators (Normativity, Integrity, Consistency, Accuracy, Timeliness, and Accessibility), 17 secondary indicators, and 61 specific detection items. The study uses a dual-track assessment methodology that combines automated detection tools with manual verification: automated SQL scripts and rule engines perform large-scale quantitative detection of intrinsic dimensions, while manual checks and interviews address contextual dimensions. This methodology was applied to a multi-dimensional evaluation of 1,367 data items across 102 datasets in the city.

[Results/Conclusions] The evaluation results indicate that, although the overall state of the city's public basic databases is sound, multidimensional quality issues persist. Specifically, the assessment revealed data coding errors, non-standardized classification, missing data items, missing or duplicate primary keys, inconsistent formats, illegal characters or outliers, and data delays or discontinuations. To address these problems, the paper proposes four systematic improvement strategies: 1) unifying data standards and coding systems to ensure consistency across departments; 2) constructing a full-process quality control mechanism covering data collection, storage, and usage; 3) strengthening technical platform support through real-time monitoring and intelligent warning capabilities; and 4) improving organizational synergy and institutional guarantees to solidify the management foundation. These measures are intended to optimize data supply quality and support the high-quality and sustainable development of the data element market.
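
The automated track described above can be illustrated with a minimal sketch. The abstract names SQL scripts and rule engines but does not give their rules or scoring formula, so everything here is an assumption for illustration: the Python stand-in, the two regular-expression patterns, the `pass_rate` and `primary_key_issues` helpers, the sample records, and the choice to score a field as the percentage of values that pass its rule.

```python
import re
from collections import Counter

# Hypothetical field-level rules (illustrative only, not the paper's detection items).
RULES = {
    "postal_code": re.compile(r"\d{6}"),   # assumed format: exactly six digits
    "mobile": re.compile(r"1\d{10}"),      # assumed format: 11 digits starting with 1
}

def pass_rate(values, pattern):
    """Percentage of values that are present and fully match the pattern."""
    if not values:
        return 0.0
    ok = sum(1 for v in values if v is not None and pattern.fullmatch(v.strip()))
    return round(100.0 * ok / len(values), 2)

def primary_key_issues(keys):
    """Return (number of missing keys, sorted list of duplicated keys)."""
    missing = sum(1 for k in keys if k is None or not str(k).strip())
    counts = Counter(k for k in keys if k is not None and str(k).strip())
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return missing, duplicates

# Made-up sample records containing a format error, a duplicate key, and missing values.
records = [
    {"id": "A1", "postal_code": "210000", "mobile": "13800000000"},
    {"id": "A2", "postal_code": "21000",  "mobile": "1380000000x"},
    {"id": "A2", "postal_code": "211100", "mobile": "13900000000"},
    {"id": None, "postal_code": None,     "mobile": "15000000000"},
]

scores = {field: pass_rate([r[field] for r in records], pattern)
          for field, pattern in RULES.items()}
pk_missing, pk_duplicates = primary_key_issues([r["id"] for r in records])

print(scores)
print(pk_missing, pk_duplicates)
```

Keeping each rule as a named pattern makes it straightforward to add further field checks without changing the scoring logic; contextual dimensions that rely on interviews fall outside what such a script can cover.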

Key words: data quality, public basic databases, assessment system, quality management

CLC Number: G203

Fig. 1 Composition of master data in the population database

Fig. 2 Composition of master data in the legal entity database

Fig. 3 Data quality assessment indicator system for public basic databases (population and legal entity databases)

Table 1 Evaluation results of data quality for public basic databases

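| No. | Primary indicator | Primary score | Secondary indicator | Detection item | Score | Granularity | Method |
|---|---|---|---|---|---|---|---|
| 1 | Normativity | 94.22 | Data standards | Mobile phone number | 98.69 | Field | Program |
| 2 | | | | ID document number | 99.76 | Field | Program |
| 3 | | | | Unified social credit code | 97.11 | Field | Program |
| 4 | | | | Business registration number | 92.52 | Field | Program |
| 5 | | | | Office telephone/fax | 98.27 | Field | Program |
| 6 | | | | Postal code | 84.83 | Field | Program |
| 7 | | | Authoritative reference data sources | ID document type | 77.54 | Field | Program |
| 8 | | | | Region code | 82.22 | Field | Program |
| 9 | | | | Industry code | 97.58 | Field | Program |
| 10 | | | | Enterprise type code | 99.92 | Field | Program |
| 11 | | | | Organization code | 98.6 | Field | Program |
| 12 | | | | Business scope | 97.3 | Field | Program |
| 13 | | | | Disability category | 100 | Field | Program |
| 14 | | | | Disability grade | 100 | Field | Program |
| 15 | | | | School code | 100 | Field | Program |
| 16 | | | | Administrative licensing item classification code | 99.91 | Field | Program |
| 17 | | | Business rules | Disability certificate number | 99.99 | Field | Program |
| 18 | | | | Medical insurance system employer code | 100 | Field | Program |
| 19 | | | | Bank card number code | 100 | Field | Program |
| 20 | | | | Drug business license number | 100 | Field | Program |
| 21 | | | | Small food workshop registration certificate number | 91.86 | Field | Program |
| 22 | | | | Temporary construction land planning permit code | 91.2 | Field | Program |
| 23 | | | | Food production license number | 100 | Field | Program |
| 24 | | | | Food business license number | 94.92 | Field | Program |
| 25 | | | | Medical device registration certificate number | 97.74 | Field | Program |
| 26 | | | | Individual business trade name | 98.28 | Field | Program |
| 27 | | | | Individual business organizational form | 90.25 | Field | Program |
| 28 | | | | Special equipment catalogue | 99.01 | Field | Program |
| 29 | | | Security specifications | Privacy field encryption check | 96.28 | Field | Program |
| 30 | Integrity | 78.06 | Data record integrity | Data backup and recovery | 100 | Dataset | Interview |
| 31 | | | | Data record integrity check | 89.35 | Dataset | Program |
| 32 | | | | Data update integrity | 100 | Dataset | Interview |
| 33 | | | | Coverage | 100 | Dataset | Interview |
| 34 | | | Data element integrity | Address integrity check | 80.34 | Field | Program |
| 35 | | | | Data integrity verification | 0.00 | Field | Program |
| 36 | Consistency | 93.43 | Related data consistency | Association check | 83.15 | Dataset | Program |
| 37 | | | Same-data consistency | Date consistency check | 98.59 | Field | Program |
| 38 | | | | Numeric value consistency check | 97.85 | Field | Program |
| 39 | | | | Value domain consistency check | 94.02 | Field | Program |
| 40 | | | | Data encoding consistency check | 99.97 | Dataset | Program |
| 41 | | | | Primary key consistency check | 72.72 | Field | Program |
| 42 | | | | Address consistency check | 79.31 | Field | Program |
| 43 | | | | Unit of measurement consistency check | 100 | Field | Program |
| 44 | Accuracy | 97.63 | Data format compliance | Value range check | 96.42 | Field | Program |
| 45 | | | Data content correctness | Data source | 100 | Dataset | Interview |
| 46 | | | | Data collection method | 100 | Dataset | Interview |
| 47 | | | | Data update strategy | 100 | Dataset | Interview |
| 48 | | | Data uniqueness | Unique identifier check | 97.70 | Dataset | Program |
| 49 | | | Data duplication rate | Duplicate record check | 98.26 | Field | Program |
| 50 | | | Dirty data occurrence rate | Erroneous value check | 89.72 | Field | Program |
| 51 | Timeliness | 87.70 | Correctness based on time period | Timeliness correctness check | 84.94 | Dataset | Program |
| 52 | | | Timeliness based on time point | Data update frequency | 100 | Dataset | Interview |
| 53 | | | | Data collection timestamp check | 69.45 | Dataset | Program |
| 54 | | | | Data delay check | 93.85 | Dataset | Program |
| 55 | | | | Data update discontinuation check | 90.20 | Dataset | Program |
| 56 | Accessibility | 100 | Accessible | Data authorization mechanism | 100 | Dataset | Interview |
| 57 | | | | Data sharing capability | 100 | Dataset | Interview |
| 58 | | | Availability | Interface availability | 100 | Dataset | Interview |
| 59 | | | | Data documentation completeness | 100 | Dataset | Interview |
| 60 | | | | Data access performance | 100 | Dataset | Interview |
| 61 | | | | Data understandability | 100 | Dataset | Interview |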
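
One way to read Table 1 diagnostically is to rank detection items by score and pick out the weakest ones. The sketch below does this for a subset of rows; the scores are copied from Table 1 and the item names are English renderings of those rows, while the cut-off of 85 and the variable names are assumptions chosen purely for illustration, not a threshold used in the paper.

```python
# Subset of (detection item, score) pairs copied from Table 1.
item_scores = [
    ("Postal code", 84.83),
    ("ID document type", 77.54),
    ("Region code", 82.22),
    ("Industry code", 97.58),
    ("Address integrity check", 80.34),
    ("Data integrity verification", 0.00),
    ("Association check", 83.15),
    ("Primary key consistency check", 72.72),
    ("Address consistency check", 79.31),
    ("Erroneous value check", 89.72),
    ("Timeliness correctness check", 84.94),
    ("Data collection timestamp check", 69.45),
    ("Data delay check", 93.85),
]

THRESHOLD = 85.0  # assumed cut-off, not taken from the paper

# Keep items below the cut-off, lowest score first.
weakest = sorted(
    (pair for pair in item_scores if pair[1] < THRESHOLD),
    key=lambda pair: pair[1],
)

for name, score in weakest:
    print(f"{score:6.2f}  {name}")
```

With this cut-off, the flagged items cluster around address handling, primary keys, reference codes, timestamps, and integrity verification, which lines up with the problem types listed in the abstract.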