Research Progress of Data Traceability from the Perspective of Data Element Circulation
Wang Xiaoqing1,2,3, Sun Zhanwei1, Wu Junhong4, Du Ziran5, Qian Chengjiang6
1 School of Public Administration, Nanjing University of Finance & Economics, Nanjing 210003, China
2 College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
3 Hongshan College, Nanjing University of Finance & Economics, Nanjing 210003, China
4 Department of Platform Research and Development, Business School, Nanjing Normal University, Nanjing 210023, China
5 Department of Platform Research and Development, Greater Bay Area Big Data Research Institute, Shenzhen 518048, China
6 Nanjing NJtech Safety Co., Ltd, Nanjing 210047, China
[Objective] This paper reviews the literature on data traceability, analyzing its research progress and application scenarios, to provide a reference for the construction of data trading platforms, industrial data governance, and digital government governance. [Methods] We summarize and analyze data traceability models, methods, and applications, and on that basis discuss the current state of research and its shortcomings. [Results] Data traceability research has produced substantial results in content description, model construction, and scenario application, including improvements in the quality, security, and efficiency of data traceability. [Limitations] Research on data traceability from the perspective of data element circulation started relatively late: the results are not yet rich, no coherent research system has formed, and the focus is biased toward empirical research. [Conclusions] In combination with the data element market, we recommend the following: normalize data delivery and use; accelerate work on data traceability standards to institutionalize data use; continuously improve the quality of data traceability information and, with it, the quality of data services; attach great importance to the security of data traceability information to standardize the use of data information; and build a high-standard data traceability platform to promote the healthy development of the data element market.
Wang Xiaoqing, Sun Zhanwei, Wu Junhong, Du Ziran, Qian Chengjiang. Research Progress of Data Traceability from the Perspective of Data Element Circulation[J]. Data Analysis and Knowledge Discovery, 2022, 6(1): 43-54.