Multi-Truth Discovery Method Based on Attribute Fusion
Yang Haolin1, Dong Yongquan1,2, Chen Huafeng1, Zhang Guoxi1
1 School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221008, China
2 Xuzhou Engineering Research Center of Cloud Computing, Xuzhou 221100, China

Abstract [Objective] This paper incorporates the influence of auxiliary attributes into existing multi-truth discovery models, aiming to improve their F1 scores. [Methods] First, we used the auxiliary attributes to calculate the expertise and consensus degree of each source. Second, we combined these with the activity degree of the multi-truth attribute values to obtain the degree of support each source gives to the conflicting data. Third, we applied existing truth discovery methods to obtain pseudo-labels for the truths. Finally, we used a neural network to capture the complex relationship between the sources and the conflicting data, and identified all truths. [Results] Compared with the second-best model, our method improved the F1 score by 2.25% on the book dataset and by 5.42% on the movie dataset. [Limitations] The proposed method considers only auxiliary attributes that reflect object features; more research is needed to explore the impact of other auxiliary attributes on multi-truth discovery. [Conclusions] The proposed method can effectively discover multiple truths.
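
To make the pseudo-label step and the F1 evaluation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes claims are given as (source, object, value) triples, uses plain threshold voting as a stand-in for the "existing truth discovery methods", and assumes F1 is micro-averaged over (object, value) pairs. The function names `vote_pseudo_labels` and `multi_truth_f1` and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict


def vote_pseudo_labels(claims, threshold=0.5):
    """Assumed stand-in for an existing truth discovery method.

    claims: iterable of (source, object, value) triples.
    Keeps a value as a pseudo-truth for an object when at least
    `threshold` of the sources covering that object claim it.
    """
    sources_per_object = defaultdict(set)
    sources_per_value = defaultdict(set)
    for source, obj, value in claims:
        sources_per_object[obj].add(source)
        sources_per_value[(obj, value)].add(source)

    labels = defaultdict(set)
    for (obj, value), supporters in sources_per_value.items():
        if len(supporters) / len(sources_per_object[obj]) >= threshold:
            labels[obj].add(value)
    return dict(labels)


def multi_truth_f1(predicted, gold):
    """Micro-averaged precision, recall and F1 over (object, value) pairs."""
    pred_pairs = {(o, v) for o, values in predicted.items() for v in values}
    gold_pairs = {(o, v) for o, values in gold.items() for v in values}
    true_positives = len(pred_pairs & gold_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): three sources list authors of one book.
claims = [
    ("s1", "book1", "A"), ("s1", "book1", "B"),
    ("s2", "book1", "A"),
    ("s3", "book1", "A"), ("s3", "book1", "C"),
]
pseudo = vote_pseudo_labels(claims)
print(pseudo, multi_truth_f1(pseudo, {"book1": {"A", "B"}}))
```

In this toy case voting keeps only the value every source agrees on, so precision is perfect while recall misses the second true author. That gap is the kind of error a learned model over source features is meant to reduce.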

Received: 10 April 2022
Published: 13 January 2023
Fund: National Natural Science Foundation of China (61872168); Postgraduate Research Innovation Project of Jiangsu Normal University (2021XKT1381)
Corresponding Author: Dong Yongquan, E-mail: tomdyq@163.com