Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (11): 52-60    DOI: 10.11925/infotech.2096-3467.2022.0286
Multi-Truth Discovery Method Based on Attribute Fusion
Yang Haolin¹, Dong Yongquan¹,², Chen Huafeng¹, Zhang Guoxi¹
¹ School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221008, China
² Xuzhou Engineering Research Center of Cloud Computing, Xuzhou 221100, China
Abstract  

[Objective] This paper incorporates the influence of auxiliary attributes into existing multi-truth discovery models, aiming to improve their F1 scores. [Methods] First, we use the auxiliary attributes to calculate each source's expertise and consensus degree. Second, we combine these with the activity degree of multi-truth attribute values to obtain the degree of support that each source gives to the conflicting data. Third, we apply existing truth discovery methods to obtain pseudo-labels for the truth. Finally, we use a neural network to capture the complex relationship between the sources and the conflicting data and to identify all true values. [Results] Compared with the second-best model, our method improves the F1 score by 2.25% on the book dataset and by 5.42% on the movie dataset. [Limitations] The proposed method incorporates auxiliary attributes that reflect object features; more research is needed to explore the impact of other auxiliary attributes on multi-truth discovery. [Conclusions] The proposed method can effectively discover multiple truths.

Keywords: Multi-Truth Discovery; Data Conflicts; Information Quality; Multi-Truth Attribute; Auxiliary Attribute
Received: 10 April 2022      Published: 13 January 2023
CLC Number (ZTFLH): TP311
Funding: National Natural Science Foundation of China (61872168); Postgraduate Research Innovation Project of Jiangsu Normal University (2021XKT1381)
Corresponding Author: Dong Yongquan     E-mail: tomdyq@163.com

Cite this article:

Yang Haolin, Dong Yongquan, Chen Huafeng, Zhang Guoxi. Multi-Truth Discovery Method Based on Attribute Fusion. Data Analysis and Knowledge Discovery, 2022, 6(11): 52-60.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0286     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I11/52

Data source | Cast list | Runtime (min) | Genre
IMDB | Daniel Radcliffe; Emma Watson; Rupert Grint | 152 | Fantasy, Adventure
FilmCrave | Daniel Radcliffe | 158 | Fantasy
Good Films | Johnny Depp; Emma Watson; Daniel Radcliffe | 155 | Fantasy, Adventure
Movie Insider | J. K. Rowling | 142 | Fantasy
Information about the Movie Harry Potter Provided by Four Websites
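To make the conflict in the table above concrete, the following minimal sketch applies plain majority voting to the four cast lists. Majority voting is one of the baselines in the comparison, not the AFMTD method itself. The function name `majority_vote` and the 0.5 threshold are assumptions chosen for illustration, and the actor names are written in their standard spellings.

```python
from collections import Counter

# Cast lists claimed by each source (values from the table above,
# with actor names in their standard spellings).
claims = {
    "IMDB": {"Daniel Radcliffe", "Emma Watson", "Rupert Grint"},
    "FilmCrave": {"Daniel Radcliffe"},
    "Good Films": {"Johnny Depp", "Emma Watson", "Daniel Radcliffe"},
    "Movie Insider": {"J. K. Rowling"},
}


def majority_vote(claims, threshold=0.5):
    """Return every value claimed by at least `threshold` of the sources.

    `threshold` is a hypothetical parameter for this sketch only.
    """
    votes = Counter(value for values in claims.values() for value in values)
    n_sources = len(claims)
    return {value for value, count in votes.items()
            if count / n_sources >= threshold}


truths = majority_vote(claims)
print(sorted(truths))
```

Under these assumptions the vote keeps Daniel Radcliffe and Emma Watson but drops Rupert Grint, who is claimed by only one source. This shows how simple voting can lose true values when an attribute has several correct values at once.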
Movie website | Comedy | Fantasy | Documentary | Science fiction
IMDB | 18,279 | 16,013 | 31,750 | 29,056
FilmCrave | 14,000 | 1,501 | 5,523 | 6,781
Good Films | 20,708 | 5,136 | 11,551 | 17,408
Movie Insider | 51,253 | 8,082 | 33,044 | 30,003
Total | 104,240 | 22,650 | 81,868 | 83,248
Number of Movies of Each Genre Offered by Each Movie Website
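The per-genre counts above suggest that sources differ in how much they cover each genre. The sketch below shows one simple way to turn such counts into a per-source genre profile: the share of a source's listed movies that fall in each genre. This share is an illustrative assumption, not the expertise formula used by AFMTD, and `genre_share` is a hypothetical helper name.

```python
# Movie counts per genre for each website, copied from the table above.
counts = {
    "IMDB": {"Comedy": 18279, "Fantasy": 16013,
             "Documentary": 31750, "Science fiction": 29056},
    "FilmCrave": {"Comedy": 14000, "Fantasy": 1501,
                  "Documentary": 5523, "Science fiction": 6781},
    "Good Films": {"Comedy": 20708, "Fantasy": 5136,
                   "Documentary": 11551, "Science fiction": 17408},
    "Movie Insider": {"Comedy": 51253, "Fantasy": 8082,
                      "Documentary": 33044, "Science fiction": 30003},
}


def genre_share(source_counts):
    """Fraction of one source's listed movies that fall in each genre.

    Used here only as an assumed stand-in for a per-genre expertise signal.
    """
    total = sum(source_counts.values())
    return {genre: n / total for genre, n in source_counts.items()}


shares = {source: genre_share(c) for source, c in counts.items()}
for source, share in shares.items():
    top = max(share, key=share.get)
    print(f"{source}: largest share = {top} ({share[top]:.1%})")
```

A profile like this could then weight a source's claims differently depending on the genre of the movie in question; how AFMTD actually does so is not specified here.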
The Flow Chart of AFMTD
AFMTD Model Structure
Method | Book dataset: Recall | Book dataset: Precision | Book dataset: F1 | Movie dataset: Recall | Movie dataset: Precision | Movie dataset: F1
Majority Voting | 0.7121 | 0.8700 | 0.7831 | 0.5776 | 0.8348 | 0.6815
TruthFinder | 0.8183 | 0.8133 | 0.8158 | 0.7705 | 0.9239 | 0.8403
LTM | 0.9218 | 0.7700 | 0.8391 | 0.7800 | 0.8559 | 0.8094
DART | 0.9731 | 0.5755 | 0.7232 | 0.9262 | 0.7838 | 0.8487
AFMTD | 0.8896 | 0.8286 | 0.8580 | 0.9128 | 0.8774 | 0.8947
Algorithm Performance
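The gains quoted in the abstract can be checked against the F1 values in the table above. The sketch below reads them as relative improvements over the best baseline on each dataset; that reading is an assumption, and `relative_gain` is a hypothetical helper name.

```python
# Reported F1 values from the table above.
reported_f1 = {
    "Majority Voting": {"book": 0.7831, "movie": 0.6815},
    "TruthFinder": {"book": 0.8158, "movie": 0.8403},
    "LTM": {"book": 0.8391, "movie": 0.8094},
    "DART": {"book": 0.7232, "movie": 0.8487},
    "AFMTD": {"book": 0.8580, "movie": 0.8947},
}


def relative_gain(f1_by_method, dataset, ours="AFMTD"):
    """Return (best baseline, relative F1 gain of `ours` over it)."""
    baselines = {m: s[dataset] for m, s in f1_by_method.items() if m != ours}
    runner_up = max(baselines, key=baselines.get)
    gain = (f1_by_method[ours][dataset] - baselines[runner_up]) / baselines[runner_up]
    return runner_up, gain


for dataset in ("book", "movie"):
    runner_up, gain = relative_gain(reported_f1, dataset)
    print(f"{dataset}: +{gain:.2%} over {runner_up}")
```

Under this reading the best baselines are LTM on the book dataset and DART on the movie dataset, and the relative gains come out at 2.25% and 5.42%, matching the figures stated in the abstract.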
Effect of Threshold Change
Ablation Experiments