Multi-Truth Discovery Method Based on Attribute Fusion<sup>*</sup>

doi:10.11925/infotech.2096-3467.2022.0286

Data Analysis and Knowledge Discovery

2022, Vol. 6, Issue 11: 52-60

Research Article


Yang Haolin (杨昊霖)¹, Dong Yongquan (董永权)¹,² (corresponding author), Chen Huafeng (陈华凤)¹, Zhang Guoxi (张国玺)¹

¹School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221008, China
²Xuzhou Engineering Research Center of Cloud Computing, Xuzhou 221100, China


Abstract

[Objective] Most existing multi-truth discovery methods focus only on the multi-truth attribute itself and overlook the influence of auxiliary attributes. This paper addresses that gap in order to improve multi-truth discovery. [Methods] Auxiliary attributes are used to compute each source's expertise and consensus degree. These are combined with the activity degree of the multi-truth attribute values to obtain the source's degree of support for the conflicting data. Existing truth discovery methods are then called to obtain pseudo-labels for the truth, and a neural network captures the complex relationship between the sources and the conflicting data to infer all true values. [Results] Experiments show that, compared with the second-best method, the proposed method improves the F1 score by 2.25% on the book dataset and by 5.42% on the movie dataset. [Limitations] The proposed method fuses auxiliary attributes that reflect object features. The impact of other auxiliary attributes on multi-truth discovery has not yet been explored. [Conclusions] Fusing the multi-truth attribute with auxiliary attributes improves the accuracy of multi-truth discovery.
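The abstract names the signals (source expertise, consensus degree, value activity, support, pseudo-labels) but gives no formulas, so the following is only a minimal sketch of how such a feature pipeline might look. Every definition in it is an assumption made for illustration: the toy `CLAIMS` data, the three scoring functions, the plain average used for the support score, and majority voting standing in for "an existing truth discovery method". The neural network stage is left out.

```python
from collections import defaultdict

# Toy claims: (source, object, genre reported by the source, claimed values).
# All data and formulas below are illustrative assumptions, not the paper's.
CLAIMS = [
    ("s1", "m1", "fantasy", {"a", "b"}),
    ("s2", "m1", "fantasy", {"a", "b", "c"}),
    ("s3", "m1", "drama", {"a"}),
    ("s1", "m2", "fantasy", {"a", "d"}),
    ("s2", "m2", "drama", {"d", "e"}),
    ("s3", "m2", "drama", {"e"}),
]


def expertise(source, genre):
    """Assumed: share of the source's objects that it files under `genre`."""
    genres = [g for s, _, g, _ in CLAIMS if s == source]
    return genres.count(genre) / len(genres)


def consensus(obj, genre):
    """Assumed: share of sources covering `obj` that report the same genre."""
    genres = [g for _, o, g, _ in CLAIMS if o == obj]
    return genres.count(genre) / len(genres)


def activity(value):
    """Assumed: number of objects mentioning `value`, scaled by the maximum."""
    objects_per_value = defaultdict(set)
    for _, o, _, values in CLAIMS:
        for v in values:
            objects_per_value[v].add(o)
    most = max(len(objs) for objs in objects_per_value.values())
    return len(objects_per_value[value]) / most


def support_features():
    """Return one support score per (source, object, value) claim."""
    rows = {}
    for s, o, g, values in CLAIMS:
        for v in values:
            feats = (expertise(s, g), consensus(o, g), activity(v))
            rows[(s, o, v)] = sum(feats) / len(feats)  # plain average (assumed)
    return rows


def pseudo_labels():
    """Majority voting as a stand-in for an existing truth discovery method."""
    votes, coverage = defaultdict(int), defaultdict(int)
    for _, o, _, values in CLAIMS:
        coverage[o] += 1
        for v in values:
            votes[(o, v)] += 1
    # A value is pseudo-labelled true if more than half of the sources
    # covering the object claim it.
    return {key: votes[key] * 2 > coverage[key[0]] for key in votes}


if __name__ == "__main__":
    for key, score in sorted(support_features().items()):
        print(key, round(score, 3))
    print(pseudo_labels())
```

In a setup like this, the support scores would serve as input features and the pseudo-labels as training targets for the classifier, which is how the abstract describes the neural network being used. The actual feature definitions and network architecture should be taken from the full paper.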

Keywords: Multi-Truth Discovery, Data Conflicts, Information Quality, Multi-Truth Attribute, Auxiliary Attribute

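Received: 2022-04-10 Published: 2023-01-13

CLC number (ZTFLH): TP311

Funding: * National Natural Science Foundation of China (61872168); Jiangsu Normal University Graduate Research and Innovation Project (2021XKT1381)

Corresponding author: Dong Yongquan, E-mail: tomdyq@163.com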

Cite this article:

Yang Haolin, Dong Yongquan, Chen Huafeng, Zhang Guoxi. Multi-Truth Discovery Method Based on Attribute Fusion[J]. Data Analysis and Knowledge Discovery, 2022, 6(11): 52-60.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0286 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I11/52

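Table 1 Information on the film "Harry Potter" provided by four websites

Table 2 Number of films of each genre provided by the movie websites

Fig.1 Workflow of the multi-truth discovery method based on attribute fusion

Fig.2 Structure of the multi-truth discovery model based on attribute fusion

Table 3 Comparison of algorithm performance

Fig.3 Effect of varying the threshold

Fig.4 Ablation study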

