1. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China
2. School of Information Management, Wuhan University, Wuhan 430072, China
[Objective] This paper aims to resolve author name ambiguity in literature databases and thereby correct the erroneous aggregation of documents under a shared name.
[Methods] First, we built three types of networks from structured document data: an author network, a document network, and an author-document network. Then we combined different network embedding methods to obtain representations of the document nodes. Finally, we clustered the documents with an unsupervised learning model and hierarchical agglomerative clustering.
[Results] We conducted empirical studies on datasets from ArnetMiner, CiteSeerX and DBLP. The method performed well on sparse networks, and the Macro-F1 value increased by 6%.
[Limitations] We only explored author name disambiguation for English names.
[Conclusions] The proposed method effectively reduces author name ambiguity. It is of value to scientific collaboration analysis, citation recommendation, and research on knowledge networks.
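The network construction step named in [Methods] can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_networks`, the choice to drop the ambiguous name itself from each author list, and the rule that two documents are linked when they share at least one co-author are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_networks(docs, target):
    """Build the three networks for one ambiguous name.

    docs   -- dict mapping doc_id to a list of author names
    target -- the ambiguous name, excluded from all networks (assumption)
    """
    author_author = defaultdict(int)  # co-authorship edge -> number of joint papers
    author_doc = set()                # bipartite edges (author, doc_id)
    doc_doc = {}                      # document edge -> number of shared co-authors

    # Co-authors of each document, without the ambiguous name itself.
    coauthors = {d: sorted(set(names) - {target}) for d, names in docs.items()}

    for d, names in coauthors.items():
        for a in names:
            author_doc.add((a, d))
        for a, b in combinations(names, 2):
            author_author[(a, b)] += 1

    # Hypothetical linking rule: connect two documents that share a co-author.
    for d1, d2 in combinations(sorted(coauthors), 2):
        shared = len(set(coauthors[d1]) & set(coauthors[d2]))
        if shared:
            doc_doc[(d1, d2)] = shared

    return dict(author_author), author_doc, doc_doc
```

Embedding methods such as DeepWalk or LINE would then be trained on these edge lists, and the resulting document vectors passed to the clustering step.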
Trust Mechanism in Distributed Access Control Model of P2P Networks
<authors>
Lei Wang,Yanqin Zhu,Lanfang Jin,Xizhao Luo
<label>
0
<id>
4944
<jconf>
ACIS-ICIS
<year>
2008
<organization>
null
Table 1 Sample of the experimental data
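A record in the layout of Table 1 can be read into a dictionary with a few lines of code. The sketch below assumes the layout exactly as it appears in the sample (an untagged title line, then alternating `<tag>` and value lines, with `null` marking a missing value); the raw dataset files may differ, and `parse_record` is a hypothetical helper name.

```python
def parse_record(lines):
    """Parse one record: a title line followed by <tag> / value line pairs."""
    record = {"title": lines[0].strip()}
    key = None
    for line in lines[1:]:
        line = line.strip()
        if line.startswith("<") and line.endswith(">"):
            key = line[1:-1]  # remember the tag until its value line arrives
        elif key is not None:
            record[key] = None if line == "null" else line
            key = None
    # Split the comma-separated author list into individual names.
    if record.get("authors"):
        record["authors"] = [a.strip() for a in record["authors"].split(",")]
    return record


sample = [
    "Trust Mechanism in Distributed Access Control Model of P2P Networks",
    "<authors>", "Lei Wang,Yanqin Zhu,Lanfang Jin,Xizhao Luo",
    "<label>", "0",
    "<id>", "4944",
    "<jconf>", "ACIS-ICIS",
    "<year>", "2008",
    "<organization>", "null",
]
record = parse_record(sample)
```

Here the `label` field holds the ground-truth author identity used for evaluation, and `organization` is parsed as `None` because the sample marks it `null`.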
| Author name (Macro-F1, ArnetMiner) | AuthorList | AuthorList-NMF | NDNE | ADNE | Proposed method |
|---|---|---|---|---|---|
| Lei Wang | 23.09 | 20.04 | 76.97 | 28.39 | 78.64 |
| Jing Zhang | 24.58 | 25.37 | 73.48 | 49.56 | 77.04 |
| Yu Zhang | 27.98 | 17.51 | 60.28 | 19.24 | 55.86 |
| Bin Li | 25.82 | 19.86 | 80.34 | 42.11 | 78.30 |
| Yang Wang | 19.01 | 18.70 | 53.06 | 21.07 | 54.42 |
| Hao Wang | 17.23 | 9.15 | 54.81 | 30.67 | 50.49 |
| Wei Xu | 24.57 | 18.81 | 66.46 | 25.05 | 72.58 |
| Bo Liu | 19.24 | 25.66 | 86.71 | 19.05 | 79.65 |
| Gang Chen | 25.79 | 9.77 | 63.07 | 28.09 | 67.99 |
| Lei Chen | 21.13 | 11.77 | 60.37 | 28.96 | 60.67 |
Table 2 Author name disambiguation results on the ArnetMiner dataset
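The result tables report one Macro-F1 value per ambiguous name. This section does not spell out how that value is computed, so the sketch below uses one common clustering convention as an assumption: each true author is matched to the predicted cluster that gives the best F1, and the scores are averaged over true authors. The function name `macro_f1` is illustrative, and the value is returned on a 0 to 1 scale rather than as a percentage.

```python
from collections import defaultdict


def macro_f1(true_labels, pred_labels):
    """Average, over true authors, of the best F1 against any predicted cluster."""
    true_groups, pred_groups = defaultdict(set), defaultdict(set)
    for i, (t, p) in enumerate(zip(true_labels, pred_labels)):
        true_groups[t].add(i)
        pred_groups[p].add(i)

    scores = []
    for members in true_groups.values():
        best = 0.0
        for cluster in pred_groups.values():
            overlap = len(members & cluster)
            if overlap:
                precision = overlap / len(cluster)
                recall = overlap / len(members)
                best = max(best, 2 * precision * recall / (precision + recall))
        scores.append(best)
    return sum(scores) / len(scores)
```

A clustering that reproduces the true author groups scores 1.0 under this definition, while merging every document into a single cluster is penalized through low precision.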
| Author name (Macro-F1, CiteSeerX) | AuthorList | AuthorList-NMF | NDNE | ADNE | Proposed method |
|---|---|---|---|---|---|
| J Lee | 6.41 | 6.25 | 42.58 | 6.62 | 21.12 |
| S Lee | 4.94 | 4.93 | 39.79 | 6.02 | 33.45 |
| Y Chen | 9.45 | 7.20 | 47.52 | 10.07 | 26.98 |
| C Chen | 11.20 | 4.92 | 35.63 | 7.89 | 18.03 |
| J Smith | 9.75 | 8.51 | 35.81 | 9.02 | 24.47 |
| A Gupta | 3.93 | 5.20 | 41.14 | 5.73 | 23.63 |
| J Martin | 17.17 | 13.83 | 53.98 | 22.58 | 41.05 |
| D Johnson | 12.91 | 15.23 | 28.55 | 17.85 | 24.07 |
| A Kumar | 25.67 | 21.78 | 35.74 | 17.96 | 14.33 |
| M Brown | 17.80 | 19.48 | 46.11 | 29.04 | 24.68 |
Table 3 Author name disambiguation results on the CiteSeerX dataset
| Author name (Macro-F1, DBLP) | AuthorList | AuthorList-NMF | NDNE | ADNE | Proposed method |
|---|---|---|---|---|---|
| Wei Wang | 12.94 | 2.37 | 70.30 | 12.73 | 29.56 |
| Yi Zhang | 24.91 | 10.89 | 34.68 | 39.31 | 31.98 |
| Jian Zhang | 30.43 | 13.46 | 33.83 | 33.13 | 23.52 |
| Jing Wang | 16.67 | 11.92 | 77.00 | 58.33 | 67.71 |
| Lei Zhang | 5.94 | 9.68 | 50.54 | 8.96 | 19.28 |
| Wei Li | 18.94 | 4.67 | 42.52 | 31.45 | 32.03 |
| Yang Wang | 16.07 | 12.52 | 39.98 | 30.68 | 47.67 |
| Minsoo Kim | 17.73 | 21.08 | 43.24 | 33.85 | 52.34 |
| Rui Wang | 32.16 | 11.04 | 50.55 | 25.38 | 55.83 |
| Jun Sun | 17.42 | 16.97 | 58.04 | 24.57 | 40.63 |
Table 4 Author name disambiguation results on the DBLP dataset
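| Network feature | ArnetMiner | CiteSeerX | DBLP |
|---|---|---|---|
| Average number of documents | 197.9 | 733.0 | 141.7 |
| Average number of true authors | 61.4 | 43.2 | 13.7 |
| Average number of nodes (author network) | 323.5 | 681.0 | 160.0 |
| Average number of edges (author network) | 600.5 | 1763.4 | 426.9 |
| Average node degree (author network) | 3.7 | 4.7 | 5.0 |
| Average number of edges (document network) | 783.7 | 36541.7 | 2338.4 |
| Average node degree (document network) | 8.1 | 83.1 | 19.4 |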
Table 5 Network statistics of the three datasets
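For an undirected graph, the average node degree follows from the node and edge counts as 2E/N. The sketch below applies that formula to the author-network averages in Table 5, assuming the networks are undirected; `avg_degree` is an illustrative helper name. Because the table presumably averages each statistic over the ambiguous names in a dataset, the ratio of two averaged columns need not equal the averaged degree, so a close match is expected only for some rows.

```python
def avg_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: each edge contributes to two nodes."""
    return 2 * num_edges / num_nodes


# Average node and edge counts of the author network, taken from Table 5.
author_network = {
    "ArnetMiner": (323.5, 600.5),
    "CiteSeerX": (681.0, 1763.4),
    "DBLP": (160.0, 426.9),
}

for name, (nodes, edges) in author_network.items():
    print(f"{name}: 2E/N = {avg_degree(nodes, edges):.2f}")
```

For ArnetMiner the result rounds to the reported 3.7. The other two datasets come out somewhat above the reported values, which is consistent with the averages having been taken per name before aggregation.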
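Fig.3 Effect of the number of training iterations on model performance
Fig.4 Effect of the embedding dimension on model performance
Fig.5 Effect of the LINE similarity choice on model performance
Fig.6 Effect of swapping the training objects of DeepWalk and LINE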