Data Analysis and Knowledge Discovery
Research on Chinese Author Name Disambiguation with Fusion of Multiple Features
Lin Kerou, Wang Hao, Gong Lijuan
(School of Information Management, Nanjing University, Nanjing 210023, China)
(Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China)
[Objective] This paper aims to solve the problem of Chinese author name ambiguity in document resource management systems.

[Methods] We build author entities, each identified by “author name + institution name”, from document data, and use the attributes of these entities to construct six similarity features grouped into three aspects. We then fuse the features by principal component analysis, by direct weight assignment, and by a combination of the two, in order to study the disambiguation performance of each fusion method and each feature.
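
The two fusion styles named in the Methods can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the feature names `feature_1` to `feature_6`, the aspect names, the averaging of features within an aspect, and the use of normalized first-component loadings as PCA-derived weights. The abstract does not give these details, so this is not the authors' implementation.

```python
from math import sqrt

# Hypothetical grouping: the abstract states six features in three aspects
# but does not name them, so these identifiers are placeholders.
ASPECTS = {
    "aspect_a": ["feature_1", "feature_2"],
    "aspect_b": ["feature_3", "feature_4"],
    "aspect_c": ["feature_5", "feature_6"],
}


def fuse_by_aspect(features, aspect_weights):
    """Direct weight assignment with one aspect as the unit.

    Assumption: features inside an aspect are averaged, then the aspect
    scores are combined with hand-assigned weights.
    """
    total = 0.0
    for aspect, names in ASPECTS.items():
        aspect_score = sum(features[name] for name in names) / len(names)
        total += aspect_weights[aspect] * aspect_score
    return total


def first_component(rows, iterations=200):
    """Approximate the first principal component by power iteration
    on the sample covariance matrix of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance at all: keep the uniform start vector
            break
        v = [x / norm for x in w]
    return v


def pca_weights(rows):
    """Assumption: turn first-component loadings into feature weights by
    taking absolute values and normalizing them to sum to 1."""
    magnitudes = [abs(x) for x in first_component(rows)]
    total = sum(magnitudes)
    return [m / total for m in magnitudes]
```

In this sketch, `fuse_by_aspect` needs only three hand-set weights instead of six, while `pca_weights` derives per-feature weights from the data. Either reduces the number of weights that must be tuned by hand.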

[Results] Two fusion methods reduce time costs effectively: the combination of principal component analysis and direct weight assignment with a single feature as the unit, and direct weight assignment with one aspect of features as the unit. Their F1 values can reach 70.74% and 70.42%, respectively, on the LIS test dataset, and 81.90% and 80.93%, respectively, on the economy test dataset.
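
The abstract does not state how the F1 value is computed. One common choice for author name disambiguation is pairwise F1 over records that share an author name, and the sketch below assumes that definition. The function names and the label encoding are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of unordered record-index pairs that share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(true_labels, predicted_labels):
    """Pairwise F1: a pair counts as correct when both the gold standard
    and the prediction place the two records under the same author."""
    true_pairs = same_cluster_pairs(true_labels)
    pred_pairs = same_cluster_pairs(predicted_labels)
    correct = len(true_pairs & pred_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with gold labels `["a", "a", "b", "b"]` and predicted labels `["x", "x", "x", "y"]`, one of three predicted pairs and one of two gold pairs is correct, giving precision 1/3 and recall 1/2.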

[Limitations] The attributes used in this research are limited: they come only from the metadata of the papers, and no external information or full-text content is mined.

[Conclusions] The proposed fusion method can solve the problem of weight setting effectively when fusing multiple features.

Key words: feature fusion; author name disambiguation; PCA; Chinese papers
Published: 10 October 2020
Lin Kerou, Wang Hao, Gong Lijuan. Research on Chinese Author Name Disambiguation with Fusion of Multiple Features. Data Analysis and Knowledge Discovery, 0, (): 1-.

