Data Analysis and Knowledge Discovery
Research on Chinese Author Name Disambiguation with Fusion of Multiple Features
Lin Kerou, Wang Hao, Gong Lijuan
(School of Information Management, Nanjing University, Nanjing 210023, China)
(Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China)
Abstract  

[Objective] This paper aims to solve the problem of Chinese author name ambiguity in document resource management systems.

[Methods] We build author entities identified by “author name + institution name” from document data and use their attributes to construct six similarity features grouped into three aspects. We then fuse these features with principal component analysis (PCA), with direct weight assignment, and with a combination of the two, in order to study the disambiguation effect of each fusion method and of each feature.
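The PCA-based and direct-weight fusion steps can be illustrated with a minimal Python sketch. The abstract does not say how PCA results become feature weights, so this sketch assumes the weights are the absolute loadings of the first principal component, normalised to sum to 1. The aspect-level direct assignment (a manual weight per aspect, split evenly among that aspect's features) and all function names are likewise hypothetical. Each row holds the six similarity features of one candidate pair of author entities.

```python
"""Hypothetical sketch of similarity-feature fusion (not the authors' code).

Assumptions: PCA weights = |loadings| of the first principal component,
normalised to sum to 1; direct weights are assigned per aspect and split
evenly among the features in that aspect.
"""
import math
from typing import List, Sequence


def covariance_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (n - 1 denominator) of the feature columns."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        centered = [r[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    return [[v / (n - 1) for v in row] for row in cov]


def first_principal_component(cov: List[List[float]], iters: int = 500) -> List[float]:
    """Leading eigenvector of a symmetric matrix, found by power iteration."""
    d = len(cov)
    vec = [1.0] * d
    for _ in range(iters):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # degenerate case: keep the current (uniform) vector
            break
        vec = [x / norm for x in nxt]
    return vec


def pca_weights(rows: Sequence[Sequence[float]]) -> List[float]:
    """Assumed rule: weights are |PC1 loadings| normalised to sum to 1."""
    loadings = [abs(x) for x in first_principal_component(covariance_matrix(rows))]
    total = sum(loadings)
    return [x / total for x in loadings]


def direct_aspect_weights(aspect_sizes: Sequence[int],
                          aspect_weights: Sequence[float]) -> List[float]:
    """Hypothetical direct assignment: each aspect's weight is split evenly
    among the features it contains."""
    out: List[float] = []
    for size, w in zip(aspect_sizes, aspect_weights):
        out.extend([w / size] * size)
    return out


def fuse(similarities: Sequence[float], weights: Sequence[float]) -> float:
    """Fused score of one entity pair: weighted sum of its similarity features."""
    return sum(s * w for s, w in zip(similarities, weights))
```

In use, `pca_weights(rows)` or `direct_aspect_weights([2, 2, 2], [0.5, 0.3, 0.2])` supplies a weight vector, and `fuse(pair_features, weights)` yields one score per entity pair that could then be thresholded to decide whether two entities refer to the same author; the three-aspects-of-two-features split and the threshold are illustrative assumptions, not values from the abstract.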

[Results] Two fusion methods effectively reduce time costs: (1) PCA combined with direct weight assignment, using a single feature as the unit, and (2) direct weight assignment using the features of one aspect as the unit. Their F1 values reach 70.74% and 70.42%, respectively, on the LIS test dataset, and 81.90% and 80.93%, respectively, on the economics test dataset.
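The abstract reports F1 values but does not say how precision and recall are counted. A common convention in name disambiguation, assumed here, is pairwise F1: every pair of records placed in the same predicted cluster counts as a same-author link and is compared against the gold-standard pairs. The sketch below is illustrative only, and its function names are hypothetical.

```python
"""Hypothetical pairwise precision/recall/F1 for author name disambiguation."""
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set, Tuple


def same_author_pairs(labels: Dict[Hashable, Hashable]) -> Set[FrozenSet]:
    """All unordered record pairs that share a cluster label."""
    return {
        frozenset((a, b))
        for (a, la), (b, lb) in combinations(labels.items(), 2)
        if la == lb
    }


def pairwise_f1(predicted: Dict[Hashable, Hashable],
                gold: Dict[Hashable, Hashable]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over same-author record pairs."""
    pred = same_author_pairs(predicted)
    true = same_author_pairs(gold)
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, with gold clusters {p1, p2} and {p3, p4} and a prediction that merges p1, p2 and p3 into one cluster, one of three predicted pairs is correct and one of two gold pairs is found, so precision is 1/3, recall is 1/2, and F1 is 0.4.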

[Limitations] The attributes used in this research are limited: they come only from paper metadata, and neither external information nor full-text content is mined.

[Conclusions] The proposed fusion methods effectively solve the problem of setting weights when multiple features are fused.


Keywords: feature fusion; author name disambiguation; PCA; Chinese papers
Published: 10 October 2020
CLC number: TP393

Cite this article:

Lin Kerou, Wang Hao, Gong Lijuan. Research on Chinese Author Name Disambiguation with Fusion of Multiple Features. Data Analysis and Knowledge Discovery, 0, (): 1-.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0532     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y0/V/I/1

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn