New Technology of Library and Information Service  2010, Vol. 26 Issue (11): 64-68    DOI: 10.11925/infotech.1003-3513.2010.11.10
Chinese People Name Disambiguation by Hierarchical Clustering
Zhang Shunrui, You Hongliang
China Defense Science & Technology Information Center, Beijing 100142, China
Abstract  

This paper addresses Chinese people name disambiguation with a hierarchical clustering algorithm and identifies, through experiments, several features that work well for the task. The authors use term frequency (TF) to compute feature weights, and obtain better results after applying manually designed rules for extracting people names from documents. On a test corpus containing 191 ambiguous names, the method achieves an average F-value (α=0.5) of 88.15%.
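The general approach named in the abstract (TF-weighted vectors grouped by hierarchical clustering, scored with an F-value at α=0.5) can be illustrated with a minimal sketch. The abstract does not state the features, similarity measure, linkage criterion, stopping threshold, or the exact precision and recall definitions the authors used, so the cosine similarity, average linkage, `threshold` value, and all function names below are assumptions chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def tf_vector(tokens):
    """Raw term-frequency vector for one tokenized document."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_link(cluster_a, cluster_b, vectors):
    """Mean pairwise similarity between two clusters of document indices."""
    total = sum(cosine(vectors[i], vectors[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def cluster_documents(docs, threshold=0.3):
    """Agglomerative clustering: repeatedly merge the most similar pair of
    clusters and stop when no pair reaches the (assumed) threshold."""
    vectors = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = average_link(clusters[a], clusters[b], vectors)
                if sim > best_sim:
                    best_sim, best_pair = sim, (a, b)
        if best_sim < threshold:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


def f_alpha(precision, recall, alpha=0.5):
    """Weighted harmonic mean: F = 1 / (alpha / P + (1 - alpha) / R)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Hypothetical documents that all mention the same ambiguous name;
# each list holds the context tokens extracted from one document.
docs = [
    ["professor", "university", "physics", "research"],
    ["physics", "research", "laboratory", "professor"],
    ["football", "club", "striker", "goal"],
    ["goal", "striker", "league", "football"],
]
clusters = cluster_documents(docs)
```

Each resulting cluster stands for one real-world person sharing the name. With α=0.5 the F-value weights precision and recall equally, which reduces to the familiar 2PR/(P+R).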

Key words: People name disambiguation; Hierarchical clustering; Vector space model
Received: 29 September 2010      Published: 04 January 2011
CLC Number: TP391

Cite this article:

Zhang Shunrui, You Hongliang. Chinese People Name Disambiguation by Hierarchical Clustering. New Technology of Library and Information Service, 2010, 26(11): 64-68.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.11.10     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I11/64


