Generating Hierarchical Paths of Chinese Text from Wikipedia
Xia Tian
Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China, Beijing 100872, China; School of Information Resource Management, Renmin University of China, Beijing 100872, China
Abstract [Objective] To generate hierarchical semantic paths for Chinese texts from Wikipedia. [Methods] We first built an article concept vector for each Chinese text from Wikipedia using explicit semantic analysis. We then mapped the vector to the category nodes of a hierarchical, tree-like category graph. Finally, we generated the hierarchical paths through seed-node information diffusion, top-down path selection, and optimization techniques. [Results] On the test dataset, the average relevance of the first generated hierarchical path was 54.10%, and the top 20 paths were ranked in descending order of relevance. [Limitations] We did not analyze how the number of explicit concepts used in the vector affects the quality of the generated paths. [Conclusions] The hierarchical paths generated from Wikipedia reflect the main semantic content of the given texts.
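The three steps named in the Methods can be illustrated with a minimal sketch. It is not the paper's implementation: the toy index `TERM_TO_CONCEPTS`, the article-to-category map, the strict tree `PARENT` (the real category graph is only tree-like), the `decay` factor, and the greedy single-path selection are all assumptions made for illustration, and the optimization step and the ranking of multiple paths are omitted.

```python
from collections import defaultdict

# Hypothetical ESA-style inverted index: term -> {Wikipedia article: weight}.
TERM_TO_CONCEPTS = {
    "neuron": {"Neural network": 0.9, "Brain": 0.6},
    "training": {"Neural network": 0.5, "Sport": 0.4},
    "goal": {"Football": 0.8},
}
# Hypothetical article -> category assignments.
CONCEPT_TO_CATEGORIES = {
    "Neural network": ["Machine learning"],
    "Brain": ["Neuroscience"],
    "Sport": ["Sports"],
    "Football": ["Sports"],
}
# Hypothetical category hierarchy, simplified to a tree: child -> parent.
PARENT = {
    "Machine learning": "Computer science",
    "Computer science": "Science",
    "Neuroscience": "Science",
    "Science": "Root",
    "Sports": "Root",
}


def concept_vector(tokens):
    """Sum the per-term article weights into one concept vector for the text."""
    vec = defaultdict(float)
    for tok in tokens:
        for concept, weight in TERM_TO_CONCEPTS.get(tok, {}).items():
            vec[concept] += weight
    return dict(vec)


def seed_scores(vec):
    """Map article concepts onto their category nodes (the seed nodes)."""
    seeds = defaultdict(float)
    for concept, weight in vec.items():
        for category in CONCEPT_TO_CATEGORIES.get(concept, []):
            seeds[category] += weight
    return dict(seeds)


def diffuse(seeds, decay=0.5):
    """Propagate each seed's score to its ancestors, damped by `decay` per level."""
    scores = defaultdict(float)
    for category, weight in seeds.items():
        node = category
        while node is not None:
            scores[node] += weight
            node = PARENT.get(node)
            weight *= decay
    return dict(scores)


def best_path(scores, root="Root"):
    """Walk top-down from the root, following the highest-scoring child each time."""
    children = defaultdict(list)
    for child, parent in PARENT.items():
        children[parent].append(child)
    path, node = [root], root
    while True:
        candidates = [c for c in children[node] if scores.get(c, 0.0) > 0.0]
        if not candidates:
            return path
        node = max(candidates, key=lambda c: scores[c])
        path.append(node)


tokens = ["neuron", "training"]
print(" > ".join(best_path(diffuse(seed_scores(concept_vector(tokens))))))
```

In this sketch, diffusion lets evidence from several seed categories accumulate on a shared ancestor, so the top-down walk can pick a branch even when no single seed dominates.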
Received: 16 November 2015
Published: 12 April 2016