Generating Hierarchical Paths of Chinese Text from Wikipedia
Xia Tian
Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China, Beijing 100872, China; School of Information Resource Management, Renmin University of China, Beijing 100872, China
[Objective] This study generates hierarchical semantic paths for Chinese texts from Wikipedia. [Methods] We first represented each Chinese text as a vector of Wikipedia article concepts using explicit semantic analysis. We then mapped this vector to the category nodes of a hierarchical, tree-like category graph. Finally, we generated the hierarchical paths through information diffusion from seed nodes, top-down path selection, and optimization techniques. [Results] On the test dataset, the average relevance degree of the first generated hierarchical path was 54.10%, and the top 20 paths were ranked in descending order of relevance. [Limitations] We did not analyze how using different numbers of explicit concept vectors affects the quality of the generated paths. [Conclusions] The hierarchical paths generated from Wikipedia reflect the main semantic meaning of the given texts.
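As a rough illustration of the pipeline outlined in the Methods, the following Python sketch maps a concept vector to seed category nodes, diffuses the seed scores upward, and selects root-to-node paths top-down. The toy category tree, the article-to-category mapping, the `decay` factor, and the mean-score ranking are all assumptions made for illustration; the abstract does not specify the actual diffusion rule, path scoring, or optimization step, and the concept vector is assumed to come from an existing explicit semantic analysis step.

```python
from collections import defaultdict

# Toy category tree (child -> parent); all names are hypothetical.
PARENT = {
    "Science": "Root",
    "Culture": "Root",
    "Computer science": "Science",
    "Linguistics": "Science",
    "Natural language processing": "Computer science",
    "Music": "Culture",
}

# Hypothetical mapping from Wikipedia articles (concepts) to categories.
ARTICLE_CATEGORIES = {
    "Word segmentation": ["Natural language processing", "Linguistics"],
    "Machine translation": ["Natural language processing"],
    "Pop music": ["Music"],
}


def seed_scores(concept_vector):
    """Map a concept vector (article -> weight) onto category seed nodes."""
    scores = defaultdict(float)
    for article, weight in concept_vector.items():
        for category in ARTICLE_CATEGORIES.get(article, []):
            scores[category] += weight
    return dict(scores)


def diffuse(seeds, decay=0.5):
    """Spread each seed score to its ancestors, damped by `decay` per level."""
    scores = defaultdict(float)
    for node, score in seeds.items():
        current, contribution = node, score
        while current is not None:
            scores[current] += contribution
            current = PARENT.get(current)
            contribution *= decay
    return dict(scores)


def children_of(node):
    """Return the direct children of a node in the toy tree."""
    return [child for child, parent in PARENT.items() if parent == node]


def top_paths(scores, root="Root", k=20):
    """Walk top-down through scored nodes and rank paths by mean node score."""
    results = []

    def walk(node, path):
        path = path + [node]
        scored = [c for c in children_of(node) if scores.get(c, 0.0) > 0.0]
        if not scored:
            body = path[1:]  # leave the root out of the average
            if body:
                mean = sum(scores[n] for n in body) / len(body)
                results.append((mean, path))
            return
        for child in scored:
            walk(child, path)

    walk(root, [])
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:k]


# Assumed concept vector for a short text about Chinese word segmentation.
vector = {"Word segmentation": 0.6, "Machine translation": 0.3, "Pop music": 0.1}
for mean, path in top_paths(diffuse(seed_scores(vector))):
    print(" > ".join(path), round(mean, 3))
```

Ranking by the mean score along a path is only one plausible choice; it favors paths whose every level is supported by the text rather than paths with a single strong leaf.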
Xia Tian. Generating Hierarchical Paths of Chinese Text from Wikipedia[J]. New Technology of Library and Information Service (现代图书情报技术), 2016, 32(3): 25-32.