New Technology of Library and Information Service  2014, Vol. 30 Issue (11): 38-44    DOI: 10.11925/infotech.1003-3513.2014.11.06
An Algorithm of Chinese Text Representation Based on Complex Network
Yang Zhimo, Liu Huailiang, Zhao Hui
School of Economics & Management, Xidian University, Xi'an 710126, China
Abstract  

[Objective] To address the lack of semantic information in text representations based on the Vector Space Model, this paper proposes an algorithm for representing Chinese text as a complex network. [Methods] Word relevance is calculated from the concept pages, link structure, and category system extracted from Wikipedia. The feature words of a text are then represented as nodes, the semantic relevance relations between words as edges, and the word relevance values as edge weights, yielding a weighted complex network. [Results] Experiments show that the proposed representation improves text similarity calculation and text categorization performance. [Limitations] The rules for selecting the co-occurrence window and span are adopted from existing research. [Conclusions] The proposed representation better preserves the structural information of a text and the correlations between its words. In addition, computing word relevance from Wikipedia makes the semantic information captured by the text network more accurate.
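
The network construction described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed co-occurrence window of 2, and the toy relevance table are all assumptions made for illustration, and the toy table stands in for the Wikipedia-based word relevance, which the abstract does not specify in detail.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]


def build_text_network(
    words: List[str],
    relevance: Callable[[str, str], float],
    window: int = 2,  # assumed window size; the paper's own rule is not given here
) -> Dict[Edge, float]:
    """Link feature words that co-occur within `window` positions.

    Each edge is keyed by the alphabetically sorted word pair and weighted
    by the relevance score; pairs with a score of 0 get no edge.
    """
    edges: Dict[Edge, float] = {}
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            v = words[j]
            if v == w:
                continue  # no self-loops
            score = relevance(w, v)
            if score > 0:
                a, b = sorted((w, v))
                edges[(a, b)] = score
    return edges


def node_strength(edges: Dict[Edge, float]) -> Dict[str, float]:
    """Sum of incident edge weights for every node that has an edge."""
    strength: Dict[str, float] = defaultdict(float)
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return dict(strength)


# Hypothetical relevance scores standing in for the Wikipedia-based measure.
TOY_RELEVANCE: Dict[Edge, float] = {
    ("graph", "network"): 0.9,
    ("network", "node"): 0.8,
    ("graph", "node"): 0.7,
}


def toy_relevance(a: str, b: str) -> float:
    x, y = sorted((a, b))
    return TOY_RELEVANCE.get((x, y), 0.0)


feature_words = ["network", "node", "graph", "text"]
edges = build_text_network(feature_words, toy_relevance, window=2)
strength = node_strength(edges)
```

In this toy input, "text" has no relevant neighbor inside the window, so it ends up with no edges. A real system would swap `toy_relevance` for a measure computed from Wikipedia and would need a rule for such isolated words.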

Key words: Text representation; Complex network; Wikipedia; Word relevance; Text similarity
Received: 06 April 2014      Published: 18 December 2014
PACS:  G350  

Cite this article:

Yang Zhimo, Liu Huailiang, Zhao Hui. An Algorithm of Chinese Text Representation Based on Complex Network. New Technology of Library and Information Service, 2014, 30(11): 38-44.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.11.06     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I11/38

