New Technology of Library and Information Service  2014, Vol. 30 Issue (11): 38-44    DOI: 10.11925/infotech.1003-3513.2014.11.06
An Algorithm of Chinese Text Representation Based on Complex Network
Yang Zhimo, Liu Huailiang, Zhao Hui
School of Economics & Management, Xidian University, Xi'an 710126, China
[Objective] To address the lack of semantic information in text representation based on the Vector Space Model, this paper proposes an algorithm for Chinese text representation based on complex networks. [Methods] Word relevance is calculated from the concept pages, link structure and category system extracted from Wikipedia. The feature words of a text are then represented as nodes, the semantic relevance relations between words as edges, and the word relevance values as the edge weights of a weighted complex network. [Results] Experimental results show that the proposed text representation method improves text similarity calculation and the performance of text categorization. [Limitations] The rules for selecting the co-occurrence window and span draw on existing research. [Conclusions] The proposed text representation method better preserves the structural information of a text and the correlation information between words. In addition, computing word relevance from Wikipedia makes the semantic information represented by the text network more accurate.
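
The network construction described in [Methods] can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name build_text_network, the window and threshold parameters, and the toy relevance table are all assumptions made for illustration, and the toy table only stands in for the Wikipedia-based word relevance, which the abstract does not specify in detail.

```python
from typing import Callable, Dict, FrozenSet, List

Edge = FrozenSet[str]


def build_text_network(
    words: List[str],
    relevance: Callable[[str, str], float],
    window: int = 3,
    threshold: float = 0.0,
) -> Dict[Edge, float]:
    """Build a weighted, undirected word network.

    Nodes are feature words. Two words are linked when they co-occur
    within `window` consecutive positions (an assumed rule) and their
    relevance exceeds `threshold`; the relevance value is the edge weight.
    """
    edges: Dict[Edge, float] = {}
    for i, w1 in enumerate(words):
        # Look only at the words that follow w1 inside the window.
        for w2 in words[i + 1 : i + window]:
            if w1 == w2:
                continue  # no self-loops
            weight = relevance(w1, w2)
            if weight > threshold:
                edges[frozenset((w1, w2))] = weight
    return edges


# Hypothetical relevance scores standing in for a Wikipedia-based measure.
TOY_RELEVANCE: Dict[Edge, float] = {
    frozenset(("network", "node")): 0.8,
    frozenset(("node", "edge")): 0.7,
    frozenset(("network", "edge")): 0.6,
}


def toy_relevance(a: str, b: str) -> float:
    """Symmetric lookup; unknown pairs get relevance 0.0."""
    return TOY_RELEVANCE.get(frozenset((a, b)), 0.0)


network = build_text_network(
    ["network", "node", "edge", "text"], toy_relevance, window=3
)
for edge, weight in network.items():
    print(sorted(edge), weight)
```

Storing each edge as a frozenset keeps the network undirected, so the pair (a, b) and the pair (b, a) map to the same weight. Word pairs with zero relevance, such as "node" and "text" in the toy table, produce no edge.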

Key words: Text representation, Complex network, Wikipedia, Word relevance, Text similarity
Received: 06 April 2014      Published: 18 December 2014
CLC number: G350

Cite this article:

Yang Zhimo, Liu Huailiang, Zhao Hui. An Algorithm of Chinese Text Representation Based on Complex Network. New Technology of Library and Information Service, 2014, 30(11): 38-44.

