New Technology of Library and Information Service, 2012 (9): 23-28. DOI: 10.11925/infotech.1003-3513.2012.09.05
Study on the Application of Complex Network Theory in Chinese Text Feature Selection
Zhao Hui, Liu Huailiang, Fan Yunjie
Economy and Management College, Xidian University, Xi’an 710071, China
Abstract: This paper proposes a feature selection method based on complex networks. A weighted complex network of the text is built to represent both the semantic relations between words and the structure of the text. The weighted degree, weighted clustering coefficient, and betweenness of each network node are combined into a synthetic node characteristic, and the keywords that reflect the theme of the text are then selected according to this synthetic characteristic. On this basis, a Chinese text feature selection algorithm based on complex networks is proposed and verified. Experimental results show that the proposed method achieves better text classification performance.
Key words: Complex network; Semantic relevance relation; Synthetic characteristics of nodes; Feature selection
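The abstract does not give the paper's exact formulas, so the following is only a minimal Python sketch of the pipeline it describes, under stated assumptions: the input is an already segmented Chinese text with stop words removed; an edge links two words that co-occur within a small sliding window of a sentence, weighted by the co-occurrence count; the weighted clustering coefficient follows Barrat et al.'s definition; betweenness is computed on unweighted hop distances with Brandes' algorithm; and the three measures are max-normalized and combined linearly with hypothetical equal weights. All function names (`build_network`, `weighted_clustering`, `select_features`, etc.) are illustrative and not taken from the paper.

```python
from collections import defaultdict, deque


def build_network(sentences, window=2):
    """Undirected weighted co-occurrence network of one text (assumed scheme).

    Each word is linked to the next `window` words of the same sentence;
    the edge weight counts how often the pair co-occurs.
    """
    w = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        for i, a in enumerate(sent):
            for b in sent[i + 1:i + 1 + window]:
                if a != b:  # skip self-loops
                    w[a][b] += 1.0
                    w[b][a] += 1.0
    return {u: dict(nbrs) for u, nbrs in w.items()}


def strength(net):
    """Weighted degree: s_i = sum of the weights of edges incident to node i."""
    return {u: sum(nbrs.values()) for u, nbrs in net.items()}


def weighted_clustering(net):
    """Barrat et al.: C_i = sum over linked neighbour pairs (j, h) of
    (w_ij + w_ih), divided by s_i * (k_i - 1)."""
    s = strength(net)
    cc = {}
    for i, nbrs in net.items():
        k = len(nbrs)
        if k < 2:
            cc[i] = 0.0
            continue
        nb = list(nbrs)
        total = 0.0
        for x in range(k):
            for y in range(x + 1, k):
                j, h = nb[x], nb[y]
                if h in net[j]:  # j and h are linked: a triangle through i
                    total += nbrs[j] + nbrs[h]
        cc[i] = total / (s[i] * (k - 1))
    return cc


def betweenness(net):
    """Brandes' algorithm on hop distances (edge weights ignored here)."""
    bc = dict.fromkeys(net, 0.0)
    for src in net:
        stack, pred = [], {v: [] for v in net}
        sigma = dict.fromkeys(net, 0)  # number of shortest paths from src
        sigma[src] = 1
        dist = dict.fromkeys(net, -1)
        dist[src] = 0
        queue = deque([src])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for u in net[v]:
                if dist[u] < 0:
                    dist[u] = dist[v] + 1
                    queue.append(u)
                if dist[u] == dist[v] + 1:
                    sigma[u] += sigma[v]
                    pred[u].append(v)
        delta = dict.fromkeys(net, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            u = stack.pop()
            for v in pred[u]:
                delta[v] += sigma[v] / sigma[u] * (1 + delta[u])
            if u != src:
                bc[u] += delta[u]
    return {v: b / 2 for v, b in bc.items()}  # undirected: pairs counted twice


def _normalize(d):
    top = max(d.values(), default=0.0)
    return {u: (v / top if top > 0 else 0.0) for u, v in d.items()}


def select_features(sentences, k, window=2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank words by a weighted sum of the three normalized node measures."""
    net = build_network(sentences, window)
    measures = [_normalize(m(net)) for m in (strength, weighted_clustering, betweenness)]
    score = {u: sum(a * m[u] for a, m in zip(weights, measures)) for u in net}
    return sorted(score, key=lambda u: (-score[u], u))[:k]  # ties: by word


if __name__ == "__main__":
    # Hypothetical pre-segmented sentences; real input would come from a
    # Chinese word segmenter with stop words removed.
    doc = [["复杂", "网络", "文本"], ["网络", "特征", "选择"], ["文本", "特征", "分类"]]
    print(select_features(doc, k=3))
```

The window size and the three combination weights are free parameters in this sketch; the abstract does not state how the paper weights degree, clustering, and betweenness, so those values would have to be tuned or taken from the full text.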
Received: 25 July 2012      Published: 25 December 2012
CLC Number: TP391.1

Cite this article:

Zhao Hui, Liu Huailiang, Fan Yunjie. Study on the Application of Complex Network Theory in Chinese Text Feature Selection. New Technology of Library and Information Service, 2012, (9): 23-28.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2012.09.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2012/V/I9/23
