New Technology of Library and Information Service  2015, Vol. 31 Issue (11): 26-32    DOI: 10.11925/infotech.1003-3513.2015.11.05
Study on the Modified Method of Feature Weighting with Complex Networks
Du Kun, Liu Huailiang, Guo Lujie
School of Economics & Management, Xidian University, Xi'an 710126, China
Abstract  

[Objective] This paper aims to compute feature weights more accurately in order to improve the accuracy of text similarity calculation. [Methods] Semantic associations among features are used to construct a complex network for each text and to select features. A category correlation coefficient is defined and combined with the feature selection results to obtain an improved feature weighting method, which is then evaluated in a Chinese text classification experiment. [Results] The experimental results show that the proposed method classifies Chinese text better than the TFIDF algorithm. [Limitations] The parameters of the feature selection evaluation function must be specified in advance. [Conclusions] Compared with the traditional TFIDF algorithm, the new algorithm represents feature weights more accurately.
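For orientation, the TFIDF baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of one common textbook variant (relative term frequency multiplied by the logarithm of inverse document frequency), not the authors' implementation. The abstract does not state which TFIDF variant the paper uses, and the function name `tfidf_weights` and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = (count / doc_length) * ln(N / df),
    where N is the number of documents and df is the number of
    documents that contain the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical, already-segmented toy documents.
docs = [
    ["network", "feature", "weight"],
    ["feature", "text", "text"],
    ["network", "text", "classification"],
]
weights = tfidf_weights(docs)
```

This baseline treats each term independently and ignores the semantic associations among features that the abstract says the proposed method takes into account.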

Received: 26 May 2015      Published: 06 April 2016
CLC Number: TP391; G356

Cite this article:

Du Kun, Liu Huailiang, Guo Lujie. Study on the Modified Method of Feature Weighting with Complex Networks. New Technology of Library and Information Service, 2015, 31(11): 26-32.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.11.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I11/26

