New Technology of Library and Information Service  2014, Vol. 30 Issue (11): 31-37    DOI: 10.11925/infotech.1003-3513.2014.11.05
Research of Text Feature Extraction on Dependency Parsing Network
Tang Xiaobo, Xiao Lu
Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] To improve the accuracy of network-based text feature extraction, this paper builds a more accurate text network using dependency parsing. [Methods] The method determines the semantic associations between feature words from the dependency parsing results and sets the direction of each edge according to the direction of dependency between the feature words. An improved PageRank algorithm then computes the importance of each network node to complete the feature extraction. [Results] Experimental results show that, compared with a co-word network, text feature extraction based on the dependency parsing network improves document clustering to some extent. [Limitations] The method does not distinguish between dependency types when it determines edge direction from the dependency relations between feature words. [Conclusions] The proposed method based on the dependency parsing network is effective for text feature extraction.
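
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the PageRank improvement or the edge-direction convention, so the sketch assumes standard PageRank and edges pointing from each dependent word to its head. The function names and the toy dependency arcs are hypothetical.

```python
from collections import defaultdict


def build_dependency_graph(dependencies, feature_words):
    """Build a directed graph from (dependent, head) arcs.

    Assumption: edges run dependent -> head, and only arcs whose two
    ends are both feature words are kept.
    """
    graph = defaultdict(set)
    for dependent, head in dependencies:
        if dependent in feature_words and head in feature_words and dependent != head:
            graph[dependent].add(head)
            graph.setdefault(head, set())  # make sure the head is a node too
    return dict(graph)


def pagerank(graph, damping=0.85, iterations=50):
    """Standard (unmodified) PageRank by power iteration."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in graph[v]:
                new_rank[target] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


def top_features(dependencies, feature_words, k=3):
    """Return the k feature words with the highest PageRank score."""
    scores = pagerank(build_dependency_graph(dependencies, feature_words))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy arcs (dependent, head), not taken from the paper's data.
toy_arcs = [
    ("text", "feature"),
    ("feature", "extraction"),
    ("dependency", "parsing"),
    ("parsing", "network"),
    ("network", "extraction"),
]
toy_words = {word for arc in toy_arcs for word in arc}
print(top_features(toy_arcs, toy_words, k=2))
```

In this toy graph the word that other feature words depend on most, directly or indirectly, receives the highest score. Reversing the edge direction or weighting edges by dependency type would change the ranking; the abstract notes that dependency types are not distinguished.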

Key words: Feature extraction; Dependency parsing; Complex network
Received: 23 May 2014      Published: 18 December 2014
PACS:  TP391.1  

Cite this article:

Tang Xiaobo, Xiao Lu. Research of Text Feature Extraction on Dependency Parsing Network. New Technology of Library and Information Service, 2014, 30(11): 31-37.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.11.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I11/31

