New Technology of Library and Information Service  2014, Vol. 30 Issue (11): 31-37    DOI: 10.11925/infotech.1003-3513.2014.11.05
Research of Text Feature Extraction on Dependency Parsing Network
Tang Xiaobo, Xiao Lu
Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] To improve the accuracy of network-based text feature extraction, this paper builds a more accurate text network using dependency parsing. [Methods] The method determines the semantic associations between feature words from the dependency parsing results and sets the direction of each edge according to the direction of the dependency between the feature words. An improved PageRank algorithm then calculates the importance of each network node to complete the feature extraction. [Results] Experimental results show that, compared with a co-word network, text feature extraction based on a dependency parsing network improves document clustering to some extent. [Limitations] The method does not distinguish between dependency types when determining edge direction from the dependency relations between feature words. [Conclusions] The proposed method based on a dependency parsing network is effective for text feature extraction.
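
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the PageRank improvement, so the sketch uses standard weighted PageRank; it assumes edges point from the dependent word to its head; and the `(head, dependent)` arcs are hand-written toy data standing in for real parser output. The function names `build_dependency_graph` and `pagerank` are hypothetical.

```python
from collections import defaultdict


def build_dependency_graph(parsed_sentences, feature_words):
    """Build a directed, weighted graph {source: {target: weight}}.

    Assumption: each arc is a (head, dependent) pair and the edge runs
    dependent -> head, so heads accumulate importance.
    """
    graph = {word: defaultdict(float) for word in feature_words}
    for arcs in parsed_sentences:
        for head, dependent in arcs:
            if head in graph and dependent in graph and head != dependent:
                graph[dependent][head] += 1.0  # repeated arcs raise the weight
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Standard weighted PageRank (not the paper's improved variant)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            total = sum(graph[src].values())
            for dst, weight in graph[src].items():
                new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


# Toy dependency arcs (hypothetical), one list per sentence.
parsed_sentences = [
    [("extract", "features"), ("extract", "network")],
    [("extract", "text"), ("network", "dependency")],
]
feature_words = {"extract", "features", "network", "text", "dependency"}

graph = build_dependency_graph(parsed_sentences, feature_words)
ranks = pagerank(graph)
top = max(ranks, key=ranks.get)

# Print feature words from most to least important.
for word, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{word}\t{score:.4f}")
```

In practice the arcs would come from a dependency parser, and the highest-ranked nodes would be kept as the document's features before clustering.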

Key words: Feature extraction; Dependency parsing; Complex network
Received: 23 May 2014      Published: 18 December 2014
CLC Number: TP391.1

Cite this article:

Tang Xiaobo, Xiao Lu. Research of Text Feature Extraction on Dependency Parsing Network. New Technology of Library and Information Service, 2014, 30(11): 31-37.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.11.05     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I11/31

