New Technology of Library and Information Service  2015, Vol. 31 Issue (2): 39-45    DOI: 10.11925/infotech.1003-3513.2015.02.06
Research on Chinese Text Categorization Based on Semantic Similarity of HowNet
Liu Huailiang, Du Kun, Qin Chunxiu
School of Economics & Management, Xidian University, Xi'an 710126, China
Abstract  

[Objective] To improve the precision of Chinese text classification by calculating the similarity between Chinese texts more accurately. [Methods] Using the TF-IDF algorithm to calculate term weights and HowNet to analyze the semantic relationships between lexical items, this paper proposes a weighted text similarity algorithm based on HowNet semantic similarity and tests it in a Chinese text classification experiment. [Results] The experimental results show that the proposed method improves text categorization performance compared with traditional methods. [Limitations] The algorithm has high time complexity, and its classification speed needs to be improved. [Conclusions] Analyzing the semantic relationships between feature items is shown to be an effective way to enhance the accuracy of Chinese text classification.
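
The general idea in [Methods] can be illustrated with a minimal sketch. This is not the authors' algorithm, whose formulas are not given in the abstract. It assumes a smoothed TF-IDF variant for term weights and a soft-cosine-style measure for text similarity. The names `tf_idf`, `soft_cosine`, `toy_word_sim`, and the lookup table `TOY_SIM` are hypothetical. The table stands in for a HowNet-based word similarity and assumes scores in [0, 1].

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: a smoothed IDF, log((n + 1) / (df + 1)) + 1, is used so that
    terms occurring in every document keep a non-zero weight.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * (math.log((n + 1) / (df[term] + 1)) + 1)
            for term, count in tf.items()
        })
    return weights

# Hypothetical stand-in for a HowNet word similarity: a tiny hand-made table.
TOY_SIM = {frozenset(("car", "automobile")): 0.9}

def toy_word_sim(a, b):
    """Return 1.0 for identical words, a table value for known pairs, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)

def soft_cosine(w1, w2, word_sim):
    """Cosine-like similarity in which related terms also contribute.

    Assumes non-negative weights and word similarities in [0, 1].
    """
    def dot(a, b):
        return sum(wa * wb * word_sim(ta, tb)
                   for ta, wa in a.items()
                   for tb, wb in b.items())
    denom = math.sqrt(dot(w1, w1)) * math.sqrt(dot(w2, w2))
    return dot(w1, w2) / denom if denom else 0.0

if __name__ == "__main__":
    docs = [["car", "engine"], ["automobile", "engine"], ["fruit", "apple"]]
    w = tf_idf(docs)
    print(soft_cosine(w[0], w[1], toy_word_sim))  # synonyms count as related
    print(soft_cosine(w[0], w[2], toy_word_sim))  # no shared or related terms
```

Every pair of terms is compared, so the cost grows with the product of the two vocabularies. This is consistent with the high time complexity noted under [Limitations].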

Key words: Text classification; Semantic similarity; HowNet
Received: 22 September 2014      Published: 17 March 2015
CLC number: G353.1

Cite this article:

Liu Huailiang, Du Kun, Qin Chunxiu. Research on Chinese Text Categorization Based on Semantic Similarity of HowNet. New Technology of Library and Information Service, 2015, 31(2): 39-45.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.02.06     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I2/39

