New Technology of Library and Information Service  2014, Vol. 30 Issue (10): 49-55    DOI: 10.11925/infotech.1003-3513.2014.10.08
Semantic Incremental Improvement on Vector Space Model for Text Modeling
Hu Jiming, Xiao Lu
Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper uses semantic increments to improve text classification based on the Vector Space Model (VSM) and verifies the improved model experimentally. [Methods] After reviewing studies on semantic vectors and their refinements in text representation, the paper improves VSM with semantic increments and proposes an implementation framework for the semantic vector representation of texts. Based on the mappings between words and concepts in a domain ontology, a concept hierarchy tree is constructed, words are positioned within it, the semantic similarity of concepts is calculated, and a semantic vector model for text representation is obtained. [Results] Comparative text classification experiments show that the proposed method is feasible and effective, and that it outperforms traditional methods in Precision, Recall and F1-Measure. [Limitations] The description of text semantic information is still insufficient, and more genuinely semantic methods of text modeling need to be explored. In addition, further comparative experiments on several datasets are needed to obtain more accurate results. [Conclusions] The paper explores a semantic improvement of the traditional VSM, which is important for further work on text classification and semantic association.
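
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: the toy concept tree `PARENT`, the use of Wu-Palmer similarity, the `threshold` parameter and the additive increment rule in `semantic_vector` are all illustrative choices, not the authors' method. It shows how similarity between concepts in a hierarchy tree can add weight to related dimensions of a plain term-frequency vector.

```python
from math import sqrt

# Hypothetical toy concept hierarchy (child -> parent); the root has parent None.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "fruit": "entity",
    "apple": "fruit",
}

def path_to_root(concept):
    """Return the list of concepts from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def depth(concept):
    """Depth in the tree, counting the root as depth 1."""
    return len(path_to_root(concept))

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # The first ancestor of b that is also an ancestor of a is the deepest common one.
    lcs = next(c for c in path_to_root(b) if c in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

def semantic_vector(tf, concepts, threshold=0.5):
    """Assumed increment rule: each weight gains the similarity-weighted
    frequencies of other concepts whose similarity reaches `threshold`."""
    vector = []
    for ci in concepts:
        increment = 0.0
        for cj in concepts:
            if cj == ci:
                continue
            sim = wu_palmer(ci, cj)
            if sim >= threshold:
                increment += sim * tf.get(cj, 0.0)
        vector.append(tf.get(ci, 0.0) + increment)
    return vector

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

concepts = ["car", "truck", "apple"]
doc_a = {"car": 1.0}    # a text that mentions only "car"
doc_b = {"truck": 1.0}  # a text that mentions only "truck"

plain_a = [doc_a.get(c, 0.0) for c in concepts]
plain_b = [doc_b.get(c, 0.0) for c in concepts]
sem_a = semantic_vector(doc_a, concepts)
sem_b = semantic_vector(doc_b, concepts)

print(cosine(plain_a, plain_b))  # the plain vectors share no dimension
print(cosine(sem_a, sem_b))      # the incremented vectors overlap on related concepts
```

In this toy setup the two texts share no term, so their plain vectors are orthogonal, while the incremented vectors overlap because "car" and "truck" share the parent concept "vehicle".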

Key words: Text modeling; Semantic Vector Space Model; Semantic increment; Semantic similarity
Received: 17 March 2014      Published: 28 November 2014
CLC Number: TP391

Cite this article:

Hu Jiming, Xiao Lu. Semantic Incremental Improvement on Vector Space Model for Text Modeling. New Technology of Library and Information Service, 2014, 30(10): 49-55.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.10.08     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I10/49

