New Technology of Library and Information Service  2013, Vol. 29 Issue (10): 20-26    DOI: 10.11925/infotech.1003-3513.2013.10.04
A Staged and Integrated Semantic Similarity Algorithm of Text
Ma Junhong
Engineering Institute, Xi'an International University, Xi'an 710077, China
Abstract  For Chinese text information retrieval, this paper proposes a staged and integrated text similarity algorithm that processes sentences, paragraphs, and the whole document stage by stage. The algorithm takes the topic and application domain of the document into account and assigns each feature word a weight using a semantically enhanced weighting method. These weights are then integrated into the semantic similarity factors according to the characteristics of each calculation stage, with the aim of producing more accurate similarity results for Chinese text. Finally, a text similarity computing system is built, and the improved algorithm achieves better experimental results than traditional algorithms.
Key words: Text similarity; Information retrieval; Semantic similarity; Term weight
Received: 05 July 2013      Published: 04 November 2013
CLC number: TP391
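
The staged scheme described in the abstract (sentence, then paragraph, then whole document, with weighted feature words) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract gives no formulas, so the weighted cosine measure at the sentence stage, the symmetric best-match averaging at the paragraph and document stages, and the `weights` dictionary standing in for the semantically enhanced term weights are all assumptions made for illustration. Input is assumed to be already segmented into words.

```python
from collections import Counter
from math import sqrt


def sentence_similarity(s1, s2, weights):
    """Weighted cosine similarity between two tokenized sentences.

    `weights` maps a term to its weight (hypothetical stand-in for the
    paper's semantically enhanced weights); unknown terms default to 1.0.
    """
    v1 = {t: n * weights.get(t, 1.0) for t, n in Counter(s1).items()}
    v2 = {t: n * weights.get(t, 1.0) for t, n in Counter(s2).items()}
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    norm = sqrt(sum(x * x for x in v1.values())) * sqrt(sum(x * x for x in v2.values()))
    return dot / norm if norm else 0.0


def _best_match_average(units_a, units_b, unit_sim):
    """Average each unit's best match on the other side, in both directions.

    This aggregation rule is an assumption, not taken from the paper.
    """
    if not units_a or not units_b:
        return 0.0
    a_to_b = sum(max(unit_sim(a, b) for b in units_b) for a in units_a) / len(units_a)
    b_to_a = sum(max(unit_sim(b, a) for a in units_a) for b in units_b) / len(units_b)
    return (a_to_b + b_to_a) / 2


def paragraph_similarity(p1, p2, weights):
    """Paragraph stage: a paragraph is a list of tokenized sentences."""
    return _best_match_average(
        p1, p2, lambda a, b: sentence_similarity(a, b, weights)
    )


def document_similarity(d1, d2, weights):
    """Document stage: a document is a list of paragraphs."""
    return _best_match_average(
        d1, d2, lambda a, b: paragraph_similarity(a, b, weights)
    )


# Illustrative data only: one paragraph per document, two sentences each.
example_weights = {"similarity": 2.0, "retrieval": 2.0}
doc_a = [[["text", "similarity", "algorithm"], ["information", "retrieval"]]]
doc_b = [[["similarity", "algorithm", "text"], ["retrieval", "system"]]]
score = document_similarity(doc_a, doc_b, example_weights)
```

Each stage reuses the result of the stage below it, so a different sentence-level measure (for example, one based on a semantic lexicon) could be swapped in without changing the paragraph or document stages.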

Cite this article:

Ma Junhong. A Staged and Integrated Semantic Similarity Algorithm of Text. New Technology of Library and Information Service, 2013, 29(10): 20-26.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.10.04     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I10/20
