A Staged and Integrated Semantic Similarity Algorithm of Text
Ma Junhong
Engineering Institute, Xi'an International University, Xi'an 710077, China
Abstract For Chinese text information retrieval, a staged and integrated text similarity algorithm is proposed that processes sentences, paragraphs, and the whole document stage by stage. The algorithm takes the topic and application domain of the document into account and assigns a corresponding weight to each feature word through a semantically enhanced weighting method. These weights are then integrated into the semantic similarity factors computed at each stage, according to the characteristics of that stage, in order to obtain more accurate similarity results for Chinese text. Finally, a text similarity computing system is built, and the improved algorithm achieves better experimental results than traditional algorithms.
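The staged idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the weighting formula or the per-stage factors, so everything here is an assumption made for illustration. In particular, the weighted word-overlap measure, the best-match averaging used to move from sentences to paragraphs to documents, the default weight of 1.0, and all function names are hypothetical. Input text is assumed to be already segmented into words.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Sentence = List[str]        # one sentence as a list of already-segmented words
Paragraph = List[Sentence]
Document = List[Paragraph]
T = TypeVar("T")


def sentence_similarity(a: Sentence, b: Sentence, weights: Dict[str, float]) -> float:
    """Stage 1 (assumed): weighted overlap of the word sets of two sentences.

    Words missing from `weights` get a default weight of 1.0 (an assumption).
    """
    set_a, set_b = set(a), set(b)
    total = sum(weights.get(w, 1.0) for w in set_a | set_b)
    if total == 0:
        return 0.0
    shared = sum(weights.get(w, 1.0) for w in set_a & set_b)
    return shared / total


def best_match_average(units_a: Sequence[T], units_b: Sequence[T],
                       sim: Callable[[T, T], float]) -> float:
    """Average best-match score in both directions (an assumed aggregation rule)."""
    if not units_a or not units_b:
        return 0.0
    forward = sum(max(sim(x, y) for y in units_b) for x in units_a) / len(units_a)
    backward = sum(max(sim(x, y) for x in units_a) for y in units_b) / len(units_b)
    return (forward + backward) / 2


def paragraph_similarity(p: Paragraph, q: Paragraph, weights: Dict[str, float]) -> float:
    """Stage 2 (assumed): aggregate sentence-level scores within two paragraphs."""
    return best_match_average(p, q, lambda s, t: sentence_similarity(s, t, weights))


def document_similarity(d: Document, e: Document, weights: Dict[str, float]) -> float:
    """Stage 3 (assumed): aggregate paragraph-level scores across two documents."""
    return best_match_average(d, e, lambda p, q: paragraph_similarity(p, q, weights))
```

In this sketch the feature-word weights enter only at the sentence stage and propagate upward through the aggregation; the abstract indicates that the paper instead integrates the weights into each stage's calculation according to that stage's characteristics.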
Received: 05 July 2013
Published: 04 November 2013