Identifying Synonyms Based on Sentence Structure Analysis
Yu Juan1, Yin Jidong2, Fei Shu3
1. School of Public Administration and Policy, Fuzhou University, Fuzhou 350108, China; 2. Jiangxi Institute of Standardization, Nanchang 330029, China; 3. Library of Dalian Vocational & Technical College, Dalian 116035, China
Abstract: A new method for identifying synonyms is proposed to reduce the deviation that arises when calculating the semantic similarity between two terms or phrases. The method first analyzes the sentence structure of the terms (or phrases) concerned, and then calculates the semantic similarity between them based on Tongyici Cilin (a Chinese thesaurus). It weights each word in a term (or phrase) equally, which reduces the identification errors made by gravity-centre-backward methods. Experiments show that the proposed method identifies synonyms accurately and has good potential for text mining and semantic retrieval applications.
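
The equal-weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sentence structure analysis step is omitted (phrases are assumed to be already segmented into words), the thesaurus is a tiny made-up table named `TOY_CODES` whose codes only imitate a five-level hierarchy, and the functions `word_similarity` and `phrase_similarity` are hypothetical names introduced here. Word similarity is taken as the fraction of hierarchy levels two codes share, and every word in a phrase contributes with the same weight.

```python
from typing import Dict, List

# Hypothetical toy thesaurus: word -> hierarchical category code.
# The codes are invented for illustration and are NOT real Tongyici Cilin entries.
TOY_CODES: Dict[str, str] = {
    "computer": "Bo01A01",
    "calculator": "Bo01A02",
    "network": "Bo02B01",
    "internet": "Bo02B01",
    "apple": "Bh07A01",
}

# Assumed end positions of the five hierarchy levels inside a code.
LEVEL_ENDS = (1, 2, 4, 5, 7)


def word_similarity(w1: str, w2: str) -> float:
    """Fraction of hierarchy levels shared by the codes of two words."""
    if w1 == w2:
        return 1.0
    c1, c2 = TOY_CODES.get(w1), TOY_CODES.get(w2)
    if c1 is None or c2 is None:
        return 0.0  # a word missing from the toy thesaurus matches nothing
    shared = 0
    for end in LEVEL_ENDS:
        if c1[:end] != c2[:end]:
            break
        shared += 1
    return shared / len(LEVEL_ENDS)


def phrase_similarity(p1: List[str], p2: List[str]) -> float:
    """Symmetric phrase similarity in which every word has equal weight."""
    if not p1 or not p2:
        return 0.0

    def directed(a: List[str], b: List[str]) -> float:
        # Each word in `a` takes its best match in `b`; the scores are
        # averaged with equal weights, regardless of word position.
        return sum(max(word_similarity(x, y) for y in b) for x in a) / len(a)

    return (directed(p1, p2) + directed(p2, p1)) / 2


if __name__ == "__main__":
    print(phrase_similarity(["computer", "network"], ["calculator", "internet"]))
```

Because all words are weighted equally, swapping the order of the words inside a phrase does not change the score in this sketch; a position-weighted scheme would behave differently.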
Received: 08 May 2013
Published: 27 September 2013