New Technology of Library and Information Service  2013, Vol. 29 Issue (9): 35-40    DOI: 10.11925/infotech.1003-3513.2013.09.06
Identifying Synonyms Based on Sentence Structure Analysis
Yu Juan1, Yin Jidong2, Fei Shu3
1. School of Public Administration and Policy, Fuzhou University, Fuzhou 350108, China;
2. Jiangxi Institute of Standardization, Nanchang 330029, China;
3. Library of Dalian Vocational & Technical College, Dalian 116035, China
Abstract  A new method for identifying synonyms is proposed to reduce the deviation that arises when calculating the semantic similarity between two terms or phrases. The method first analyzes the sentence structure of the terms (or phrases) concerned, and then calculates their semantic similarity based on Tongyici Cilin (a Chinese thesaurus). It weights each word in the terms (or phrases) equally, which reduces the identification errors made by gravity-centre-backward methods. Experiments show that the proposed method is accurate and has good potential for text mining and semantic retrieval applications.
Key words: Identifying synonyms; Sentence structure analysis; Text mining
Received: 08 May 2013      Published: 27 September 2013
CLC number: TP182
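
The equal-weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the phrases have already been segmented into content words (the sentence structure analysis step is not reproduced), it assumes a five-level hierarchical thesaurus code in the style of Tongyici Cilin, and the lexicon entries, the codes, and the names `TOY_CILIN`, `word_similarity`, and `phrase_similarity` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy lexicon: the codes are illustrative only and are NOT
# copied from the real Tongyici Cilin. Each code is assumed to encode five
# hierarchy levels: letter, letter, two digits, letter, two digits.
TOY_CILIN: Dict[str, str] = {
    "电脑": "Bo01A01",
    "计算机": "Bo01A01",
    "手机": "Bo01B02",
    "价格": "Dj05A01",
    "价钱": "Dj05A01",
    "苹果": "Bh07A14",
}

# Prefix lengths that close each of the five assumed hierarchy levels.
LEVEL_ENDS = (1, 2, 4, 5, 7)


def word_similarity(a: str, b: str, lexicon: Dict[str, str] = TOY_CILIN) -> float:
    """Fraction of hierarchy levels shared by the codes of two words."""
    if a == b:
        return 1.0
    code_a, code_b = lexicon.get(a), lexicon.get(b)
    if code_a is None or code_b is None:
        return 0.0  # unknown words are treated as unrelated (an assumption)
    shared = 0
    for end in LEVEL_ENDS:
        if code_a[:end] != code_b[:end]:
            break
        shared += 1
    return shared / len(LEVEL_ENDS)


def phrase_similarity(
    words_a: List[str], words_b: List[str], lexicon: Dict[str, str] = TOY_CILIN
) -> float:
    """Average best-match similarity, with every word weighted equally."""
    if not words_a or not words_b:
        return 0.0

    def directed(src: List[str], dst: List[str]) -> float:
        # Each source word contributes its best match with weight 1/len(src),
        # so no position (e.g. the final head word) dominates the score.
        return sum(max(word_similarity(w, v, lexicon) for v in dst) for w in src) / len(src)

    # Symmetrize so that the argument order does not matter.
    return (directed(words_a, words_b) + directed(words_b, words_a)) / 2
```

For example, `phrase_similarity(["电脑", "价格"], ["计算机", "价钱"])` treats the two phrases as synonymous under this toy lexicon, whereas replacing "电脑" with "手机" lowers the score even though the final word is unchanged. A scheme that concentrates the weight on the last word would miss that difference.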

Cite this article:

Yu Juan, Yin Jidong, Fei Shu. Identifying Synonyms Based on Sentence Structure Analysis. New Technology of Library and Information Service, 2013, 29(9): 35-40.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.09.06     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I9/35
