New Technology of Library and Information Service  2015, Vol. 31 Issue (6): 64-70    DOI: 10.11925/infotech.1003-3513.2015.06.10
A Noise Cleaning Method for Synonym Extraction Results
Liu Wei, Wang Xing, Song Peiyan
Institute of Scientific & Technical Information of China, Beijing 100038, China
Abstract  

[Objective] Synonym extraction results contain a large amount of noise, which reduces their usability. [Methods] This paper proposes a noise cleaning method based on a synonym graph. The method first transforms the synonym extraction results into an undirected synonym graph and then detects noise in the graph. It is further improved by incorporating distributional similarity. [Results] Experiments on terms randomly selected from the technical field show that the method can remove noise from synonym extraction results to some extent. [Limitations] Only part of the noise is removed, so the accuracy of noise detection needs to be increased by improving the method. [Conclusions] The approach is a feasible way to clean noise in synonym extraction results and is worth further study.
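
The graph-based idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the actual detection rule, so everything below is an assumption made for illustration: the function names, the threshold `min_sim`, the use of cosine similarity over context-count vectors as the distributional similarity, and the heuristic itself (an edge is flagged when its two terms share no common neighbour in the graph and their contexts are dissimilar). The sample terms and counts are invented.

```python
from collections import defaultdict
from math import sqrt


def build_synonym_graph(pairs):
    """Turn extracted (term, term) synonym pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def cosine(u, v):
    """Cosine similarity of two sparse context-count vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def suspected_noise(pairs, contexts, min_sim=0.3):
    """Flag edges that lack both structural and distributional support.

    Assumed heuristic (not taken from the paper): an edge is suspicious when
    its endpoints share no common neighbour in the graph AND the cosine
    similarity of their context vectors is below min_sim.
    """
    graph = build_synonym_graph(pairs)
    flagged, seen = [], set()
    for a, b in pairs:
        edge = frozenset((a, b))
        if a == b or edge in seen:
            continue  # skip self-loops and duplicate pairs
        seen.add(edge)
        has_common_neighbour = bool(graph[a] & graph[b])
        sim = cosine(contexts.get(a, {}), contexts.get(b, {}))
        if not has_common_neighbour and sim < min_sim:
            flagged.append((a, b))
    return flagged


# Invented sample data: three mutually supporting terms and one stray pair.
pairs = [
    ("PC", "personal computer"),
    ("PC", "microcomputer"),
    ("personal computer", "microcomputer"),
    ("PC", "polycarbonate"),
]
contexts = {
    "PC": {"software": 3, "keyboard": 2},
    "personal computer": {"software": 2, "keyboard": 1},
    "microcomputer": {"software": 1, "keyboard": 1},
    "polycarbonate": {"plastic": 4, "resin": 2},
}
print(suspected_noise(pairs, contexts))  # → [('PC', 'polycarbonate')]
```

In this sketch the three computing terms confirm one another through shared neighbours, while the pair with no shared neighbour and no overlapping context is the only one flagged. Requiring both signals to be weak is a conservative design choice: it removes fewer true synonyms at the cost of leaving some noise in place.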

Key words: Synonym; Information extraction; Noise cleaning; Synonym relation graph
Received: 18 November 2014      Published: 08 July 2015
CLC number: TP18

Cite this article:

Liu Wei, Wang Xing, Song Peiyan. A Noise Cleaning Method for Synonym Extraction Results. New Technology of Library and Information Service, 2015, 31(6): 64-70.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.06.10     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I6/64

