New Technology of Library and Information Service  2015, Vol. 31 Issue (6): 64-70    DOI: 10.11925/infotech.1003-3513.2015.06.10
A Noise Cleaning Method for Synonym Extraction Results
Liu Wei, Wang Xing, Song Peiyan
Institute of Scientific & Technical Information of China, Beijing 100038, China

[Objective] Synonym extraction results contain a large amount of noise, which reduces their usability. [Methods] This paper proposes a noise cleaning method based on a synonym graph. The method first transforms the synonym extraction results into an undirected synonym graph and then detects noise in the graph. It is further improved by incorporating distributional similarity. [Results] Experiments on terms randomly selected from the technical field show that the method can remove noise from synonym extraction results to some extent. [Limitations] Only part of the noise is removed, so the accuracy of noise detection needs to be increased by improving the method. [Conclusions] The approach is feasible for cleaning noise in synonym extraction results and is worth further study.
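
The graph step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the paper's actual noise-detection rule, so the heuristic below is purely an assumption made for illustration: an edge is flagged when its two endpoints each have other neighbours but share none, meaning the edge bridges two otherwise separate synonym groups. The function names and the sample term pairs are hypothetical.

```python
from collections import defaultdict


def build_synonym_graph(pairs):
    """Turn extracted (term, term) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # ignore self-pairs
        graph[a].add(b)
        graph[b].add(a)
    return graph


def suspect_edges(graph):
    """Illustrative heuristic (an assumption, not the paper's rule):
    flag an edge when both endpoints have other neighbours but the two
    neighbourhoods share no term, i.e. the edge bridges two groups."""
    flagged = set()
    for a in graph:
        for b in graph[a]:
            if a < b:  # visit each undirected edge once
                others_a = graph[a] - {b}
                others_b = graph[b] - {a}
                if others_a and others_b and not (others_a & others_b):
                    flagged.add((a, b))
    return flagged


# Hypothetical extraction output: two synonym groups plus one bad link.
pairs = [
    ("car", "automobile"), ("car", "auto"), ("automobile", "auto"),
    ("bus", "coach"), ("bus", "omnibus"), ("coach", "omnibus"),
    ("auto", "bus"),  # plausible extraction error joining the groups
]
graph = build_synonym_graph(pairs)
noisy = suspect_edges(graph)
print(noisy)
```

The sketch covers only the graph construction and a structural check; the distributional-similarity refinement mentioned in the abstract is not modelled here.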

Key words: Synonym; Information extraction; Noise cleaning; Synonym relation graph
Received: 18 November 2014      Published: 08 July 2015
CLC number: TP18

Cite this article:

Liu Wei, Wang Xing, Song Peiyan. A Noise Cleaning Method for Synonym Extraction Results. New Technology of Library and Information Service, 2015, 31(6): 64-70.


