A Noise Cleaning Method for Synonym Extraction Results
Liu Wei, Wang Xing, Song Peiyan
Institute of Scientific & Technical Information of China, Beijing 100038, China
Abstract [Objective] Synonym extraction results contain a lot of noise, which reduces their usability. [Methods] This paper proposes a noise cleaning method based on a synonym graph. The method first transforms the synonym extraction results into an undirected synonym graph and then detects noise in the graph. It is further improved by incorporating distributional similarity. [Results] Experiments on terms randomly selected from the technical field show that the method can remove noise from synonym extraction results to some extent. [Limitations] Only part of the noise is removed, so the accuracy of noise detection needs to be increased by improving the method. [Conclusions] The approach is a feasible way to clean noise in synonym extraction results and is worth further study.
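The abstract does not specify how noise is detected in the synonym graph, so the following is only a minimal sketch of the general idea under stated assumptions. It builds an undirected graph from extracted synonym pairs and flags an edge as possible noise when its two endpoints share few neighbors. The neighborhood-overlap criterion, the `min_overlap` threshold, and the function names `build_synonym_graph` and `suspicious_edges` are all hypothetical illustrations, not the authors' method, and the distributional-similarity refinement is not covered.

```python
from collections import defaultdict


def build_synonym_graph(pairs):
    """Turn extracted (term, term) synonym pairs into an undirected graph."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-pairs
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def suspicious_edges(graph, min_overlap=0.5):
    """Flag edges whose endpoints have a low Jaccard overlap of neighborhoods.

    Assumption (not from the paper): true synonyms tend to share most of
    their neighbors, so a weakly supported edge is a noise candidate.
    """
    flagged = []
    for u in sorted(graph):
        for v in sorted(graph[u]):
            if u < v:  # visit each undirected edge once
                nu = graph[u] | {u}
                nv = graph[v] | {v}
                overlap = len(nu & nv) / len(nu | nv)
                if overlap < min_overlap:
                    flagged.append((u, v))
    return flagged


# Hypothetical extraction results: two synonym groups joined by one bad pair.
pairs = [
    ("car", "automobile"), ("car", "auto"), ("automobile", "auto"),
    ("auto", "automatic"),
    ("automatic", "self-acting"), ("automatic", "automated"),
    ("self-acting", "automated"),
]
graph = build_synonym_graph(pairs)
print(suspicious_edges(graph))
```

In this toy input, the pair linking "auto" and "automatic" bridges two otherwise separate groups, which is the kind of edge this heuristic is meant to surface. A real system would need a criterion validated against labeled data.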
Received: 18 November 2014
Published: 08 July 2015