Acquisition of Synonym from Patent Query Logs
Gu Wei¹, Li Chaofan¹, Wang Hongjun², Xiao Shibin³, Shi Shuicai³
1. The Patent Office of the State Intellectual Property Office of the P.R.C., Beijing 100088, China;
2. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China;
3. TRS Software Opening Laboratory, Beijing Information Science & Technology University, Beijing 100101, China
Abstract [Objective] This paper studies the acquisition of synonyms from patent query logs. [Methods] A method based on user behavior analysis is proposed: a logical expression parser generates candidate synonym pairs from the logged queries, and features such as pinyin, Chinese character shape, abbreviation, and traditional/simplified Chinese variants are combined to build a synonym dictionary. [Results] Experimental results show that precision reaches 74.5%. The method generates 17 495 synonym pairs, a dictionary larger than those produced by some existing methods. [Limitations] The method is suited to library and information retrieval settings whose queries use complex expressions. [Conclusions] This research offers a useful reference for log-based knowledge acquisition.
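The abstract names the ingredients of the method but not their details, so the following is only a hypothetical sketch, not the authors' implementation. It assumes that each log entry is a Boolean expression built from search terms, parentheses, and AND / OR / NOT keywords, and that terms a user joins with OR are candidate synonyms (one reading of "analysis of user behavior"). It uses a hand-written recursive-descent parser, which is a swapped-in technique; the authors' parser may have been built differently. Of the listed features it implements only two, abbreviation (in-order character subsequence) and traditional/simplified variants, using a tiny, hypothetical four-entry conversion table; pinyin and character-shape features are omitted because they need external resources. All names (`parse`, `or_groups`, `candidate_pairs`, `is_abbreviation`, `is_script_variant`, `tag_pairs`, `TRAD_TO_SIMP`) are illustrative.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed log syntax: search terms, parentheses, and AND / OR / NOT keywords.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
# Hypothetical four-entry traditional-to-simplified table, for illustration only.
TRAD_TO_SIMP = {"電": "电", "腦": "脑", "機": "机", "動": "动"}


def parse(query):
    """Parse a Boolean query into nested (operator, children) tuples; terms stay str."""
    tokens = TOKEN_RE.findall(query)
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def binary(op, operand):
        # Collect operands joined by `op`; a lone operand is returned unwrapped.
        children = [operand()]
        while peek() == op:
            advance()
            children.append(operand())
        return children[0] if len(children) == 1 else (op, children)

    def unary():
        if peek() == "NOT":
            advance()
            return ("NOT", [unary()])
        if peek() == "(":
            advance()
            node = or_expr()
            advance()  # consume ")"; no error recovery in this sketch
            return node
        return advance()  # a plain search term, original case kept

    def and_expr():  # AND binds tighter than OR
        return binary("AND", unary)

    def or_expr():
        return binary("OR", and_expr)

    return or_expr()


def or_groups(node):
    """Yield every list of sibling terms that the user joined with OR."""
    if isinstance(node, str):
        return
    op, children = node
    if op == "OR":
        terms = [c for c in children if isinstance(c, str)]
        if len(terms) > 1:
            yield terms
    for child in children:
        yield from or_groups(child)


def candidate_pairs(queries):
    """Count how often each unordered term pair is OR-ed together across the log."""
    counts = Counter()
    for query in queries:
        for terms in or_groups(parse(query)):
            for pair in combinations(sorted(set(terms)), 2):
                counts[pair] += 1
    return counts


def is_abbreviation(a, b):
    """True if the shorter term's characters occur, in order, inside the longer one."""
    short, long_ = sorted((a, b), key=len)
    remaining = iter(long_)  # `ch in remaining` advances this iterator
    return len(short) < len(long_) and all(ch in remaining for ch in short)


def to_simplified(term):
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in term)


def is_script_variant(a, b):
    """True if the terms differ but coincide after traditional-to-simplified mapping."""
    return a != b and to_simplified(a) == to_simplified(b)


def tag_pairs(queries):
    """Attach log support and the two feature flags to every candidate pair."""
    return {
        pair: {"count": n,
               "abbreviation": is_abbreviation(*pair),
               "script_variant": is_script_variant(*pair)}
        for pair, n in candidate_pairs(queries).items()
    }
```

For example, `tag_pairs(["(北京大学 OR 北大) AND 专利", "北京大学 OR 北大"])` counts the pair ("北京大学", "北大") twice and flags it as an abbreviation. The abstract does not say how the features are weighed against each other, so the sketch only tags each pair with its log support and feature flags rather than accepting or rejecting it.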
Received: 06 January 2014
Published: 17 March 2015