New Technology of Library and Information Service  2015, Vol. 31 Issue (2): 24-30    DOI: 10.11925/infotech.1003-3513.2015.02.04
Acquisition of Synonym from Patent Query Logs
Gu Wei1, Li Chaofan1, Wang Hongjun2, Xiao Shibin3, Shi Shuicai3
1. The Patent Office of the State Intellectual Property Office of the P.R.C, Beijing 100088, China;
2. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China;
3. TRS Software Opening Laboratory, Beijing Information Science & Technology University, Beijing 100101, China
Abstract  

[Objective] This paper studies how to acquire synonyms from patent query logs. [Methods] The proposed method is based on the analysis of user behavior. A logic expression parser generates candidate synonym pairs, and features such as pinyin, Chinese character pattern, abbreviation, and traditional versus simplified Chinese forms are combined to build a synonym dictionary. [Results] Experimental results show that precision reaches 74.5%. The method generates 17,495 synonym pairs, and the resulting dictionary is larger than those produced by some existing methods. [Limitations] The method is suited to library and information retrieval settings that use complex query expressions. [Conclusions] This research provides a useful reference for log-based knowledge acquisition.
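
The candidate-generation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that terms a searcher joins directly with OR in one Boolean query are candidate synonyms, and the query grammar (AND, OR, parentheses) and the function name `candidate_pairs` are hypothetical. The feature-based filtering (pinyin, character pattern, abbreviation, traditional/simplified forms) is not shown.

```python
import re
from itertools import combinations

# Assumed query syntax: bare terms, AND, OR, and parentheses.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_or(tokens, pos, pairs):
    """Parse `and_expr (OR and_expr)*`; record pairs of OR-joined single terms."""
    operand, pos = parse_and(tokens, pos, pairs)
    operands = [operand]
    while pos < len(tokens) and tokens[pos].upper() == "OR":
        operand, pos = parse_and(tokens, pos + 1, pairs)
        operands.append(operand)
    if len(operands) == 1:
        return operands[0], pos
    # Only plain terms count; compound sub-expressions are represented as None.
    terms = sorted({t for t in operands if t is not None})
    pairs.update(combinations(terms, 2))
    return None, pos


def parse_and(tokens, pos, pairs):
    """Parse `atom (AND atom)*`; an AND group is never a single term."""
    operand, pos = parse_atom(tokens, pos, pairs)
    count = 1
    while pos < len(tokens) and tokens[pos].upper() == "AND":
        _, pos = parse_atom(tokens, pos + 1, pairs)
        count += 1
    return (operand if count == 1 else None), pos


def parse_atom(tokens, pos, pairs):
    """Parse a single term or a parenthesized sub-expression."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of query")
    tok = tokens[pos]
    if tok == "(":
        operand, pos = parse_or(tokens, pos + 1, pairs)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing ')'")
        return operand, pos + 1
    if tok == ")" or tok.upper() in ("AND", "OR"):
        raise ValueError(f"unexpected token {tok!r}")
    return tok, pos + 1


def candidate_pairs(query):
    """Return the set of unordered candidate synonym pairs found in one query."""
    tokens = TOKEN.findall(query)
    pairs = set()
    _, pos = parse_or(tokens, 0, pairs)
    if pos != len(tokens):
        raise ValueError("trailing tokens in query")
    return pairs


if __name__ == "__main__":
    print(candidate_pairs("(mobile OR cellphone OR handset) AND battery"))
```

In this sketch only terms that are direct operands of the same OR are paired; an OR between a term and a compound sub-expression yields no pair, which keeps the candidate set conservative before any filtering.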

Key words: Patent query log; Log mining; Semantic knowledge acquisition; Dictionary construction
Received: 06 January 2014      Published: 17 March 2015
CLC number: G353; TP391

Cite this article:

Gu Wei, Li Chaofan, Wang Hongjun, Xiao Shibin, Shi Shuicai. Acquisition of Synonym from Patent Query Logs. New Technology of Library and Information Service, 2015, 31(2): 24-30.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.02.04     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I2/24

