|
|
Identifying Terminology from Search Engine Query Logs |
Liu Tong, Ni Weijian, Liu Mei |
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China |
|
|
Abstract [Objective] This study proposes a new approach to identifying domain terminology from search engine query logs, with the aim of improving on traditional term extraction techniques. [Methods] First, the query logs are represented as a four-partite graph. Then, candidate terms are ranked with the manifold ranking algorithm, and the top-ranked candidates are taken as domain-specific terms. [Results] In experiments on real search engine query logs, the precision of the proposed method was about 20% higher than that of the standard approach. [Limitations] The coverage of the identified terms depends on the initial domain-specific queries manually chosen by experts. [Conclusions] The proposed approach can build a high-quality domain thesaurus without a large pre-defined domain corpus or annotations, which makes it more practical for real-world applications.
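The ranking step can be illustrated with a minimal sketch of manifold ranking by iteration, f <- alpha * S * f + (1 - alpha) * y, where S is the symmetrically normalized weight matrix and y marks the seed nodes. The abstract does not specify the four node types or the edge weights of the four-partite graph, so the toy query/URL graph, the node names, the seed choice, and the value of alpha below are all assumptions made only for illustration, not the authors' implementation.

```python
import math


def manifold_rank(weights, seeds, alpha=0.85, iterations=200):
    """Score graph nodes by relevance to the seeds via manifold ranking.

    weights: dict mapping an undirected edge (u, v) to a positive weight.
    seeds:   set of nodes known to belong to the target domain.
    """
    nodes = sorted({n for edge in weights for n in edge})

    # Build a symmetric adjacency map from the edge list.
    adj = {n: {} for n in nodes}
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    degree = {n: sum(adj[n].values()) for n in nodes}

    # y is the initial score vector: 1 for seeds, 0 elsewhere.
    y = {n: 1.0 if n in seeds else 0.0 for n in nodes}
    f = dict(y)

    # Each pass spreads scores along edges normalized by
    # sqrt(degree[u] * degree[v]) and pulls them back toward y.
    for _ in range(iterations):
        f = {
            n: alpha * sum(
                w * f[m] / math.sqrt(degree[n] * degree[m])
                for m, w in adj[n].items()
            ) + (1 - alpha) * y[n]
            for n in nodes
        }
    return f


# Hypothetical toy log: queries linked to the URLs clicked for them.
edges = {
    ("q:manifold ranking", "u:ml-wiki"): 1.0,
    ("q:graph laplacian", "u:ml-wiki"): 1.0,
    ("q:cheap flights", "u:travel-site"): 1.0,
}
scores = manifold_rank(edges, seeds={"q:manifold ranking"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, the unlabeled query that shares a clicked URL with the seed query receives a positive score, while the query in the disconnected component keeps a score of zero, which is the behavior that lets a few expert-chosen seed queries surface related candidates.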
|
Received: 13 August 2015
Published: 08 March 2016
|