New Technology of Library and Information Service  2016, Vol. 32 Issue (2): 25-33    DOI: 10.11925/infotech.1003-3513.2016.02.04
Original Article
Identifying Terminology from Search Engine Query Logs
Liu Tong, Ni Weijian, Liu Mei
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Abstract  

[Objective] This study proposes a new approach to identifying domain terminology from search engine query logs, with the aim of improving on traditional term extraction techniques. [Methods] First, the query logs are represented as a four-partite graph. Then the candidate terms are ranked with a manifold ranking algorithm, and the top-ranked candidates are taken as domain-specific terms. [Results] In experiments on real search engine query logs, the precision of the proposed method was about 20% higher than that of the standard approach. [Limitations] The coverage of the identified terms depends on the initial domain-specific queries chosen manually by experts. [Conclusions] The proposed approach can build a high-quality domain thesaurus without a large predefined domain corpus or annotations, which makes it more practical for real-world applications.
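
The ranking step named in the abstract can be illustrated with a minimal sketch of the standard manifold ranking iteration, f ← αSf + (1 − α)y, where S is the symmetrically normalised adjacency matrix and y marks the seed nodes. This is not the authors' implementation: the abstract does not describe the four layers of the graph or the edge weights, so the toy graph below is untyped, and the function name `manifold_rank`, the value of `alpha`, and the iteration count are all assumptions made for illustration.

```python
import math

def manifold_rank(weights, seeds, alpha=0.9, iterations=200):
    """Score graph nodes by their relevance to the seed nodes.

    weights: symmetric adjacency matrix (list of lists, non-negative entries).
    seeds:   indices of nodes assumed to be domain-specific (hypothetical input).
    """
    n = len(weights)
    # Node degrees; isolated nodes get degree 1 to avoid division by zero.
    degree = [sum(row) or 1.0 for row in weights]
    # Symmetric normalisation: S = D^{-1/2} W D^{-1/2}.
    s = [[weights[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # Initial scores: 1 for seed nodes, 0 for everything else.
    y = [1.0 if i in seeds else 0.0 for i in range(n)]
    f = y[:]
    # Repeatedly spread scores along edges while pulling back toward the seeds.
    for _ in range(iterations):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

# Toy graph (assumed): node 0 is a seed query linked to candidate 1;
# candidates 2 and 3 are linked only to each other.
toy_weights = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
scores = manifold_rank(toy_weights, seeds={0})
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy graph the candidate connected to the seed receives a positive score, while the two nodes with no path to any seed stay at zero. That mirrors the limitation stated in the abstract: terms that are not reachable from the manually chosen seed queries cannot be ranked highly.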

Key words: Domain terminology, Search engine, Query logs, Manifold ranking
Received: 13 August 2015      Published: 08 March 2016

Cite this article:

Liu Tong, Ni Weijian, Liu Mei. Identifying Terminology from Search Engine Query Logs. New Technology of Library and Information Service, 2016, 32(2): 25-33.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.02.04     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I2/25
