[Objective] This study proposes a new approach to identifying domain terminology from search engine query logs, with the aim of improving on traditional techniques. [Methods] First, we represented the query logs as a four-partite graph. Then, we ranked the candidate terms with a manifold ranking algorithm; the top-ranked candidates were taken as domain-specific terms. [Results] We tested the proposed method on real search engine query logs and found that its precision was about 20% higher than that of the standard approach. [Limitations] The coverage of the identified terms depends on the initial domain-specific queries manually chosen by experts. [Conclusions] The proposed approach can build a high-quality domain thesaurus without a large predefined domain corpus or annotations, which makes it more practical for real-world problems.
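The ranking step named in the Methods can be illustrated with a minimal sketch of generic manifold ranking, which iterates f ← αSf + (1 − α)y over a symmetrically normalized graph. This is not the authors' implementation: the abstract does not state the four node types, the edge weights, or the parameter values, so the toy graph, the function name `manifold_rank`, and the defaults `alpha=0.9` and `iterations=200` are all assumptions made for illustration.

```python
import math


def manifold_rank(weights, seeds, alpha=0.9, iterations=200):
    """Generic manifold ranking: iterate f <- alpha * S f + (1 - alpha) * y.

    weights: symmetric, non-negative adjacency matrix (list of lists).
    seeds:   indices of nodes assumed to be domain-specific (the initial queries).
    """
    n = len(weights)
    # Node degrees; isolated nodes get degree 1.0 to avoid division by zero.
    degree = [sum(row) or 1.0 for row in weights]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    s = [[weights[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # Seed vector y: 1 for seed nodes, 0 elsewhere.
    y = [1.0 if i in seeds else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iterations):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Hypothetical toy graph: a chain 0-1-2 plus a separate pair 3-4.
# Node 0 plays the role of an expert-chosen seed query.
toy_weights = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
scores = manifold_rank(toy_weights, seeds={0})
# Candidate nodes (everything except the seed), highest score first.
ranked = sorted((i for i in range(len(scores)) if i != 0),
                key=lambda i: scores[i], reverse=True)
```

In this toy graph, nodes connected to the seed receive positive scores that shrink with distance from it, while the disconnected pair receives none. This mirrors the stated limitation that coverage depends on the initial seed queries.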
刘彤, 倪维健, 柳梅. 面向搜索引擎查询日志的领域术语自动识别方法[J]. 现代图书情报技术, 2016, 32(2): 25-33.
Liu Tong, Ni Weijian, Liu Mei. Identifying Terminology from Search Engine Query Logs[J]. New Technology of Library and Information Service, 2016, 32(2): 25-33.