New Technology of Library and Information Service, 2016, Vol. 32, Issue (2): 34-42. DOI: 10.11925/infotech.1003-3513.2016.02.05
Auto-Correction Search Model Based on Statistics and Characteristics
Duan Jianyong
College of Computer Science, North China University of Technology, Beijing 100144, China
[Objective] This study aims to improve the precision, recall and user experience of search engines. [Methods] We propose an automatic query-correction model based on statistics and characteristics. First, we build a model that generates a confusion set of candidate queries for each user search term. Then, we design a ranking algorithm for the confusion set and select the best match for the original query. [Results] The new model improves search engine performance: on a test set of 110k, its precision and recall are 92.2% and 95%, which are 13.6% and 8.3% higher than those of the N-gram model. [Limitations] The model generates only four types of words for the confusion set, and training requires substantial computation. [Conclusions] The new model can improve the precision, recall and user experience of search engines.
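The abstract describes two stages, confusion-set generation and ranking, and the key words name Levenshtein distance, N-gram similarity and click frequency as signals. Below is a minimal Python sketch of that general pipeline. It is not the author's model. The toy vocabulary, the click counts, the edit-distance cutoff of 2, the use of character-bigram Dice similarity and the linear weights (0.5/0.3/0.2) are all hypothetical choices made for illustration. The sketch also ignores the four candidate word types that the paper mentions.

```python
from collections import Counter

# Hypothetical toy data: a query vocabulary and per-query click counts.
VOCABULARY = ["python", "pylon", "typhoon", "java"]
CLICKS = {"python": 900, "pylon": 50, "typhoon": 20, "java": 700}


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def ngrams(s: str, n: int = 2) -> Counter:
    """Character n-grams of s, padded with '#' so word boundaries count."""
    padded = f"#{s}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the two n-gram multisets (1.0 = identical)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    return 2 * sum((ga & gb).values()) / total if total else 0.0


def confusion_set(query: str, vocabulary, max_dist: int = 2) -> list:
    """Stage 1 (hypothetical rule): keep vocabulary entries within max_dist edits."""
    return [w for w in vocabulary if levenshtein(query, w) <= max_dist]


def rank(query: str, candidates, clicks, w_sim=0.5, w_ld=0.3, w_click=0.2) -> list:
    """Stage 2: order candidates by a weighted mix of the three signals."""
    max_clicks = max((clicks.get(c, 0) for c in candidates), default=0) or 1

    def score(c: str) -> float:
        ld_score = 1 / (1 + levenshtein(query, c))   # closer = higher
        click_score = clicks.get(c, 0) / max_clicks  # normalized to [0, 1]
        return (w_sim * ngram_similarity(query, c)
                + w_ld * ld_score
                + w_click * click_score)

    return sorted(candidates, key=score, reverse=True)


def correct(query: str, vocabulary=VOCABULARY, clicks=CLICKS) -> str:
    """Return the top-ranked candidate, or the query itself if none qualify."""
    candidates = confusion_set(query, vocabulary)
    return rank(query, candidates, clicks)[0] if candidates else query


if __name__ == "__main__":
    print(confusion_set("pyhton", VOCABULARY))
    print(correct("pyhton"))
```

With these toy counts, the confusion set for "pyhton" is "python" and "pylon", since both are two edits away. Bigram similarity alone would slightly favor "pylon". The click term tips the ranking so that `correct("pyhton")` returns "python", which shows why a usage signal can complement purely string-based measures.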

Key words: Query correction; Confusion sets; N-gram model; N-gram similarity; Levenshtein Distance (LD); Frequent click rate
Received: 03 August 2015      Published: 08 March 2016

Duan Jianyong. Auto-Correction Search Model Based on Statistics and Characteristics. New Technology of Library and Information Service, 2016, 32(2): 34-42.

