%A Hao Hui %T A Duplicate Removal Algorithm of Cross-database Search Based on Sci-tech Novelty Retrieval %0 Journal Article %D 2015 %J Data Analysis and Knowledge Discovery %R 10.11925/infotech.1003-3513.2015.01.13 %P 89-95 %V 31 %N 1 %U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4005.shtml} %8 2015-01-25 %X

[Objective] Remove the data redundancy of cross-database searching in sci-tech novelty retrieval and improve the retrieval efficiency. [Methods] Choose thesis names, serial titles, publication dates and first authors of search records from different databases and build the character strings of search records by modifying comparison algorithm related to I-Match as the evidence of duplicate removal. [Results] The duplicate removal algorithm can improve retrieval effeciency by analyzing and duplicating the retrieval results from different databases. The experient suggests the precision of algorithm is superior, while the recall of the algorithm could be improved by modifying database records. [Limitations] The treatment effect depends on four characters extracted from database search records, different feature extraction model of search records needed to be customized according to different thesis databases due to the search result diffenrence. [Conclusions] The experiment test suggests the algorithm has a decent precision of duplicate removal and treatment efficency, which accords with the requirement of sci-tech retreival.