Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 6: 50-56. DOI: 10.11925/infotech.2096-3467.2018.1390
Semantic Matching for Sci-Tech Novelty Retrieval
Junliang Yao, Xiaoqiu Le
(National Science Library, Chinese Academy of Sciences, Beijing 100190, China); (Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China)
[Objective] This paper aims to identify, among preliminary search results, records whose semantics are similar to the novelty points, so that the required journal articles or patents can be retrieved automatically. [Methods] First, we designed a deep multi-task hierarchical classification model based on Bi-GRU-ATT. Second, we trained several hierarchical classification models using the categories of the International Patent Classification (IPC) and the patents assigned to them. Third, we fine-tuned the model with a small amount of paper data so that it handles both papers and patents. Finally, we compared the semantic categories of the novelty points with those of the candidate records to collect the matching records. [Results] For the two-level classification of patents under IPC subclass E21B, the new model achieved precisions of 82.37% and 73.55% at the two levels, respectively, outperforming the benchmark models. On real novelty-search point data, semantic matching reached a precision of 88.13%, which was 15.16% higher than that of TF-IDF. [Limitations] The model was examined on only a small number of IPC categories. [Conclusions] The proposed method improves the semantic matching of novelty search points.
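The matching step described under [Methods], together with a TF-IDF baseline of the kind used for comparison under [Results], can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a trained classifier (such as the Bi-GRU-ATT model, which is not reproduced here) has already assigned each novelty point and each candidate record a two-level category path. The function names, the placeholder labels ("A", "A.1", ...), the prefix-agreement matching rule, the smoothed IDF formula, and the 0.3 similarity threshold are all hypothetical choices made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical two-level category path, e.g. ("A", "A.1"). In the paper's
# setting the labels would come from the IPC hierarchy under E21B.
CategoryPath = Tuple[str, ...]


def category_match(query: CategoryPath, candidate: CategoryPath, level: int = 2) -> bool:
    """Assumed rule: match when both paths agree on their first `level` labels."""
    if len(query) < level or len(candidate) < level:
        return False
    return query[:level] == candidate[:level]


def select_by_category(query: CategoryPath,
                       candidates: Dict[str, CategoryPath],
                       level: int = 2) -> List[str]:
    """Keep the IDs of candidate records whose predicted path matches the query's."""
    return [cid for cid, path in candidates.items() if category_match(query, path, level)]


def tfidf_vectors(docs: Sequence[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights with an assumed smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in counts.items()})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_by_tfidf(query: List[str],
                    candidates: Dict[str, List[str]],
                    threshold: float = 0.3) -> List[str]:
    """Baseline: keep candidates whose TF-IDF cosine with the query reaches `threshold`."""
    ids = list(candidates)
    vectors = tfidf_vectors([query] + [candidates[cid] for cid in ids])
    q_vec = vectors[0]
    return [cid for cid, vec in zip(ids, vectors[1:]) if cosine(q_vec, vec) >= threshold]


def precision(retrieved: List[str], relevant: Set[str]) -> float:
    """Share of retrieved records that are actually relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & relevant) / len(retrieved)


if __name__ == "__main__":
    candidates = {"p1": ("A", "A.1"), "p2": ("A", "A.2"), "p3": ("B", "B.1")}
    print(select_by_category(("A", "A.1"), candidates, level=2))  # ['p1']
    print(select_by_category(("A", "A.1"), candidates, level=1))  # ['p1', 'p2']
    print(precision(["p1", "p2"], {"p1"}))                        # 0.5
```

Matching on category labels rather than on shared words lets a record that uses different vocabulary still match a novelty point when the classifier places both in the same category. The TF-IDF baseline, by contrast, scores zero for any candidate that shares no terms with the query.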

Key words: Sci-tech Novelty Retrieval; Semantic Matching; Multitask Learning; Bi-GRU-ATT
Received: 10 December 2018; Published: 15 August 2019

